
The Next 700,000 Programming Systems

May 2026

I have been thinking about the future of programming systems research. Predicting the future is an onerous task. One way to get an accurate prediction is to reason from first principles. Why do we do programming systems research? One can give several answers to this question, such as to benefit humanity, to train the next generation of researchers, for career advancement, or just because it's fun. All these answers are valid, of course, but I want to take a stricter, utilitarian view of the field for the purposes of our analysis.

I claim the point of programming systems research is to make software engineering (SWE) cheaper. Take Rust, for example. Whether you consider it research or not, Rust is an extremely successful programming systems project. If Rust did not exist, many programs that are today written in Rust would have had to be written in C++ instead, meaning a lot of time and money would have been spent fixing memory errors. Rust allows us to make an upfront one-time investment that continues to pay dividends every time it is used to write a program. Or so the claim goes. There are costs associated with writing a program in Rust that are absent when writing it in C++. But on balance I think Rust has net positive utility.

We can formalize this intuition of a project having net positive utility. Define \(C(x)\) as the cost of completing SWE task \(x\) without \(S\), and \(C(x | S)\) as the cost of completing it given access to a new programming system \(S\). Define \(\lVert S \rVert\) as the total cost associated with R&D of \(S\) and training people (or agents) to use it. I claim \(S\) is worthwhile if the following inequality holds, where \(X\) is the set of all SWE tasks that will be done in the future:

\[ \lVert S \rVert < \sum_{x \in X} \big( C(x) - C(x | S) \big)\] In words, the inequality says that the total cost of creating and adopting \(S\) must be less than the net reduction in cost that \(S\) will bring about over time. Halide is another good example of a project that satisfies this inequality. There are many SWE tasks \(x\) for which \(C(x) - C(x | \text{Halide})\) is a large positive number, because a lot of time and money is saved by ensuring that performance optimization cannot cause correctness bugs, and there is a lot of code in the world to which Halide is applicable.
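As a minimal sketch of how the inequality can be evaluated, the snippet below sums the per-task savings and compares them with the cost of the system. The task names and all numbers are hypothetical and in arbitrary units; they are there only to show that a system can be worthwhile overall even when it makes some individual task more expensive.

```python
from typing import Mapping


def net_savings(baseline: Mapping[str, float], with_system: Mapping[str, float]) -> float:
    """Sum of C(x) - C(x | S) over every task x in X."""
    return sum(baseline[x] - with_system[x] for x in baseline)


def is_worthwhile(system_cost: float, baseline: Mapping[str, float],
                  with_system: Mapping[str, float]) -> bool:
    """True when ||S|| is strictly less than the total savings."""
    return system_cost < net_savings(baseline, with_system)


# Hypothetical costs; none of these numbers come from the essay.
baseline = {"parser": 40.0, "allocator": 100.0, "cli": 10.0}    # C(x)
with_system = {"parser": 35.0, "allocator": 60.0, "cli": 12.0}  # C(x | S)

print(net_savings(baseline, with_system))          # 43.0
print(is_worthwhile(30.0, baseline, with_system))  # True
print(is_worthwhile(50.0, baseline, with_system))  # False
```

Note that the hypothetical "cli" task costs more with the system than without it, yet the sum still comes out positive.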

Before we proceed, I must admit that this modeling ignores many confounding factors. For example, the creation of \(S\) might increase the size of \(X\) in the future as previously impossible tasks become possible, or previously economically infeasible tasks become feasible, but this is hard to know beforehand when you are deciding whether to spend resources on R&D of \(S\). It is also difficult to calculate the cost of training people to use \(S\): how many people do you train? While these are all valid objections, the only way I know to squeeze something useful out of this analysis is to make these simplifying assumptions, carry on, and remember to take the results at the end with a hefty helping of salt. It may not be much but it is honest work.

Armed with our modeling assumptions, we can try to get a handle on the future by asking the following more precise question: how should the inequality be updated when coding agents write all the code? Our premise is that programming systems research is meant to make SWE cheaper, and coding agents have already revolutionized SWE, though not as much as some people would have you believe. The inequality in its current form is too general to yield interesting results. To proceed, I introduce an assumption about the future.

I believe humans and coding agents will specialize in different aspects of SWE. Agents will specialize in implementation (call it programming if you will) and humans will specialize in "making sure the code does what it says on the tin". Unless we hit the singularity, it is hard to imagine a future in which a human does not have to take ownership and responsibility for their agents' output.

What does this nebulous task of ensuring code behaves the way it is supposed to actually entail? It is hard to say precisely as our understanding is in constant flux due to model capability increases, but I suspect it is some combination of reading the code, designing the high-level architecture of a system and ensuring it is adhered to, various kinds of testing, interrogating the agent about what it did and why, choosing the right language and libraries and ecosystem for a project, etc.

Based on this assumption, let us decompose the cost term so we can talk about the two new quantities independently:

\[ C(x) = \text{Agent}(\text{impl of } x) + \text{Human}(\text{verif of } x)\]

where the first term gives the "agentic cost" required for the implementation associated with task \(x\), and the second term gives the "human cost" required to verify that the task has been truly completed. By agentic cost I mean something like agent-hours and by human cost something like man-hours. Note that by verify I do not mean formal verification only. The human term is meant to encompass the time and money it takes for the human to be convinced that \(x\) has been successfully completed.
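To make the case analysis concrete, here is the original inequality with the decomposition substituted in. This is a sketch that assumes the same split applies when \(S\) is available, that is, \(C(x | S) = \text{Agent}(\text{impl of } x | S) + \text{Human}(\text{verif of } x | S)\), which the definitions above suggest but do not state outright:

\[ \lVert S \rVert < \sum_{x \in X} \Big( \big[\text{Agent}(\text{impl of } x) - \text{Agent}(\text{impl of } x | S)\big] + \big[\text{Human}(\text{verif of } x) - \text{Human}(\text{verif of } x | S)\big] \Big)\]

The first bracket is the agentic saving on task \(x\) and the second is the human saving. Each case that follows amounts to fixing one or both brackets to zero.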

Now we can do a case analysis over the future and see what this inequality tells us about which programming systems remain worthwhile.

Case: Omnipotent Agents

To begin I want to steelman the case for AI, because I think it is not given enough attention in my circles (and too much attention in others). Say we get recursive improvement, superintelligent agents, and all that good stuff. There is no reason a human has to be involved in SWE at all. Claude can just run the economy directly, communicating with instances of itself. We can model this by making two stipulations:

\[\text{Agent}(\text{impl of } x | S) = \text{Agent}(\text{impl of } x)\] as the marginal utility of any programming system to such an agent will surely be zero, and:

\[\text{Human}(\text{verif of } x | S) = \text{Human}(\text{verif of } x) = 0\] which you can interpret as saying the cost of human verification is zero: an omnipotent intelligence wrote the code, so it is always correct and does what it is supposed to do.

If we substitute these assumptions into the inequality, we get the following condition for a new programming system to be worthwhile:

\[\lVert S \rVert < 0\] This is not true for any system, as the cost of building and adopting one is never negative! So, as you might have guessed, in a world of omnipotent agents, there is no utility in humans doing programming systems research. Oh well. Good to know.

Case: Bounded Agents

Now let us consider a world in which agents are not all-powerful and can benefit from human guidance and supervision. This happens to be the world in which we live at the time of writing. Here, I split the analysis by the kind of system we are considering.

Helping Humans

First, consider a new system \(S_1\) that, if researched and developed, will make it cheaper for humans to convince themselves that a given piece of code is correct. We can model this by saying:

\[ \text{Human}(\text{verif of } x | S_1) < \text{Human}(\text{verif of } x)\] and:

\[ \text{Agent}(\text{impl of } x | S_1) = \text{Agent}(\text{impl of } x)\] as \(S_1\) is not designed for helping agents.

If we plug this back, we get the following condition for \(S_1\) to be worthwhile:

\[ \lVert S_1 \rVert < \sum_{x \in X} \big( \text{Human}(\text{verif of } x) - \text{Human}(\text{verif of } x | S_1) \big)\] In words, this says the total cost of creating and adopting \(S_1\) must be less than the net reduction in cost \(S_1\) will bring about over time for humans verifying code.

Wait, where have we heard this before?

This is identical to the first inequality we derived, but with \(\text{Human}\) instead of \(C\)! It turns out that if agent capabilities remain bounded, the utility calculus of programming systems remains almost the same as in a world without agents: we just have to focus on systems that make it easier for humans to convince themselves code does what it says it does.

Another advantage of this framing of the problem is that it brings into focus what makes a programming system good. Revolutionary programming systems often strip away unnecessary details and allow the programmer to focus on the important bits. For instance, tile-based programming languages are very popular these days for GPU programming, as the program is often tile-based in essence but raw CUDA often obscures this fact. A system that strips away unnecessary details makes human verification cheaper as there are fewer things to verify.

There are plenty of other examples of \(S_1\). An expressive type system that eliminates a certain class of bugs can also satisfy the inequality. A verified compiler precludes scouring the assembly for miscompilations. Not only are there numerous opportunities to do impactful research, but the calculus is something we've been pretty much following already!

This is the central result of our exercise. It is not very deep, and you could complain that the result is somewhat baked into the assumptions, but I nevertheless think something real is being identified here, and just anecdotally, I was pleasantly surprised when I discovered it.

Helping Agents

Conversely, we can consider a system \(S_2\) designed to make it cheaper for agents to write code. To model \(S_2\), we assume:

\[ \text{Agent}(\text{impl of } x | S_2) < \text{Agent}(\text{impl of } x)\] and:

\[ \text{Human}(\text{verif of } x | S_2) = \text{Human}(\text{verif of } x)\] for similar reasons as before. Plugging this into the inequality, we get:

\[ \lVert S_2 \rVert < \sum_{x \in X} \big( \text{Agent}(\text{impl of } x) - \text{Agent}(\text{impl of } x | S_2) \big)\] which is the same inequality as in the previous subsection, except with agentic cost instead of human cost.

There are plenty of examples of \(S_2\). A new harness that improves agent performance or decreases the likelihood of buggy code, a new system for context management, a type system designed for agents, and even optimizations to inference or GPUs will all reduce agentic cost.
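The two bounded-agent cases can be put side by side in a small sketch. It assumes, as the essay does when it adds the two terms, that agentic and human cost are expressed in a common unit. The `Cost` class, the `savings` helper, and all numbers are hypothetical illustrations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cost:
    """C(x) split into the two terms of the decomposition."""
    agent: float  # Agent(impl of x)
    human: float  # Human(verif of x)


def savings(baseline: list[Cost], with_system: list[Cost]) -> tuple[float, float]:
    """Return (agent savings, human savings) summed over all tasks."""
    agent = sum(b.agent - s.agent for b, s in zip(baseline, with_system))
    human = sum(b.human - s.human for b, s in zip(baseline, with_system))
    return agent, human


# Hypothetical costs for two tasks without any new system.
baseline = [Cost(agent=4.0, human=10.0), Cost(agent=6.0, human=20.0)]
# An S_1-style system lowers only the human term.
with_s1 = [Cost(agent=4.0, human=6.0), Cost(agent=6.0, human=12.0)]
# An S_2-style system lowers only the agent term.
with_s2 = [Cost(agent=3.0, human=10.0), Cost(agent=4.0, human=20.0)]

print(savings(baseline, with_s1))  # (0.0, 12.0)
print(savings(baseline, with_s2))  # (3.0, 0.0)
```

In each case only one component of the savings is non-zero, and that component is what has to exceed the cost of building and adopting the system.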

Conclusion

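What will the next 700,000 programming systems look like? Nobody can tell you exactly, but as long as we don't hit the singularity or something, there will remain plenty of opportunity to do good work by focusing on systems that make it cheaper for humans to verify code, or on systems that make it cheaper for agents to write it.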

Interestingly, in either case, the calculus for picking a project in terms of potential utility is much the same as the one we have been implicitly following all this time, perhaps with some modifications to adjust for the specialized roles humans and agents will adopt with regard to SWE. Go forth and do good work!

Acknowledgements

Rohan Yadav gave helpful feedback on this essay.
