The plan
My theory at the moment is as follows:
- Code is now cheap, with agentic AI writing it
- Context is the expensive part
- The biggest bottleneck appears to be context windows
- Even at 1 million tokens, perhaps only 500k are usable before the model's output degrades
- The current solution is increasingly baroque and complex memory servers
- Most AI code being open sourced is written in TypeScript or Python
- TypeScript and Python are designed for ease of development, not for cramming meaning into as few tokens as possible
- The type systems in TypeScript and Python are not sufficient "contracts" for the model to reason from
- Strong typing can create interfaces and contracts for the AI to reason about, so the function implementations matter less
- Effect systems reduce ambiguity even further by indicating the "world" boundaries a function touches
- Infrastructure code is uniquely placed to benefit: the core functions transform config or other data, and the side-effecting functions handle deployments
- Functional programming therefore has an advantage in agentic programming, for all of the reasons above
Therefore I have settled on writing some code in Haskell. Haskell is a purely functional language with an effect system, and it is doubly economical with tokens: the code is incredibly terse, and the signatures and types form a contract the agent can reason from. Haskell is a famously confusing, often mathematics-heavy language, so in the past it was mostly used by people with academic backgrounds. With agentic AI, in theory anybody can take advantage of these language features with minimal knowledge of the language.
An effect system means that if a function wants to cause a side effect (access data from outside the program, or do any IO), it must declare this in its signature. The benefit is twofold: the model can see what IO a function performs just from the signature, and testing becomes much easier. "Pure functions" (functions that only transform data, with no IO operations) are easy to test: provide an input, assert that the output is as expected. This makes it easy to get 100% unit test coverage on all of these functions. To test the effectful functions, we can use nix flake checks to spin up a VM with an instance of the program and whatever IO it uses, whether that is a database, writing to a file, or making HTTP calls. With this system you can quite easily get full test coverage. Just tell the AI to write the tests before it writes any code.
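As a minimal sketch of this pure/effectful split (toy functions of my own, not code from the program described later):

```haskell
import Data.Map (Map)
import qualified Data.Map as Map

-- Pure: data in, data out. A unit test is just an assertion on the result.
-- (Toy example: drop labels whose key is empty.)
normalizeLabels :: Map String String -> Map String String
normalizeLabels = Map.filterWithKey (\k _ -> not (null k))

-- Effectful: the IO in the signature declares, and the compiler enforces,
-- that this function touches the outside world. This is the kind of function
-- you would test with a nix flake check rather than a plain assertion.
writeManifest :: FilePath -> String -> IO ()
writeManifest path contents = writeFile path contents
```

A test for the pure function needs nothing but an equality check, e.g. `normalizeLabels (Map.fromList [("", "x"), ("app", "web")]) == Map.fromList [("app", "web")]`.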
Compression
Using the above principles, you can compress code that would otherwise be written in Python; I have found roughly a 3x reduction in tokens with this method, meaning my program fits very easily in the roughly 300-500k tokens of usable context. But we can go a step further by leaning on the function signatures and types a bit more. Consider the function below:
parseKindEntry :: Value -> Maybe (ResourceKindId, ResourceSchema)
parseKindEntry val = do
  obj <- km val
  props <- Map.lookup "properties" obj >>= km
  _ <- Map.lookup "apiVersion" props
  _ <- Map.lookup "kind" props
  _ <- Map.lookup "metadata" props
  gvk <- extractGVK obj
  let specFields   = fieldTypes (Map.lookup "spec" props >>= km >>= (Map.lookup "properties" >=> km))
      statusFields = fieldTypes (Map.lookup "status" props >>= km >>= (Map.lookup "properties" >=> km))
  pure (gvk, ResourceSchema gvk specFields statusFields)

(>=>) :: Monad m => (a -> m b) -> (b -> m c) -> a -> m c
f >=> g = \x -> f x >>= g
This function takes some OpenAPI schema JSON, validates that it looks like a k8s resource, extracts the GVK identity, extracts the spec/status field structures, and constructs a typed schema representation in Haskell, so that we can query and manipulate the data with Haskell functions. As you can see, it is very terse, very symbol-heavy, and confusing. That is fine: as I said above, this is not intended for human consumption. It is for the agent to use.
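For reference, helpers like km are not shown above. A plausible sketch, using a simplified JSON type in place of a real JSON library's Value (the name km and these shapes are my guesses, not the actual code):

```haskell
import Data.Map (Map)
import qualified Data.Map as Map

-- Simplified stand-in for a JSON value (the real code presumably uses a
-- proper JSON library such as aeson).
data Value = Object (Map String Value) | Str String | Null
  deriving (Show, Eq)

-- km ("key map"): succeed only when the Value is a JSON object.
-- Returning Maybe is what makes the do-block above short-circuit on bad input.
km :: Value -> Maybe (Map String Value)
km (Object o) = Just o
km _          = Nothing
```

The Maybe return type is doing the error propagation listed below: any lookup or km that fails makes the whole parse return Nothing.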
Using the OpenAI tokeniser we can see that this function is 197 tokens in total. Now let's see what the equivalent Python would be. I will not paste the Python code as it's pretty long, but it's 1178 characters and 270 tokens. And it does not give you:
- Error propagation
- Structural validation
- Parsing flow
- Partiality
Now, if we use fully typed pydantic Python, we get something a little better, but still not quite the same. It is 500 tokens, still uses Any a lot, and doesn't yet have full types or feature parity, though there is some typechecking. It is somewhat functional in style, but again, the agent cannot infer the function implementation from it.
Then I asked Claude to use every Python type feature it could to give me full feature parity with the Haskell function. That code is 1200 tokens. So in this test case the Haskell is roughly one sixth the tokens of the Python for the same semantic meaning: six times the semantic meaning per token versus the Python implementation.
But let's take it a step further. The function signature in python is as follows:
def parse_kind_entry(
    val: Any,
) -> Optional[Tuple[ResourceKindId, ResourceSchema]]:
The AI can infer nearly nothing from this: it cannot tell whether the function mutates global state, makes IO calls, or does anything else. The Haskell signature, by contrast, is very powerful. If we give the agent unit tests, all of the types, compiler hints, and a linter, there are very few implementations of the function that will be valid or sensible; in Python, nearly nothing can be inferred. So let's give the agent just the function signature: parseKindEntry :: Value -> Maybe (ResourceKindId, ResourceSchema). This is 15 tokens. The amount of inference per token is absurd. The agent will also need the types, so let's add the relevant type definitions to the context; the tokens used come to just shy of 100. So we get more type safety and more useful types, for fewer tokens.
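To illustrate, the type definitions accompanying the signature might look something like this (hypothetical: these names come from the signature in this post, but the real field layout is my guess):

```haskell
import Data.Map (Map)
import qualified Data.Map as Map

-- Hypothetical: a GVK (group/version/kind) identity for a k8s resource.
data ResourceKindId = ResourceKindId
  { gvkGroup   :: String
  , gvkVersion :: String
  , gvkKind    :: String
  } deriving (Show, Eq)

-- Hypothetical: a rough classification of a field parsed from OpenAPI.
data FieldType = FTString | FTInteger | FTBool | FTObject
  deriving (Show, Eq)

-- Hypothetical: the schema the parser produces.
data ResourceSchema = ResourceSchema
  { schemaGVK    :: ResourceKindId
  , specFields   :: Map String FieldType
  , statusFields :: Map String FieldType
  } deriving (Show, Eq)
```

A bundle roughly like this, signature plus definitions, is what lets the agent see the whole contract without reading any implementation.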
The main takeaway here is that if we use a language that is very strict about types and effects, the agent does not need the function implementations. They can be inferred, and even if there is a handful of valid options, the output will always be the same given the signature, and we can unit test against this.
Effects
The effect system also gives the agent more meaning to work from. As a very quick example, here are two signatures: one has IO access and the other does not. It costs maybe 5 tokens to show the AI that a function touches IO, whereas in Python the agent would have to read the full implementation to see this. And it is all enforced by the compiler, so the model has no way to work around it:
-- pure: unit tested with haskell test assertions
parseKindEntry :: Value -> Maybe (ResourceKindId, ResourceSchema)
-- effectful: unit tested with a nix flake check which spins up a VM
applyResource :: ResourceKindId -> ResourceSchema -> IO (Either ApplyError ())
The inspiration
There is a program called PostgREST, which takes your Postgres schema and automatically creates HTTP endpoints, giving you a REST API over your entire database. It is written in Haskell, and it is clear that Haskell is the most effective language for the tool: it is mostly schema parsing and middleware, so the functions are largely pure functions that make up a data pipeline, and the functions with side effects are the ones that expose the HTTP endpoints. This is the perfect model for what I'm describing.
My program
I am currently writing some code with DeepSeek in an openclaw session that mimics PostgREST, but uses the live state of a k8s cluster as the data. It ingests the cluster state into Haskell types, then exposes an HTTP endpoint that lets you run any query you could run against a SQL database. I don't believe this code would make sense in any other language. I will open source it at some point, but I wanted to share some of my experience both with taking loosely defined data and building a query engine on it, and with squeezing the most out of the AI using token compression that comes entirely for free in Haskell.
Final thoughts
A lot of this post and exploration was triggered by the Claude Code source code leak: somebody accidentally included a js map file in the npm release and leaked the code. It is 500k lines of TypeScript. I don't know how it got that bad, and I don't know how the AI ingests enough useful code into its context; it can't come remotely close, so they must be building it in a modular way. I wanted to see whether it is possible to give the AI the full picture of a complex program within the first 10k tokens. I think I have demonstrated this to some extent: when you use a language with a powerful type system, you can just give the agent signatures, types, and unit tests.