
A Collection of Unrelated AIS Things I’m Thinking About

I. Pivotal Processes

Let’s say you have resources and want to improve the world. You could reduce world hunger, build libraries, work to cure cancer, etc. But what’s the use of all of these things if no humans exist to enjoy them? Also, the future exists, which could be a really big deal conditional on minds existing to experience it.

Therefore: spend on world hunger, libraries, and cancer cures only once existential risks are low enough to justify any other priority. It’s not just regular altruism, it’s astronomical altruism.

One problem: if an action you take reduces existential risk only within a bounded window, say a century, the payoff is not all that great. That is, if you reduce risk in your own time but your grandchildren have to do it all over again, the expected value isn’t so astronomical after all.

From what I can tell, there are two good responses to this objection.

  1. Humanity is entering a transitional “time of perils” where risk is especially high, and once we navigate our relationship with technology, background risk will go back down.
  2. Humanity can strike a permanent blow to existential risk through some “pivotal” act or process (aligned ASI, superintelligent whole brain emulation?).

These are kinda the same argument, because they both hinge on risk being variable over long time scales. Pivotal acts may be the only way to exit a time of perils.
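
To make the objection and the two responses concrete, here is a toy calculation. It is a sketch under loud assumptions: every number in it (10% risk per century, a two-century period of perils, a 0.01% background risk afterwards, a 10,000-century horizon) is made up for illustration, and "expected centuries survived" is only a crude stand-in for the value of the future. With a constant per-century risk r, expected centuries survived comes out to roughly (1 - r) / r, so halving risk for one century barely moves it. If risk drops after a period of perils, the same one-century intervention is worth far more.

```python
# Toy model: all numbers below are illustrative assumptions, not estimates.

HORIZON = 10_000  # centuries; an arbitrary cutoff standing in for "the long future"


def expected_centuries(hazards):
    """Expected number of centuries survived, given per-century extinction risks."""
    alive = 1.0  # probability of having survived every century so far
    total = 0.0
    for risk in hazards:
        alive *= 1.0 - risk
        total += alive
    return total


def with_intervention(hazards, new_first_risk):
    """Copy of the risk schedule with only the first century's risk changed."""
    return [new_first_risk] + list(hazards[1:])


# World A: risk never goes down (10% per century, forever).
flat = [0.10] * HORIZON

# World B: a two-century "time of perils" at 10%, then 0.01% background risk.
perils = [0.10] * 2 + [0.0001] * (HORIZON - 2)

# The same intervention in both worlds: halve risk in the first century only.
gain_flat = expected_centuries(with_intervention(flat, 0.05)) - expected_centuries(flat)
gain_perils = expected_centuries(with_intervention(perils, 0.05)) - expected_centuries(perils)

print(f"Gain with constant risk:   {gain_flat:.2f} expected centuries")
print(f"Gain with time of perils:  {gain_perils:.2f} expected centuries")
```

Under these made-up numbers the intervention buys about half a century in the constant-risk world and a few hundred centuries in the time-of-perils world. The gap comes entirely from the assumption that risk falls afterwards, which is exactly what both arguments above need to be true.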

If you're an astronomical altruist, how should you approach AI safety and the coming decades? Are people approaching it this way? Is an understanding of pivotal processes necessary to develop a good theory of victory?

II. Automated Coders

The AI Futures Project defines an Automated Coder (AC) as a system that fully automates an AGI project's coding work, and a Superhuman AI Researcher (SAR) as a system that automates “taste” work. They claim these milestones bookend a “stage” of the intelligence explosion. The team then posits these considerations:

  1. How much automating coding speeds up AI R&D. This depends on a few factors, for example how severely the project gets bottlenecked on experiment compute.
  2. How good AIs' research taste is at the time AC is created. If AIs are better at research taste relative to coding, Stage 2 goes more quickly.
  3. How quickly AIs' research taste improves. For each 10x of effective compute, how much more value does one get per experiment?

Based on what we’ve seen with MirrorCode versus adoption/uplift, it would seem “hard to verify” skills lag, while “easy to verify” skills are ahead of schedule. I see a world where coding agents get better and better at easy-to-verify tasks while reward-hacking on novel research. I haven’t been able to update strongly on research taste in months, however.

Thankfully, on June 4th, Anthropic published data relevant to recursive self-improvement (RSI):

Over 80% of all code merged into our codebase is now written by Claude

This is not a very interesting metric to me. Seems possible to hit 100% without getting much closer to RSI.

It's been months since many researchers at Anthropic hand-wrote code

Same as above.

The typical Anthropic engineer ships 8x as much code as they did in 2024

Uplift is confusing because AIs have weird capability profiles. Snooze.

On the most open-ended engineering tasks, Claude's success rate jumped from ~26% to 76% in 6 months

Very important and I want more resolution. Both on classification (how open-ended?) and success criteria (how successful?).

When research sessions went off-track, Claude proposed a better next step than the human took 64% of the time

Is this the “taste” everyone is talking about? If research can be split into “making decisions about what to do next” and “doing the next thing,” the former being a spectrum of taste-horizons and the latter being benchmarkable hard skills, what taste-horizon is Anthropic measuring? Something that requires strategic project planning? Searchable knowledge about libraries?

If you had the affordances, how would you design a taste study at a lab?
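
One thing any such study would need is enough comparisons for a number like 64% to mean something. A minimal sketch, assuming each comparison is an independent, blinded "Claude's next step vs. the human's next step" judgment with no ties, and that the baseline of interest is a coin flip (50%). The sample sizes below are hypothetical; the published figure does not come with one here. The interval is the standard Wilson score interval for a binomial proportion.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# Hypothetical sample sizes: how many off-track sessions were compared?
for n in (25, 100, 500):
    wins = round(0.64 * n)  # a 64% win rate at this sample size
    low, high = wilson_interval(wins, n)
    print(f"n={n:4d}: 64% win rate, ~95% interval {low:.3f} to {high:.3f}")
```

At a couple dozen comparisons the interval still includes 50%, so the design question is partly a sample-size question: how many off-track sessions can a lab realistically collect per taste-horizon, and who judges "better" without knowing which proposal came from the model?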

III. Antbux & Openbux

By betting that “AI could defeat us all combined,” the AI safety community put itself in the very highest percentile of “optimism about the magnitude of AI progress,” right up there with the true believers.

Capitalism did the thing where it allocates capital to people who made accurate predictions about where capital would be most productive, so we will now have access to around an order of magnitude more funding, provided we scale up the infrastructure to deploy it.

An errant note: a world where accelerationists race to build AGI would not have created this windfall. For this reason, I do not feel especially betrayed by safety-minded lab founders and employees. We are in the peculiar and serendipitous position of having our resources to mitigate risk roughly tied to the risk itself!

Importantly, if you have the skills and want AI to go well, it would make sense to spend the next few years being counterfactually responsible for deploying this money to the right places. How do you do this? Here is what I’ve heard, ranked from naive to more galaxy-brained:

  1. Poorly scale our current grantmaking orgs, sacrifice taste, leverage AI labor as the intelligence explosion progresses
  2. Spend on things with lower marginal impact but high absorptive capacity, like politics, media, and prizes
  3. Buy talent off the capabilities/nonprofit-industrial-complex side, use it to scale research and grantmaking
  4. Y Combinator-style incubation (X Combinator?) for impactful orgs, poaching talented tech scalers
  5. Advance Market Commitments, “impact equity” — general market mechanisms

I’m missing a lot, and plan to keep writing about this.