Compute Share Will Only Matter More
If I had to fight for one thing right now, it would be securing a greater share of compute for safety teams at frontier labs. I think this is a very fruitful intervention across a large slice of plausible worlds, especially if the intelligence explosion is near.
The state of play: we threw a lot of man-hours at the alignment problem and realized it wasn't trivial. Assuming benevolent powerful AI is possible, and capabilities keep improving, our best bet seems to be "get a lot of useful alignment work out of early transformative AI."
This is intuitive if you believe in recursive self-improvement. If in ~15 years human discoveries account for less than 1% of all AI capability innovation, the same better be true for alignment science.
What does it take for an AI to do good alignment research? It needs to
- Be smart enough to do alignment science
- Apply this intelligence to the science (no reward hacking)
- Not scheme against us
People have already spent a lot of time thinking about (3), but what strikes me about 1 and 2 is the capabilities RSI team wants the exact same thing. Is this odd? I don't think so. There's a lot of instrumental convergence when you're trying to get AI to do useful work.
Let's assume control teams mitigate scheming, and the capabilities team is creating a super capable automated researcher (if they weren't, we could all go home). Now there are some more ways you could get rekt: alignment research could be categorically harder than capabilities, or these AIs are simply not being used for alignment research.
I'm sympathetic to the former, but recent trends have updated me against this being as much of a problem as the latter. The mix of work done by automated researchers, meanwhile, seems extremely salient. One way to ensure a good mix is pushing for safety-team ownership of compute, which, as the intelligence explosion progresses, becomes both labor in the form of inference and the resources needed to point that labor towards useful alignment research. What does the safety team not having enough compute look like?

Well, if they owned 100% of the lab's compute, we could all go home. Depending on what the curve actually looks like, this seems like an impactful lever for reducing risk: it's a straightforward request organizationally, preserves a shocking amount of optionality on alignment techniques, and automatically scales with the lab.