
Title: "Path to Victory" — LessWrong | BestBlogs.dev

URL Source: https://www.bestblogs.dev/article/1cd8bd6d

Published Time: 2026-03-29 06:23:54


_This article is based on reflections from co-leading the Sydney AI Safety Fellowship._

We have a Problem 😱—actually several problems; alright, even more problems.

Unfortunately, the edge cases make it hard to write down a shared problem statement, but The Universe Doesn't Have to Play Nice, so we just need to get on with it regardless. Forcing consensus on exact problem bounds would most likely be net negative anyway, since a hard boundary would make it much easier to completely neglect at least one value or problem that we would later regret not dealing with.

So let's just assume that we have our problem statement and that it's close enough to other people's problem statements that a community forms around it. This community consists of people with a variety of different skills, temperaments and degrees of commitment. This naturally leads to the question of which aspects of the problem different members of the community should be working on. Notably, we need to avoid unrealistic assumptions about there being a centralised authority to draw up a plan and divide tasks: it's not going to happen, and even if centralisation were possible it would likely just make things worse[[1]](https://www.bestblogs.dev/article/1cd8bd6d#fn8zwmvzno2r5).

Instead, we need to tackle the problem in a more decentralised manner. But how can we avoid dropping a ball we can't afford to drop without centralised co-ordination?

Well, whatever we do, we need to take into account that there is variation in commitment and ability. Some people are willing and able to take Heroic Responsibility[[2]](https://www.bestblogs.dev/article/1cd8bd6d#fn2xi41y42xw2), using risky techniques like Shut up and do the impossible!—others are not. Indeed, I suspect that very few people or groups will be capable of taking heroic responsibility for the whole problem. I was persuaded of this by an X[[3]](https://www.bestblogs.dev/article/1cd8bd6d#fnx1yvfc9zyw) by Richard Ngo:

[Image: screenshot of Richard Ngo's X post]

This resonates strongly with me, though I'd frame this in terms of _taking heroic responsibility for_ short timelines. I expect working on short timelines to be much less intense if you're only biting off a small chunk of the problem, so I predict a wider range of people could work on this than Richard predicts[[4]](https://www.bestblogs.dev/article/1cd8bd6d#fntr8qu06dve).

In any case, this is where we are: we have a small number of people who can take heroic responsibility for the whole problem and a much larger number of people who can't[[5]](https://www.bestblogs.dev/article/1cd8bd6d#fng3e3b6x06m5). The people who can't take heroic responsibility should primarily just focus, focus, focus and pick one thing they can do well. How EA thinks about prioritisation resonates strongly with me, but I differ in thinking more in terms of systems[[6]](https://www.bestblogs.dev/article/1cd8bd6d#fnoutktwnqws9) and less in terms of direct, measurable impact. To be specific, I tend to think of interventions as building blocks (gathering resources or discovering information) that others can attempt to build on top of[[7]](https://www.bestblogs.dev/article/1cd8bd6d#fn4y2vb5ijtp3)—be it incrementalists laying one more brick or heroes shaping it into a working plan[[8]](https://www.bestblogs.dev/article/1cd8bd6d#fnb83kyzy123s).

I feel that there's a broad understanding that many of EA's old mental models of how to think about impact don't really carry over from EA's global health beginnings to the AI safety context, but we never really developed proper replacements. I think it's important to understand that there are different kinds of domains. Universal tools or mental models would be ideal, but producing them is extremely hard, perhaps even impossible. Producing tools to solve the problem in front of you feels much more viable. What I've described above fills in some of the blanks, but it needs to be developed in more detail.

I think it's worth stepping back and asking what we'd need to happen for this plan to succeed:

* Firstly, we need more people to ask themselves whether they'd be willing to step up to take heroic responsibility. I suspect very few people have deeply grappled with this question. For starters, for most people, grappling with it would mean confronting the possibility that things could be quite dire. Why else would you choose to suffer that much? Confronting catastrophic—let alone existential—risk is hard. Sure, we talk about it all the time, but mostly just on an intellectual level. Confronting it emotionally is truly something else.
* Second, we need people to be honest with themselves about whether they are truly capable of taking on that much responsibility. I unfortunately am not. At times I've told myself that I was, but I was just fooling myself. The vibes of heroic responsibility are immaculate, but vibes can be so dazzling that they prevent us from seeing reality. Countless people want to be a rockstar or a famous athlete or a billionaire, but only in the abstract. If they knew how much work and sacrifice was involved, they'd probably realise that they don't actually want that at all. Trying to take on too much responsibility will simply crush you, and you may take others down alongside you.
* Third, we need more training programs that hope for at least some proportion of their graduates to take up heroic responsibility. Maybe there should be some programs that _only_ focus on this, but there's a substantial risk that this pushes people to pick up a boulder heavier than they can carry, so I'd honestly think very carefully before pursuing something like this.
* Fourth, we need people to be strategic. It's very easy to fall into the trap of going "well, it's not much, but it's something. Anyway, I just have limited capacity, I'm just building a block" when they probably had options that would have been higher impact without requiring them to take on more than they'd be willing to bear. Deciding not to pursue heroic responsibility doesn't mean you should let yourself be lazy in a way that you'll regret later. John Wentworth has an excellent article on how to think about problem selection. He argues: "if you do not choose robustly generalizable subproblems and find robustly generalizable solutions to them, then most likely, your contribution will not be small; it will be completely worthless."
* Fifth, we need people to review their plans every so often and to be honest with themselves about whether their plan still makes sense. Not constantly—that'll only distract you—but at sensible intervals. Even though the situation is extremely fluid, the more recently you pivoted, the more hesitant you should be to pivot again. I know about the sunk-cost fallacy and all, but constantly pivoting is like being a first responder at best and a form of insecurity at worst. This is a trap I've fallen into. We also need people to occasionally consider whether it would be in line with their values to take on more heroic responsibility. Similarly, we need people to be really honest with themselves about whether they're attempting to bite off more heroic responsibility than they can chew.
* Sixth, we need people to be honest about how things are progressing and what they actually have covered. One way we lose is if there's no hero to plug a vital gap; another way we lose is if we think a gap is covered by someone and it isn't. Unfortunately, effectively communicating your limitations is _extremely_ hard. It is hard to see your blindspots, let alone admit them. In fact, probably the only way this is going to happen is if you don't just admit blindspots, but _potential_ blindspots. For example, "there's a chance that our selection process is too credentialist, but we've chosen the balance that we have because credentialled people are easier to evaluate, bring more credibility to the field, help improve methodological rigour and tend to have shorter time to impact". That's the kind of openness we probably need so that the heroes are not misled about where the gaps are.

There is no such thing as a perfect plan—all plans have flaws or limitations. If you aren't aware of the potential flaws of this strategy, then you should consider not updating based upon this proposal. I've mostly left finding these limitations as an exercise for the reader[[9]](https://www.bestblogs.dev/article/1cd8bd6d#fn42aoa64jhy4), but I'll leave a few breadcrumbs in the spoiler block below.

It's often hard to analyse the pros and cons of a plan in the abstract, so a good place to begin would be: what are the alternatives? A few possibilities: BlueDot's Defense-in-Depth, a plan crafted specifically for short timelines, a plan more narrowly focused on a specific strategy (like a pause), or an all-hands-on-deck plan.

Another direction: what would the Least Convenient Possible World for this plan look like?

Also, how could the world change such that this plan would become outdated?


Footnotes:

1. I haven't read Hayek, but apparently these arguments are his wheelhouse.
2. In his 2025 review, Alexander Berger referenced Nan Ransohoff's concept of General Managers as something they were keen to explore. I view the concept of a "general manager" as essentially meaning someone taking heroic responsibility with a lot of resources.
3. I may very well be the first person in the world to call a Tweet an "X".
4. Claude writes: "If your prediction is right (a wider range of people can work on short timelines if they're only biting off a small chunk), this has significant practical implications for community strategy. This could be a standalone claim worth developing, rather than a brief aside." — I'll keep that in mind for the future (insofar as there is one).
5. This model is a bit too binary in that it is possible to take heroic responsibility for a sub-problem even if you can't take heroic responsibility for the whole problem.
6. There is a discipline called systems thinking, but I haven't yet found the time to engage with it substantively, so I think about systems in a more ad hoc way.
7. Obviously, you need to take into account the probability that someone actually builds upon your work.
8. Jay Bailey describes the difference between bridges and walls—walls benefit from each additional block, but bridges only work if the whole structure is complete.
9. I'm working on an AI strategy course, and so this could be a good exercise for the participants.
