Continual learning may not be as difficult as it seems
Some optimistic predictions from the frontier labs
Update (12.20.25): New prediction from Sholto Douglas (Anthropic) that continual learning will be solved in 2026.
Over the past six months, it has become quite fashionable in certain circles to assume that AGI is more than a decade away because, among other things, current AI models lack continual learning. Ilya Sutskever is working on developing a superintelligence with the skills and knowledge of an eager 15-year-old and the ability to learn on the job; he thinks this is “5 to 20” years away. Andrej Karpathy thinks that current AI systems “are cognitively lacking and it’s just not working” because they have no continual learning; “[i]t will take about a decade to work through all of those issues”, he thinks.
But why should we assume that continual learning is a decade away? Could it be that it’s easier to achieve than some think?
Sholto Douglas: continual learning to “get solved in a satisfying way” in 2026
In the “No Priors” year-end podcast, Sholto Douglas (Anthropic) said that he thinks “that probably continual learning gets solved in a satisfying way” in 2026.
Dario Amodei: continual learning “not as difficult as it seems”
Dario Amodei, CEO of Anthropic, repeatedly stated this summer that continual learning may not be as difficult as it seems.
In a July interview, Amodei said that, even without continual learning, other techniques “can fill in many of the gaps”. One such technique is significantly lengthening the context window - perhaps to as much as 100 million words, which is roughly the number of words a human hears in a lifetime. There is “no reason” from a machine learning perspective why the context window could not be increased to this size, Amodei said; “it’s really just inference support” that is needed to make this viable.
But what about continual learning that allows for updating a model’s weights? Unexpectedly, Amodei suggested that Anthropic may have already found a path to achieving it:
One thing we learned in AI is whenever it feels like there’s some fundamental obstacle - like two years ago we thought there was this fundamental obstacle around reasoning - turned out just to be RL, you just train with RL and you let the model write things down to try and figure out objective math problems…Without being too specific, we already have maybe some evidence to suggest that [continual learning] is another of those problems that is not as difficult as it seems that will fall to scale plus a slightly different way of thinking about things.
Amodei also hinted that an “inner loop/outer loop” structure - wherein an agent learns and optimizes within the lifetime of a single episode (the inner loop) while also learning across multiple episodes (the outer loop) - “maybe… is a way to learn continual learning”.
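One common way to read that remark is as a meta-learning setup: the inner loop adapts parameters within a single episode, and the outer loop updates shared parameters across many episodes so that future adaptation gets easier. The sketch below is a minimal, illustrative toy in that spirit (a Reptile-style update on per-episode linear regression); the task, names, and hyperparameters are assumptions for illustration, not a description of Anthropic’s approach.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_episode():
    """A toy 'episode': fit y = a*x + b, where (a, b) differ per episode."""
    a, b = rng.uniform(-1, 1, size=2)
    x = rng.uniform(-1, 1, size=(20, 1))
    y = a * x + b
    return x, y

def inner_loop(theta, x, y, steps=5, lr=0.1):
    """Adapt a copy of the shared weights within one episode (the inner loop)."""
    w = theta.copy()
    X = np.hstack([x, np.ones_like(x)])          # features: [x, 1]
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(x)    # least-squares gradient
        w -= lr * grad
    return w

# Outer loop: nudge the shared initialization toward each episode's adapted
# weights (a Reptile-style meta-update), so later episodes adapt faster.
theta = np.zeros((2, 1))
meta_lr = 0.1
for episode in range(1000):
    x, y = make_episode()
    w_adapted = inner_loop(theta, x, y)
    theta += meta_lr * (w_adapted - theta)       # learning across episodes
```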
In a subsequent August interview, Amodei again mentioned extending a model’s context to 100 million tokens and suggested that models could be trained to be “specialized for learning over the context”; “[y]ou could, even during the context, update the model’s weights”. “[T]here are lots of ideas that are very close to the ideas we have now that could perhaps do this [i.e., achieve continual learning]”, Amodei said.
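Updating a model’s weights while it processes its context is sometimes framed as test-time training: taking small self-supervised gradient steps on the context as it streams in, so the weights themselves absorb what the long context contains. The toy below illustrates that idea with a linear next-value predictor adapted online over a stream; everything here is an illustrative assumption, not a claim about how Anthropic would implement it.

```python
import numpy as np

rng = np.random.default_rng(1)
k = 4                               # number of recent values used as features
w = np.zeros(k)                     # the "model weights", adapted on the fly
lr = 0.01

# A synthetic stream standing in for a very long context.
stream = np.sin(np.linspace(0, 20, 500)) + 0.05 * rng.standard_normal(500)

for t in range(k, len(stream)):
    context = stream[t - k:t]       # the most recent context window
    pred = context @ w              # predict the next value
    err = pred - stream[t]
    w -= lr * err * context         # gradient step *while* processing the context
```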
Shane Legg: “no fundamental blockers” on continual learning; “we have ideas” on how to develop it
In an interview released just today, Shane Legg, co-founder and Chief AGI Scientist at Google DeepMind, said that there are no “fundamental blockers” to continual learning (or to visual reasoning).
[W]e have ideas on how to develop systems that can do these things, and we see metrics improving over time in a bunch of these areas. So my expectation is over a number of years these things will all get addressed. But they’re not there yet.
Continual learning “might need some process whereby new information may be stored”, a “retrieval system or episodic memory”, and “systems whereby that information over time is trained back into some underlying model”, Legg said. This will require more data as well as algorithmic and architectural changes.
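Read literally, that describes a three-part pipeline: store new information as it arrives, retrieve it when relevant, and periodically consolidate it back into the underlying model’s weights. The sketch below is a toy rendering of that pipeline (a nearest-neighbor episodic memory plus a linear model retrained on the memory’s contents); the components are assumed stand-ins, not DeepMind’s design.

```python
import numpy as np

class EpisodicMemory:
    """Stores (key, value) experiences and retrieves the nearest ones."""
    def __init__(self):
        self.keys, self.values = [], []

    def store(self, key, value):
        self.keys.append(key)
        self.values.append(value)

    def retrieve(self, query, k=1):
        dists = [np.linalg.norm(query - key) for key in self.keys]
        idx = np.argsort(dists)[:k]
        return [self.values[i] for i in idx]

class UnderlyingModel:
    """A toy base model whose weights absorb the memory over time."""
    def __init__(self, dim):
        self.w = np.zeros(dim)

    def predict(self, x):
        return x @ self.w

    def consolidate(self, memory, lr=0.1, epochs=50):
        """Train the stored experience back into the model's weights."""
        X = np.array(memory.keys)
        y = np.array(memory.values)
        for _ in range(epochs):
            grad = 2 * X.T @ (X @ self.w - y) / len(y)
            self.w -= lr * grad

rng = np.random.default_rng(2)
memory, model = EpisodicMemory(), UnderlyingModel(dim=3)

# New experience arrives: store it immediately, consolidate occasionally.
for step in range(1, 101):
    x = rng.uniform(-1, 1, size=3)
    y = x @ np.array([0.5, -0.2, 0.8])       # the "new information"
    memory.store(x, y)
    if step % 25 == 0:
        model.consolidate(memory)             # train memory back into the model

# Retrieval path: look up the stored values closest to a new query.
nearest = memory.retrieve(rng.uniform(-1, 1, size=3), k=3)
```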
