Context
Check out this thread:
(1/8) The Needle in the Haystack done by @GregKamradt was an amazing analysis of retrieval performance! Greg has graciously allowed us to build on his work with a repository that is now OSS. @natfriedman We have a much more rigorous test we’ve put out based on this idea.… pic.twitter.com/i5O8zrcwQT
— Aparna Dhinakaran (@aparnadhinak) December 15, 2023
This is a powerful analysis. Sure, Anthropic will find a way to improve on or challenge the results. But the point is clear: these models can recall hyper-specific 7-digit random numbers out of a batch of 128,000 tokens, where a token is roughly 4 characters. GPT is the clear winner here, too.
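To make the test concrete, here’s a minimal sketch of the needle-in-a-haystack idea (this is not Greg’s actual harness; the model id, prompt wording, and filler text are my own illustrative assumptions):

```python
# Needle-in-a-haystack sketch: hide a 7-digit number in a wall of filler
# text and check whether the model can retrieve it. The model id and
# prompt wording here are assumptions, not the original harness.
import random
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def build_haystack(filler: str, needle: str, depth_pct: float) -> str:
    """Insert the needle at a given percentage depth into the filler."""
    cut = int(len(filler) * depth_pct / 100)
    return filler[:cut] + "\n" + needle + "\n" + filler[cut:]

secret = f"{random.randint(0, 9_999_999):07d}"   # the hyper-specific needle
needle = f"The magic number is {secret}."
filler = "The grass is green. " * 20_000         # stand-in for ~100K tokens of filler
prompt = build_haystack(filler, needle, depth_pct=50)

resp = client.chat.completions.create(
    model="gpt-4-turbo-preview",  # assumed model id; swap in whatever you test
    messages=[{"role": "user", "content": prompt + "\n\nWhat is the magic number?"}],
)
print(secret, "->", resp.choices[0].message.content)  # pass if the secret comes back
```

Vary `depth_pct` and the filler length and you get exactly the depth-by-context-length grid from the charts in the thread.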
Also, open source is getting incredibly good. This implies the future is open source.
Comparing @OpenAI #GPT4 Turbo to @MistralAI. GPT-4 is pretty good in that region in general. Interesting to see how @MistralAI scales to larger context windows pic.twitter.com/WQo6MmGIHh
— Aparna Dhinakaran (@aparnadhinak) December 15, 2023
Impact
RAG is how we make retrieval efficient today. But if in-context retrieval is already this good, maybe RAG is only a short-term thing. Context lengths of 10M tokens… probably by next year, right?
At the start of the year we were at 4K tokens. Now we’re at 128,000 tokens, a 32x improvement. Another 32x puts us at roughly 4M tokens. So yeah, by next year you should be able to just load an entire RAG database into context. But… it’s gonna be super expensive.
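Running those numbers as a quick sketch (the pricing figure is an assumption based on GPT-4 Turbo’s launch rate of $0.01 per 1K input tokens; check current pricing before relying on it):

```python
# Back-of-envelope: context-window growth and the cost of filling one.
start_of_2023 = 4_000         # GPT-3.5-era context window
now = 128_000                 # GPT-4 Turbo context window
growth = now / start_of_2023  # 32x in roughly a year

next_year = now * growth      # same pace again -> ~4.1M tokens
print(f"{growth:.0f}x growth; same pace again gives {next_year / 1e6:.1f}M tokens")

cost_per_1k_input = 0.01      # USD; assumed GPT-4 Turbo launch pricing
full_context_call = now / 1_000 * cost_per_1k_input
print(f"One maxed-out call costs about ${full_context_call:.2f} in input tokens alone")
# Output:
# 32x growth; same pace again gives 4.1M tokens
# One maxed-out call costs about $1.28 in input tokens alone
```

At ~$1.28 per maxed-out call today, a 4M-token context at the same rate would run about $40 per call, which is why “just load everything” may stay expensive even once it’s possible.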
The point is: would GPT be this effective if it were doing RAG over a database? Or is it more effective to load everything into context?
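For contrast, here’s what the two approaches look like side by side (a sketch under stated assumptions: the keyword-overlap “retriever” is a stand-in for a real embedding index, and the model id is a guess):

```python
# Two ways to answer a question over a document set.
from openai import OpenAI

client = OpenAI()

def ask(question: str, context: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4-turbo-preview",  # assumed model id
        messages=[{"role": "user", "content": f"{context}\n\nQuestion: {question}"}],
    )
    return resp.choices[0].message.content

def answer_with_rag(question: str, chunks: list[str], top_k: int = 3) -> str:
    """RAG path: retrieve a few relevant chunks, ask over just those."""
    # Toy retriever: rank chunks by keyword overlap with the question.
    # A real system would embed the chunks and rank by vector similarity.
    scored = sorted(chunks, key=lambda c: -sum(w in c for w in question.split()))
    return ask(question, "\n\n".join(scored[:top_k]))

def answer_with_long_context(question: str, chunks: list[str]) -> str:
    """Long-context path: load everything, let the model find the needle."""
    return ask(question, "\n\n".join(chunks))  # works until you hit the window cap
```

The needle-in-a-haystack results are evidence that the second path works surprisingly well; the open question is whether it beats the first path once you account for cost and context limits.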
_________________________
Bryan lives somewhere at the intersection of faith, fatherhood, and futurism and writes about tech, books, Christianity, gratitude, and whatever’s on his mind. If you liked reading, perhaps you’ll also like subscribing: