RAG techniques continuously evolve to enhance LLM response accuracy by retrieving relevant external data during generation. To keep up with current AI trends, new RAG types incorporate deep step-by-step reasoning, tree search, citations, multimodality and other effective techniques.
3. Chain-of-Retrieval Augmented Generation (CoRAG) -> Chain-of-Retrieval Augmented Generation (2501.14342) Retrieves information step by step, reformulating the query when needed based on what has been retrieved so far, and controls how much compute to spend at test time.
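To make the step-by-step idea concrete, here is a minimal sketch of a retrieval chain with a step budget. It is not the paper's implementation: `keyword_retrieve`, `toy_next_query`, the tiny corpus, and the `max_steps` parameter are all hypothetical stand-ins. In a real system the query planner would be an LLM call and the retriever a proper search index; here `max_steps` plays the role of the test-time compute knob.

```python
import re
from typing import Callable, List, Optional, Tuple

Chain = List[Tuple[str, List[str]]]  # (sub-query, retrieved documents) per step


def tokens(text: str) -> set:
    """Lowercased word set, ignoring punctuation."""
    return set(re.findall(r"\w+", text.lower()))


def keyword_retrieve(query: str, corpus: List[str], k: int = 1) -> List[str]:
    """Toy retriever (assumption): rank documents by word overlap with the query."""
    q = tokens(query)
    scored = [(len(q & tokens(doc)), doc) for doc in corpus]
    scored = [(score, doc) for score, doc in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]


def chain_of_retrieval(
    question: str,
    retrieve: Callable[[str], List[str]],
    next_query: Callable[[str, Chain], Optional[str]],
    max_steps: int = 4,
) -> Chain:
    """Alternate between planning a sub-query and retrieving for it.

    Stops when the planner returns None or the step budget is spent.
    """
    chain: Chain = []
    for _ in range(max_steps):
        query = next_query(question, chain)
        if query is None:
            break
        chain.append((query, retrieve(query)))
    return chain


def toy_next_query(question: str, chain: Chain) -> Optional[str]:
    """Hypothetical planner; a real system would ask an LLM for the next sub-query."""
    if not chain:
        return "author of Dune"
    last_docs = chain[-1][1]
    if len(chain) == 1 and any("Frank Herbert" in doc for doc in last_docs):
        # Reformulate using what the first retrieval step revealed.
        return "where was Frank Herbert born"
    return None


CORPUS = [
    "Dune was written by Frank Herbert.",
    "Frank Herbert was born in Tacoma.",
    "Tacoma is a city in Washington.",
]

chain = chain_of_retrieval(
    "Where was the author of Dune born?",
    lambda q: keyword_retrieve(q, CORPUS),
    toy_next_query,
    max_steps=4,
)
for step, (query, docs) in enumerate(chain, start=1):
    print(step, query, docs)
```

Lowering `max_steps` cuts the chain short, which is the simplest way to trade answer quality for compute in this sketch.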
(Probably) the first "longCoT" dataset for the Russian language created via DeepSeek-R1.
- Prompts taken from the Sky-T1 dataset and translated via Llama3.3-70B.
- Answers and reasoning generated by DeepSeek-R1 (685B).
- 16.4K samples in total, ≈12.4K Russian-only (in the rest, either the answer or the reasoning is in English).
- Languages in the answers and reasoning are labeled using fastText.
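The labeling and "Russian-only" filtering step could look roughly like the sketch below. The field names `answer` and `reasoning` are assumptions, not the dataset's actual column names, and a crude Cyrillic-share heuristic stands in for fastText so the example runs without a model download. A wrapper around a fastText language-identification model could be passed as `detect` instead.

```python
from typing import Callable, Dict, List


def cyrillic_share_lang(text: str, threshold: float = 0.5) -> str:
    """Stand-in detector (assumption): 'ru' if most letters are Cyrillic."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return "unknown"
    cyrillic = sum(1 for ch in letters if "\u0400" <= ch <= "\u04FF")
    return "ru" if cyrillic / len(letters) >= threshold else "other"


def label_sample(
    sample: Dict[str, str],
    detect: Callable[[str], str] = cyrillic_share_lang,
) -> Dict[str, str]:
    """Return a copy of the sample with language labels for both fields."""
    labeled = dict(sample)
    labeled["answer_lang"] = detect(sample["answer"])
    labeled["reasoning_lang"] = detect(sample["reasoning"])
    return labeled


def russian_only(
    samples: List[Dict[str, str]],
    detect: Callable[[str], str] = cyrillic_share_lang,
) -> List[Dict[str, str]]:
    """Keep samples whose answer and reasoning are both labeled Russian."""
    labeled = [label_sample(s, detect) for s in samples]
    return [
        s for s in labeled
        if s["answer_lang"] == "ru" and s["reasoning_lang"] == "ru"
    ]


# Hypothetical samples for illustration only.
samples = [
    {"answer": "Ответ: 42", "reasoning": "Сначала сложим числа."},
    {"answer": "Ответ: 42", "reasoning": "First we add the numbers."},
]
kept = russian_only(samples)
print(len(kept), "of", len(samples), "samples are Russian-only")
```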