LLMs Can Teach Themselves to Better Predict the Future
arxiv.org
We present an outcome-driven fine-tuning framework that enhances the forecasting capabilities of large language models (LLMs) without relying on human-curated reasoning samples. Our method leverages model self-play to generate pairs of diverse reasoning trajectories and probabilistic forecasts for a set of diverse questions that resolve after the models' knowledge cutoff date. We then rank pairs of these reasoning traces by their distance to the actual outcomes before fine-tuning the model via Direct Preference Optimization (DPO). On a separate test set, our approach increases prediction accuracy of Phi-4 14B and DeepSeek-R1 14B by between 7–10% over a base model and a DPO fine-tuned control model with randomized labels, bringing them on par with forecasting capabilities of much larger frontier models like GPT-4o.
The base DeepSeek-R1 14B model was already groundbreaking, since it reached the level of OpenAI o1. But this approach does much better, bringing it to the level of GPT-4o.
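To make the pipeline in the abstract concrete, here is a minimal Python sketch of the ranking step: sample a pair of self-play forecasts per resolved question, score each by its squared distance (a Brier-style error) to the actual outcome, and keep the closer one as "chosen" and the farther one as "rejected" for DPO. The function names, the sampling stub, and the exact distance measure are assumptions for illustration, not the paper's implementation.

```python
import random
from dataclasses import dataclass

# Hypothetical sketch of the preference-pair construction described in the
# abstract: sample self-play forecasts per question, score each by its
# distance to the realized outcome, and keep (chosen, rejected) pairs for DPO.
# Names, the Brier-style distance, and the sampling stub are assumptions,
# not the paper's exact recipe.

@dataclass
class Forecast:
    reasoning: str   # model-generated reasoning trace
    prob: float      # probability assigned to the event occurring (0..1)

def sample_forecasts(question: str, n: int = 2) -> list[Forecast]:
    """Stub for self-play generation; a real pipeline would call the LLM here."""
    return [Forecast(reasoning=f"trace {i} for: {question}", prob=random.random())
            for i in range(n)]

def distance(forecast: Forecast, outcome: int) -> float:
    """Brier-style squared error between the forecast and the resolved outcome (0 or 1)."""
    return (forecast.prob - outcome) ** 2

def build_dpo_pairs(questions: list[tuple[str, int]]) -> list[dict]:
    """For each resolved question, rank sampled forecasts by distance to the
    outcome and emit a prompt/chosen/rejected record."""
    pairs = []
    for question, outcome in questions:
        forecasts = sample_forecasts(question, n=2)
        ranked = sorted(forecasts, key=lambda f: distance(f, outcome))
        chosen, rejected = ranked[0], ranked[-1]
        if distance(chosen, outcome) == distance(rejected, outcome):
            continue  # skip ties: no preference signal
        pairs.append({
            "prompt": question,
            "chosen": f"{chosen.reasoning}\nFinal probability: {chosen.prob:.2f}",
            "rejected": f"{rejected.reasoning}\nFinal probability: {rejected.prob:.2f}",
        })
    return pairs

if __name__ == "__main__":
    resolved = [("Will event X happen before 2025-06-01?", 1)]
    for p in build_dpo_pairs(resolved):
        print(p["prompt"], "->", p["chosen"].splitlines()[-1])
```

The resulting prompt/chosen/rejected records are in the format that preference-optimization trainers such as TRL's DPOTrainer commonly consume, so the pairs could be fed directly into a standard DPO fine-tuning run.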
Authors are from:
1 - Lightning Rod Labs (USA)
…
https://www.lightningrod.ai/about
2 - London School of Economics and Political Science (UK)
Machine learning is still developing very fast.
“We used 8 H100 GPUs for training.”
Huge amounts of processing power are not required.