Evaluating and Optimizing LLMs For Exploration In-Context

Part of the TPC Seminar Series

Speaker: Allen Nie, Ph.D. Student at Stanford University
Date: Wednesday, June 4, 2025
Time: 11:00 a.m. (CT)
Location: Virtual


Abstract:

Despite their success in many domains, large language models (LLMs) remain understudied in scenarios that require optimal decision-making under uncertainty. This matters because many real-world applications, ranging from personalized recommendations to healthcare interventions, demand that LLMs not only make predictions but also actively learn to make optimal decisions through exploration. In this work, we measure LLMs’ (in)ability to make optimal decisions in bandits, a stateless reinforcement learning setting relevant to many applications. We develop a comprehensive suite of environments, including both context-free and contextual bandits with varying task difficulties, to benchmark LLMs’ performance. Motivated by the existence of optimal exploration algorithms, we propose efficient ways to integrate this algorithmic knowledge into LLMs: by providing explicit algorithm-guided support during inference, and through algorithm distillation via in-context demonstrations and fine-tuning, using synthetic data generated from these algorithms. These techniques allow smaller models to achieve superior exploration performance, surpassing larger models on various tasks. We conduct an extensive ablation study to shed light on factors, such as task difficulty and data representation, that influence the efficiency of LLM exploration. Additionally, we rigorously analyze LLMs’ exploration efficiency using the concept of regret, linking their ability to explore to model size and the underlying algorithm.
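For readers new to the setting, the following sketch illustrates a context-free (multi-armed) bandit and the regret metric the abstract refers to. It uses UCB1, a classic exploration algorithm chosen here purely for illustration; it is not presented as the method from the talk. The arm probabilities, the horizon, and the function name `ucb1_bandit` are hypothetical.

```python
import math
import random


def ucb1_bandit(arm_means, horizon, seed=0):
    """Run UCB1 on a Bernoulli multi-armed bandit (illustrative sketch).

    arm_means: hypothetical true success probability of each arm.
    Returns the list of pulled arms and the cumulative expected regret.
    """
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k        # number of pulls per arm
    totals = [0.0] * k      # summed observed rewards per arm
    best_mean = max(arm_means)
    regret = 0.0
    pulls = []

    for t in range(1, horizon + 1):
        if t <= k:
            # Pull every arm once so each empirical mean is defined.
            arm = t - 1
        else:
            n = t - 1  # total pulls made so far
            # Pick the arm with the highest upper confidence bound:
            # empirical mean + sqrt(2 ln n / n_a).
            arm = max(
                range(k),
                key=lambda a: totals[a] / counts[a]
                + math.sqrt(2 * math.log(n) / counts[a]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        totals[arm] += reward
        # Expected (pseudo-)regret: gap between the best arm and the chosen arm.
        regret += best_mean - arm_means[arm]
        pulls.append(arm)

    return pulls, regret


if __name__ == "__main__":
    pulls, regret = ucb1_bandit([0.2, 0.5, 0.7], horizon=1000)
    print(f"cumulative regret: {regret:.1f}")
    print("share of pulls on best arm:", pulls.count(2) / len(pulls))
```

An agent that explores well keeps cumulative regret growing sublinearly in the horizon; an agent that never explores, or explores forever, accumulates regret roughly linearly. This is why regret is a natural yardstick for comparing an LLM's exploration against an algorithm like this one.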

Biography:

Allen Nie is a Ph.D. student in computer science at Stanford University, advised by Professors Emma Brunskill and Chris Piech. His research areas include offline reinforcement learning, causal inference, and interactive decision-making with language. He has also applied his algorithmic work to CUDA kernel generation and mathematical theorem generation. He has interned at AWS, Microsoft Research, and Google DeepMind. His Ph.D. is supported by a Yee-Hoffman grant from the Stanford Human-Centered AI Institute (HAI). He has published papers at NeurIPS, ICML, ICLR, AAAI, ACL, and EMNLP. He serves as an Area Chair for the Reinforcement Learning Conference. His past work has been featured on the Microsoft Research Blog, the Stanford Artificial Intelligence Lab (SAIL) Blog, and in ACM XRDS Magazine.
