Re-examining learning linear functions in context

Published:

Summary

We investigate how transformer models learn linear functions through in-context learning, challenging common assumptions about their algorithmic capabilities.

Contribution

This work provides new insights into the limitations of in-context learning by studying a controlled setup with synthetic data.

Abstract

We explore in-context learning (ICL), a popular paradigm for inference with Large Language Models (LLMs), in a controlled experimental setup using synthetic training data. Using a range of small transformer models trained from scratch, we focus on a mathematical task with simple yet precise prompts: learning a linear function f from a sequence of inputs $x_i$ and their corresponding function values $f(x_i)$. Our findings challenge the prevailing narrative that transformers adopt algorithmic approaches like linear regression to in-context learn (ICL) a linear function. We observe that all models have "boundary values" that limit generalizability. While we can extend boundary values with training distributions over a wider range, we lose the precision of models trained on distributions with more restricted ranges. Thus, we see a dilemma for ICL, at least for some tasks: models will lack either generalizability or precision.
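To make the task concrete, the setup can be sketched as follows: each prompt presents a sequence of pairs $(x_i, f(x_i))$ for a randomly drawn linear function, and the model must predict the function value at a held-out query point. The snippet below is an illustrative sketch only; the sampling ranges, the inclusion of a bias term, and the prompt layout are assumptions, not the paper's exact protocol.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_icl_prompt(n_examples=5, low=-1.0, high=1.0):
    """Build one synthetic in-context prompt for a linear function f(x) = a*x + b.

    Returns the context pairs (x_i, f(x_i)), the query input, and its target value.
    The range [low, high] plays the role of the training distribution's support;
    querying outside it probes the "boundary values" discussed above.
    """
    a, b = rng.uniform(low, high, size=2)          # random linear function (a, b assumed uniform)
    xs = rng.uniform(low, high, size=n_examples + 1)
    ys = a * xs + b
    context = list(zip(xs[:-1], ys[:-1]))          # (x_1, f(x_1)), ..., (x_n, f(x_n))
    query_x, target = xs[-1], ys[-1]               # held-out query point and its true value
    return context, query_x, target

context, query_x, target = make_icl_prompt()
print(len(context), query_x, target)
```

Training distributions with wider `[low, high]` ranges would, per the abstract, extend the models' boundary values at the cost of precision on narrower ranges.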

Click here to access the arXiv paper

Click here to access the Springer paper

@InProceedings{10.1007/978-3-032-02813-6_8,
  author="Naim, Omar and Fouilh{\'e}, Guilhem and Asher, Nicholas",
  editor="Braun, Tanya and Paa{\ss}en, Benjamin and Stolzenburg, Frieder",
  title="Re-examining Learning Linear Functions in Context",
  booktitle="KI 2025: Advances in Artificial Intelligence",
  year="2026",
  publisher="Springer Nature Switzerland",
  address="Cham",
  pages="104--117",
  abstract="We explore in-context learning (ICL), a popular paradigm for inference with Large Language Models (LLMs), in a controlled experimental setup using synthetic training data. Using a range of small transformer models trained from scratch, we focus on a mathematical task with simple yet precise prompts: learning a linear function f from a sequence of inputs $x_i$ and their corresponding function values $f(x_i)$. Our findings challenge the prevailing narrative that transformers adopt algorithmic approaches like linear regression to in-context learn (ICL) a linear function. We observe that all models have ``boundary values'' that limit generalizability. While we can extend boundary values with training distributions over a wider range, we lose the precision of models trained on distributions with more restricted ranges. Thus, we see a dilemma for ICL at least in some tasks: either models will lack generalizability or precision.",
  isbn="978-3-032-02813-6"
}