Re-examining learning linear functions in context

Summary

We investigate how transformer models learn linear functions through in-context learning, challenging common assumptions about their algorithmic capabilities.

Contribution

This work provides new insights into the limitations of in-context learning by studying a controlled setup with synthetic data.

Abstract

In-context learning (ICL) has emerged as a powerful paradigm for easily adapting Large Language Models (LLMs) to various tasks. However, our understanding of how ICL works remains limited. We explore a simple model of ICL in a controlled setup with synthetic training data to investigate ICL of univariate linear functions. We experiment with a range of GPT-2-like transformer models trained from scratch. Our findings challenge the prevailing narrative that transformers adopt algorithmic approaches like linear regression to learn a linear function in-context. These models fail to generalize beyond their training distribution, highlighting fundamental limitations in their capacity to infer abstract task structures. Our experiments lead us to propose a mathematically precise hypothesis of what the model might be learning.
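To make the experimental setup concrete, here is a minimal sketch of how synthetic in-context prompts for univariate linear functions might be generated. The function name, prompt format, and sampling distributions are illustrative assumptions, not the paper's exact protocol:

```python
import numpy as np

def sample_icl_prompt(n_points=40, w_scale=1.0, x_scale=1.0, rng=None):
    """Sample one in-context prompt for a univariate linear function y = w * x.

    A transformer trained on such prompts sees interleaved (x_i, y_i) pairs
    and must predict y for a held-out query x. Names and distributions here
    are hypothetical; the paper may use different scales or noise.
    """
    rng = rng or np.random.default_rng()
    w = rng.normal(0.0, w_scale)            # task parameter, fixed per prompt
    xs = rng.normal(0.0, x_scale, size=n_points)
    ys = w * xs                             # noiseless linear targets
    return xs, ys, w

# Out-of-distribution probing, as in the paper's generalization tests, would
# then rescale x_scale or w_scale at evaluation time relative to training.
xs, ys, w = sample_icl_prompt(rng=np.random.default_rng(0))
```

Testing generalization then amounts to evaluating the trained model on prompts drawn with scales outside the training range and checking whether its predictions still track `w * x`.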

Paper: https://arxiv.org/abs/2411.11465

@misc{naim2024reexamininglearninglinearfunctions,
      title={Re-examining learning linear functions in context}, 
      author={Omar Naim and Guilhem Fouilhé and Nicholas Asher},
      year={2024},
      eprint={2411.11465},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2411.11465}, 
}