Re-examining learning linear functions in context
Published: 2024

Summary
We investigate how transformer models learn linear functions through in-context learning, challenging common assumptions about their algorithmic capabilities.
Contribution
This work provides new insights into the limitations of in-context learning by studying a controlled setup with synthetic data.
Abstract
In-context learning (ICL) has emerged as a powerful paradigm for easily adapting Large Language Models (LLMs) to various tasks. However, our understanding of how ICL works remains limited. We explore a simple model of ICL in a controlled setup with synthetic training data to investigate ICL of univariate linear functions. We experiment with a range of GPT-2-like transformer models trained from scratch. Our findings challenge the prevailing narrative that transformers adopt algorithmic approaches like linear regression to learn a linear function in-context. These models fail to generalize beyond their training distribution, highlighting fundamental limitations in their capacity to infer abstract task structures. Our experiments lead us to propose a mathematically precise hypothesis of what the model might be learning.
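To make the setup concrete, the in-context task can be pictured as follows: each prompt interleaves (x, y) pairs generated by a single linear function, and the model must predict y for a final query x. The sketch below is a minimal illustration of such synthetic prompt construction, not the paper's exact data pipeline; the sampling ranges and the function `make_icl_prompt` are assumptions for illustration.

```python
import numpy as np

def make_icl_prompt(n_examples, w_range=(-1.0, 1.0), x_range=(-1.0, 1.0), rng=None):
    """Build one in-context prompt for a univariate linear task y = w * x.

    The weight range, input range, and prompt layout here are hypothetical
    choices; the paper's precise distributions may differ.
    """
    rng = rng or np.random.default_rng()
    w = rng.uniform(*w_range)                       # one linear task per prompt
    xs = rng.uniform(*x_range, size=n_examples + 1) # n context inputs + 1 query
    ys = w * xs
    # Interleave context pairs as [x1, y1, x2, y2, ...]; the model is asked
    # to predict the label of the held-out query input xs[-1].
    context = np.stack([xs[:-1], ys[:-1]], axis=1).ravel()
    return context, xs[-1], ys[-1], w
```

Out-of-distribution generalization can then be probed by drawing the query x (or the weight w) from outside the training ranges and checking whether predictions still track w * x.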
Paper: https://arxiv.org/abs/2411.11465
Recommended citation:
@misc{naim2024reexamininglearninglinearfunctions,
  title={Re-examining learning linear functions in context},
  author={Omar Naim and Guilhem Fouilhé and Nicholas Asher},
  year={2024},
  eprint={2411.11465},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2411.11465},
}