Understanding and improving language model architectures
ECE 595 Seminar Series
January 31, 2025
11:00 AM - 12:00 PM
Speaker: Samet Oymak, University of Michigan
Abstract: Recent advances such as ChatGPT have revolutionized language modeling. These models are based on the transformer architecture, which uses the self-attention mechanism as its central component. In this talk, I discuss recent results on the optimization- and approximation-theoretic understanding of self-attention, as well as how theory can guide the design of better mechanisms. I will first discuss the optimization dynamics to demystify how attention "finds the needle in the haystack": we show that, under gradient-based training, the attention weights converge to an analytically predictable solution that acts as a separator of relevant and irrelevant context within the input. Secondly, we identify the shortcomings of the standard transformer architecture when adapting to variations in contextual sparsity. This leads us to introduce a simple but effective and theoretically grounded method called "Gated Softmax Attention" (GSA). We show that GSA has negligible computational overhead but uniformly improves language modeling performance, including in recent models such as Llama 3. I will end the talk by discussing the current state of research and future directions.
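For background on the mechanism the talk analyzes, here is a minimal sketch of standard (single-head, unbatched) softmax self-attention in NumPy. The rows of the attention map are the weights that, per the abstract, gradient-based training drives toward a separator of relevant and irrelevant context. The gating construction of GSA itself is not specified in the abstract and is not reproduced here; all variable names below are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Standard softmax self-attention for one sequence.

    X: (seq_len, d) token embeddings; Wq, Wk, Wv: (d, d) projections.
    Returns the attended output and the (seq_len, seq_len) attention map,
    whose rows sum to 1 and weight each token's context.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # scaled dot-product scores
    A = softmax(scores, axis=-1)             # attention map
    return A @ V, A
```

The attention map `A` is the object of study: each row is a probability distribution over the input tokens, so "finding the needle in the haystack" corresponds to that distribution concentrating on the relevant tokens.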
Speaker bio: Samet Oymak is an assistant professor of electrical engineering and computer science at the University of Michigan, Ann Arbor. His research focuses on optimization theory, statistical learning, decision making, and trustworthy and efficient AI/ML methods. Prior to his present position, Oymak was with the ECE department at the University of California, Riverside. He has spent time as a researcher in the finance and tech industry, and completed a postdoc at the University of California, Berkeley as a Simons Fellow. He obtained his PhD from the California Institute of Technology (Caltech) in 2015, and received the Charles Wilts Prize for the best departmental thesis. Oymak is the recipient of an NSF CAREER award, a Google Research Scholar award, an Adobe Data Science Research award, and an Amazon Research award.
Faculty host: Ahmet Enis Cetin
Date posted: Feb 17, 2025
Date updated: Feb 17, 2025