Antonio strives to improve the efficiency of deep learning technologies by pioneering new architectures and training techniques grounded in theoretical knowledge.
Antonio Orvieto is an independent group leader at the Max Planck Institute for Intelligent Systems and a principal investigator at the ELLIS Institute Tübingen, Germany. He holds a Ph.D. from ETH Zürich and spent time at Google DeepMind, Meta, MILA, INRIA Paris, and Hilti. His main areas of expertise are optimization for deep learning and the design of neural networks for reasoning over complex sequential data. He has published at NeurIPS, ICML, ICLR, AISTATS, and CVPR; he organized the "Optimization for Data Science and Machine Learning" session at the International Conference on Continuous Optimization (ICCOPT) in 2022 and the Workshop on Next Generation of Sequence Modeling Architectures at ICML 2024. He received the ETH medal for his outstanding doctoral thesis.
In his research, Antonio strives to improve deep learning technologies by pioneering new architectures and training techniques grounded in theoretical knowledge. His work spans two main areas: designing innovative models and optimizers capable of handling complex data, and understanding the intricacies of large-scale training dynamics. Central to his studies is the development of innovative techniques for decoding patterns in sequential data, with applications in natural language processing, biology, neuroscience, and music generation.
His Linear Recurrent Unit (LRU) architecture is the basis for some of Google's Gemma language model variants.