Aston Zhang is a research scientist on the Llama team at Meta Generative AI and a core contributor to Llama 3. Previously, he was a scientist and manager at AWS AI Research. His accolades include the ICLR Outstanding Paper Award, the ACM UbiComp Distinguished Paper Award, and an ACM SenSys Best Paper Award nomination. His textbook, “Dive into Deep Learning,” is adopted worldwide. He holds a Ph.D. in Computer Science from the University of Illinois Urbana-Champaign.
Current research: pre-training architectures & scaling, long context (Llama 4).
News
- [Hiring] Join our Llama team for a 2025 research internship! Just email me if you are interested.
- Llama 3.1 405B is now openly available.
- Meet Llama 3, our state-of-the-art open source large language model. Check out my developer podcast.
Books
- A. Zhang, Z. C. Lipton, M. Li, and A. J. Smola
Dive into Deep Learning
Cambridge University Press, 2023
- Adopted at 500 universities from 70 countries
- Featured in the AWS re:Invent keynote by Swami Sivasubramanian, Head of AWS AI, Database, and Analytics
- A. Zhang, M. Li, Z. C. Lipton, and A. J. Smola
动手学深度学习 (Dive into Deep Learning, Chinese edition)
Posts & Telecom Press (人民邮电出版社); 2nd ed., 2023; 1st ed., 2019
- Best seller in China
Papers
M. Zhong*, A. Zhang*, X. Wang, R. Hou, W. Xiong, C. Zhu, Z. Chen, L. Tan, C. Bi, M. Lewis, S. Popuri, S. Narang, M. Kambadur, D. Mahajan, S. Edunov, J. Han, and L. van der Maaten (*equal contribution)
Law of the Weakest Link: Cross Capabilities of Large Language Models
“Cross-capability performance is limited by the weakest underlying capability.” In arXiv, 2024
llm-cross-capabilities.org
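A tiny worked example of the headline claim (my own illustration, not the paper's evaluation code); the capability names and scores below are made up:

```python
# "Weakest link": performance on a task mixing several capabilities tracks
# the weakest individual capability. Scores are invented for illustration.

def weakest_link_estimate(capability_scores: dict[str, float]) -> float:
    """Predict cross-capability performance as the minimum component score."""
    return min(capability_scores.values())

scores = {"coding": 0.82, "reasoning": 0.74, "long_context": 0.55}
print(weakest_link_estimate(scores))  # 0.55: bounded by the weakest capability
```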
Llama Team, AI@Meta (Core Contributor)
The Llama 3 Herd of Models
2024
J. Ji, A. Zhang, C. Zhu, S. Wang, M. Kambadur, S. Chang, and W. Xiong
Pruning Computations in Transformer Prefilling for Large Language Models
“Speed up Transformer prefilling for generation via a learnable router.” In arXiv, 2024
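A minimal sketch of the general idea, assuming a per-sequence learned gate that decides whether a block's computation can be skipped during prefilling; `RoutedLayer` and its gating scheme are my own hypothetical names, not the paper's design:

```python
import torch
import torch.nn as nn

class RoutedLayer(nn.Module):
    """Wrap a Transformer block with a learnable router that can skip the
    block for a prefill sequence. Illustrative only."""

    def __init__(self, block: nn.Module, d_model: int, threshold: float = 0.5):
        super().__init__()
        self.block = block
        self.router = nn.Linear(d_model, 1)  # scores how much the block is needed
        self.threshold = threshold

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # One routing decision per sequence, from the mean hidden state.
        gate = torch.sigmoid(self.router(x.mean(dim=1)))  # (batch, 1)
        if self.training:
            # A soft gate keeps the router differentiable during training.
            return gate.unsqueeze(1) * self.block(x) + (1 - gate.unsqueeze(1)) * x
        # At inference, hard-skip the block's computation when not needed.
        return self.block(x) if gate.mean().item() >= self.threshold else x

layer = RoutedLayer(nn.Sequential(nn.Linear(64, 64), nn.GELU()), d_model=64).eval()
h = torch.randn(2, 128, 64)  # (batch, prefill_len, d_model)
print(layer(h).shape)        # torch.Size([2, 128, 64])
```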
J. Kim, A. Goyal, A. Zhang, B. Xiong, R. Hou, M. Kambadur, D. Mahajan, H. Hajishirzi, and L. Tan
A Systematic Examination of Preference Learning through the Lens of Instruction-Following
“Understand preference learning with rejection sampling and Monte Carlo Tree Search.” In arXiv, 2024
Y. Yu, Z. Chen, A. Zhang, L. Tan, C. Zhu, R. Y. Pang, Y. Qian, X. Wang, S. Gururangan, C. Zhang, M. Kambadur, D. Mahajan, and R. Hou
Self-critiquing Improves Reward Modeling for Large Language Models
“Predicting both critiques and the scalar reward improves reward modeling.” In arXiv, 2024
Z. Zhang and A. Zhang
You Only Look at Screens: Multimodal Chain-of-Action Agents
“Perform a task on smartphones? Train an agent using screenshots.” In arXiv, 2023
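The loop below sketches what a chain-of-action agent could look like; `capture_screenshot`, `multimodal_model`, and the "DONE" convention are hypothetical placeholders, not the paper's API:

```python
# Hypothetical chain-of-action loop: the model sees the current screenshot plus
# its own previous actions and emits the next action until the task is done.

def capture_screenshot() -> bytes:
    return b""  # placeholder: would grab the current phone screen

def multimodal_model(goal: str, screenshot: bytes, history: list[str]) -> str:
    return "DONE"  # placeholder: would return an action such as "tap(120, 480)"

def run_agent(goal: str, max_steps: int = 10) -> list[str]:
    history: list[str] = []
    for _ in range(max_steps):
        action = multimodal_model(goal, capture_screenshot(), history)
        if action == "DONE":
            break
        history.append(action)  # the action chain conditions the next decision
    return history

print(run_agent("Open the settings app"))
```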
Z. Zhang, A. Zhang, M. Li, H. Zhao, G. Karypis, and A. J. Smola
Multimodal Chain-of-Thought Reasoning in Language Models
In Transactions on Machine Learning Research, 2023
[Idea Inspiration by Homeschooling]
S. Ren, A. Zhang, Y. Zhu, S. Zhang, S. Zheng, M. Li, A. J. Smola, and X. Sun
Prompt Pre-Training with Twenty-Thousand Classes for Open-Vocabulary Visual Recognition
In Proceedings of the Conference on Neural Information Processing Systems (NeurIPS), 2023
Z. Zeng, C. Hawkins, M. Hong, A. Zhang, N. Pappas, V. Singh, and S. Zheng
Scaling Transformers to 128K Tokens or More by Prioritizing Important Tokens
In Proceedings of the Conference on Neural Information Processing Systems (NeurIPS), 2023
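One way to read the title, sketched below: score each context token's importance and attend only to a fixed budget of the most important ones, so compute tracks the budget rather than the full context. The scoring rule here is my simplification; a real long-context system would avoid materializing the full score matrix:

```python
import torch

def topk_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor,
                   budget: int) -> torch.Tensor:
    """Single-head attention over only the `budget` most important key tokens."""
    scores = q @ k.T / k.shape[-1] ** 0.5   # (q_len, kv_len)
    importance = scores.mean(dim=0)         # average relevance of each key token
    idx = importance.topk(budget).indices   # keep the top-`budget` tokens
    attn = torch.softmax(scores[:, idx], dim=-1)
    return attn @ v[idx]

q = torch.randn(4, 32)      # a few queries
k = torch.randn(1000, 32)   # long context
v = torch.randn(1000, 32)
print(topk_attention(q, k, v, budget=64).shape)  # torch.Size([4, 32])
```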
J. Chen, A. Zhang, X. Shi, M. Li, A. J. Smola, and D. Yang
Parameter-Efficient Fine-Tuning Design Spaces
In Proceedings of the International Conference on Learning Representations (ICLR), 2023
Z. Zhang, A. Zhang, M. Li, and A. J. Smola
Automatic Chain of Thought Prompting in Large Language Models
In Proceedings of the International Conference on Learning Representations (ICLR), 2023
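The recipe, roughly as the paper describes it: cluster the question pool, pick one representative question per cluster for diversity, and let Zero-Shot-CoT ("Let's think step by step") write each demonstration's rationale. `embed` and `llm` below are placeholder stand-ins for a sentence encoder and an LLM call:

```python
import numpy as np
from sklearn.cluster import KMeans

def embed(texts: list[str]) -> np.ndarray:
    rng = np.random.default_rng(0)            # placeholder sentence encoder
    return rng.normal(size=(len(texts), 16))

def llm(prompt: str) -> str:
    return "..."                              # placeholder LLM call

def auto_cot_demos(questions: list[str], n_clusters: int = 2) -> list[str]:
    X = embed(questions)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X)
    demos = []
    for c in range(n_clusters):
        # Representative question: the one closest to the cluster center.
        members = np.where(km.labels_ == c)[0]
        dists = np.linalg.norm(X[members] - km.cluster_centers_[c], axis=1)
        q = questions[members[dists.argmin()]]
        # Zero-Shot-CoT generates the rationale for the demonstration.
        rationale = llm(f"Q: {q}\nA: Let's think step by step.")
        demos.append(f"Q: {q}\nA: Let's think step by step. {rationale}")
    return demos

print(auto_cot_demos(["2+2?", "3*5?", "Capital of France?", "Largest ocean?"]))
```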
Z. Liu, Z. Tang, X. Shi, A. Zhang, M. Li, A. Shrivastava, and A. Wilson
Learning Multimodal Data Augmentation in Feature Space
In Proceedings of the International Conference on Learning Representations (ICLR), 2023
T. Yang, Y. Zhu, Y. Xie, A. Zhang, C. Chen, and M. Li
AIM: Adapting Image Models for Efficient Video Understanding
In Proceedings of the International Conference on Learning Representations (ICLR), 2023
C. Qin, A. Zhang, Z. Zhang, J. Chen, M. Yasunaga, and D. Yang
Is ChatGPT a General-Purpose Natural Language Processing Task Solver?
In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
J. Chen, A. Zhang, D. Yang, M. Li, and A. J. Smola
A Cheaper and Better Diffusion Language Model with Soft-Masked Noise
In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
H. Wang, A. Zhang, Y. Zhu, S. Zheng, M. Li, A. J. Smola, and Z. Wang
Partial and Asymmetric Contrastive Learning for Out-of-Distribution Detection in Long-Tailed Recognition
In Proceedings of the International Conference on Machine Learning (ICML, Long Presentation), 2022
H. Wang, A. Zhang, S. Zheng, X. Shi, M. Li, and Z. Wang
Removing Batch Normalization Boosts Adversarial Training
In Proceedings of the International Conference on Machine Learning (ICML), 2022
A. Zhang, Y. Tay, S. Zhang, A. Chan, A. T. Luu, S. C. Hui, and J. Fu
Beyond Fully-Connected Layers with Quaternions: Parameterization of Hypercomplex Multiplications with 1/n Parameters
In Proceedings of the International Conference on Learning Representations (ICLR, Outstanding Paper Award), 2021
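The 1/n parameter saving in the title comes from building a dense weight as a sum of n Kronecker products; a worked sketch (the dimensions are illustrative, and both must be divisible by n):

```python
import numpy as np

n, d_in, d_out = 4, 64, 32
A = np.random.randn(n, n, n)                   # n small (n x n) matrices
S = np.random.randn(n, d_in // n, d_out // n)  # n (d_in/n x d_out/n) matrices

# W = sum_i kron(A_i, S_i) has the shape of a dense (d_in x d_out) weight.
W = sum(np.kron(A[i], S[i]) for i in range(n))
print(W.shape)                     # (64, 32)

dense_params = d_in * d_out        # 2048
phm_params = A.size + S.size       # n^3 + d_in*d_out/n = 576
print(phm_params / dense_params)   # ~1/n for large layers
```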
Tutorials
with A. J. Smola
Attention in Deep Learning [Keynote] [PDF] [Video]
In The 36th International Conference on Machine Learning (ICML), 2019
with H. Lin, X. Shi, L. Lausen, H. He, S. Zha, and A. J. Smola
Dive into Deep Learning for Natural Language Processing
In Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019
with H. Lin, L. Lausen, S. Zha, A. J. Smola, C. Wang, and M. Li
From Shallow to Deep Language Representations: Pre-training, Fine-tuning, and Beyond [Website]
In The 25th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2019
with H. Zhang, T. He, Z. Zhang, Z. Zhang, H. Lin, and M. Li
Everything You Need to Know to Reproduce SOTA Deep Learning Models: Hands-on Tutorial
In International Conference on Computer Vision (ICCV), 2019
Services
- Area Chair
- Annual Meeting of the Association for Computational Linguistics (ACL)
- Conference on Empirical Methods in Natural Language Processing (EMNLP)
- International Conference on Computational Linguistics (COLING)