Part of the TPC Seminar Series

Speaker: Rio Yokota, Professor
Date: Wednesday, March 20, 2024
Time: 9:00 A.M. to 10:15 A.M. (Central Time)
Location: Virtual
Abstract:
Training LLMs has become a hot topic across academia, industry, and government in Japan. Various efforts are being made in a somewhat disorganized fashion: collecting and cleaning trillions of tokens of text, developing efficient tokenizers for Japanese, adapting scalable training frameworks to the latest models, and finding configurations and models with better convergence and stability. This talk will summarize the key efforts in Japan to pre-train LLMs on the largest systems, such as Fugaku and ABCI, including pre-training a 175B model from scratch and continually pre-training Llama2-70B on Japanese text.
Biography:
Rio Yokota is a Professor at the Global Scientific Information and Computing Center, Tokyo Institute of Technology. His research interests lie at the intersection of high performance computing, linear algebra, and machine learning. He is the developer of numerous libraries for fast multipole methods (ExaFMM), hierarchical low-rank algorithms (Hatrix), and information matrices in deep learning (ASDL) that scale to the full system on the largest supercomputers today. He has been optimizing algorithms on GPUs since 2006 and was part of a team that received the Gordon Bell Prize in 2009 using the first GPU supercomputer. He is involved in many efforts to train Japanese large language models. Rio is a member of ACM, IEEE, and SIAM.

