TPC Conference Track at SCA/HPCA26

  • Dates: January 27 (11:30-17:00) and January 28 (11:00-17:00), 2026
  • Venue: SCA/HPCA26 in Osaka, Japan
  • Room 1202, Osaka International Convention Center
  • Co-Chairs: Rio Yokota, Charlie Catlett

These tracks are part of an Invited Co-Located event at SCA/HPCA. All attendees must register for the conference to participate.


The TPC is a global initiative that brings together scientists from government laboratories, academia, research institutes, and industry to tackle the immense challenges of building large-scale AI systems for scientific discovery. By focusing on the development of massive generative AI models, the consortium aims to advance trustworthy and reliable AI tools that address complex scientific and engineering problems. The target community includes (a) those working on AI methods development, NLP/multimodal approaches and architectures, full-stack implementations, scalable libraries and frameworks, AI workflows, data aggregation, cleaning and organization, training runtimes, model evaluation, downstream adaptation, alignment, etc.; (b) those who design and build hardware and software systems; and (c) those who will ultimately use the resulting AI systems to attack a range of problems in science, engineering, medicine, and other domains.

More About TPC


Program

Tuesday, January 27, 2026

11:30-12:30 Session 1.1: TPC International Collaborative Initiatives (Moderator: Rio Yokota, Institute of Science Tokyo/RIKEN)

  • Workshop Welcome and Overview: Rio Yokota (IST) and Charlie Catlett (Argonne)
  • Open Frontier Model: Rio Yokota (IST)
  • Evaluation: Franck Cappello (Argonne)
  • Driving Applications: Valerie Taylor (Argonne)

12:30-13:30 Lunch Break

13:30-15:00 Session 1.2: TPC Vision and Strategies (Moderator: Rio Yokota, Institute of Science Tokyo/RIKEN)

  • The Transformational AI Models Consortium (ModCon): Neeraj Kumar (PNNL)
  • EuroTPC: Fabrizio Gagliardi (BSC)
  • U.S. Department of Energy Genesis Mission: Rick Stevens (Argonne)
  • Collaborations in AI4S – Now and Future: Satoshi Matsuoka (RIKEN)

15:30-17:00 Session 1.3: Selected TPC Technical Working Group Updates (Moderator: Rio Yokota, Institute of Science Tokyo/RIKEN)

  • AI for Drug Discovery: Arvind Ramanathan (Argonne)
  • AI for Life Sciences and Healthcare: Makoto Taiji (RIKEN)
  • Toward Next-Generation Ecosystems for Scientific Computing: Workshop Summary: Anshu Dubey (Argonne)
  • Deduplication at Scale for Collaboration on Frontier Models: Robert Underwood (Argonne)

Wednesday, January 28, 2026

11:00-12:30 Session 2.1: AI for Science (Moderator: Neeraj Kumar, PNNL)

  • Evaluation of Geospatial Foundation Models: Kyoung-Sook Kim (AIST)
  • Vision Foundation Models for Weather and Climate Downscaling: Mohamed Wahib (RIKEN)
  • LUMI AI Factory – for Science and Business: Pekka Manninen (CSC)
  • When Language Models Learn from Themselves: Synthetic Data and Scientific Knowledge: Javier Aula-Blasco (BSC)

13:30-15:00 Session 2.2: Agentic AI (Moderator: Charlie Catlett, ANL/UChicago)

  • Agentic Large Language Model Copilots for Scientific Workflows: Anurag Acharya (PNNL)
  • Agentic AI vs ML-based Autotuning: A Comparative Study for Loop Reordering Optimization: Khaled Ibrahim (LBNL)
  • Breaking Barriers in Science: The AI and Hybrid Computing Evolution: Thierry Pellegrino (AWS)
  • System Requirements for Scalable Agentic AI: Ian Foster (ANL)

15:30-17:00 Session 2.3: Inference Services (Moderator: Anurag Acharya, PNNL)

  • Secure AI Infrastructure for Scientific Computing and Applications: Jens Domke (RIKEN)
  • VibeCodeHPC: Takahiro Katagiri (Nagoya University)
  • Mixture of Experts at Scale on Cerebras Hardware: Daria Soboleva (Cerebras)
  • The TPC Academic Scientific Inference Group: Perspectives from Around the World: Dan Stanzione (TACC)

Abstracts

Rio Yokota (IST): Developing an Open Frontier Model for Science

2026 will be an exciting year with national initiatives like the U.S. Genesis mission, Japan’s 1 trillion yen initiative, and Europe’s AI-for-science strategy. The Trillion Parameter Consortium provides a platform for members of these national initiatives to communicate and share experiences. AI for science in 2026 is poised to evolve from passive assistants into active co-scientists. Open frontier models now offer accessible, reasoning-capable platforms that support tool use, chain-of-thought, and long-context analysis. This gives us hope that training an open frontier model is now becoming possible, but we must work together to accomplish this daunting task. This talk will report on the recent discussions within TPC on training an open frontier model for science.

Franck Cappello (ANL): Evaluating AI Models and Agents as Scientific Assistants on Real Problems

LLMs and AI agents (e.g., Google Co-Scientist) have demonstrated exceptional knowledge extension and reasoning capabilities on multiple challenging tests (e.g., the International Mathematical Olympiad) and continue to improve, as shown by rising performance on the multi-modal Humanity’s Last Exam and ARC-AGI-2 benchmarks. This level of performance makes them potentially applicable to scientific research as research assistants. While scientific benchmarks of varying difficulty exist for LLMs, only realistic end-to-end evaluation by scientists can provide a comprehensive assessment of their actual capability to assist and accelerate scientific research. To that end, Argonne and RIKEN conducted AI Jam sessions, gathering thousands of researchers to explore challenging research problems and collecting tens of thousands of scientist prompts, AI model responses, and assessments of those responses. This talk will report on the findings of these AI Jam sessions.

Robert Underwood (ANL): Deduplication at Scale for Collaboration on Frontier Models

One critical step in the development of AI training sets is the deduplication of content from multiple sources. It helps to prevent overfitting, model collapse, and memorization of copyrighted content. However, running deduplication at scale requires careful attention to both performance and the configuration of the process to ensure good results. In this talk, we present LSHBloom, scalable deduplication software that has been run at scale on ALCF systems. We further describe how deduplication can serve as a powerful building block both to improve synthetically generated datasets and to facilitate federated collaborations on frontier model training by identifying the unique portions of data at each site while preserving the confidentiality of proprietary datasets.
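For readers unfamiliar with the technique, the sketch below shows the general MinHash-LSH-plus-Bloom-filter pattern that this class of deduplicators builds on: each document gets a MinHash signature, the signature is split into bands, and band signatures are inserted into a Bloom filter so that a repeated band flags a near-duplicate candidate. All parameters (signature length, band count, 5-gram shingles, filter size) are illustrative assumptions; this is not the LSHBloom implementation.

```python
# Minimal sketch of MinHash-LSH deduplication backed by a Bloom filter.
# Illustrative only: parameters are assumptions, not LSHBloom's actual code.
import hashlib

NUM_HASHES = 128            # MinHash signature length
BANDS, ROWS = 16, 8         # 16 bands x 8 rows = 128
BLOOM_BITS = 1 << 20        # tiny demo filter; production filters are far larger
BLOOM_HASHES = 4

bloom = bytearray(BLOOM_BITS // 8)

def _hash(data: bytes, seed: int) -> int:
    h = hashlib.blake2b(data, digest_size=8, salt=seed.to_bytes(4, "little"))
    return int.from_bytes(h.digest(), "little")

def minhash(text: str) -> list[int]:
    # 5-gram character shingles; shingle size is an arbitrary demo choice
    shingles = {text[i:i + 5] for i in range(max(1, len(text) - 4))}
    return [min(_hash(s.encode(), seed) for s in shingles)
            for seed in range(NUM_HASHES)]

def bloom_add(key: bytes) -> bool:
    """Insert key; return True if it was (probably) already present."""
    seen = True
    for seed in range(BLOOM_HASHES):
        bit = _hash(key, 10_000 + seed) % BLOOM_BITS
        byte, off = divmod(bit, 8)
        if not bloom[byte] & (1 << off):
            seen = False
        bloom[byte] |= 1 << off
    return seen

def is_near_duplicate(text: str) -> bool:
    sig = minhash(text)
    # list (not generator) so every band is inserted even after a hit
    hits = [bloom_add(repr(tuple(sig[b * ROWS:(b + 1) * ROWS])).encode())
            for b in range(BANDS)]
    return any(hits)

print(is_near_duplicate("the quick brown fox jumps over the lazy dog"))   # False
print(is_near_duplicate("the quick brown fox jumps over the lazy dog!"))  # very likely True
```

A Bloom filter stores only bits rather than the band signatures themselves, which is what makes this approach attractive at corpus scale and for confidentiality-preserving cross-site comparison.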

Valerie Taylor (ANL): TPC Science Drivers: SC25 Mini-Workshop Summary

During SC25, TPC organized an informal mini-workshop focused on domain science drivers to inform the TPC technical initiatives. The workshop covered four science domains: biology, HPC software, high energy physics, and materials science, with substantive discussion of each. This talk will summarize the work discussed during the mini-workshop and extend an invitation for broader engagement of science drivers.

Makoto Taiji (RIKEN): AGIS Project: AI for Science project in RIKEN

In 2024, RIKEN launched the AGIS (Advanced General Intelligence for Science) program to promote AI for Science. Targeting the life sciences and materials/physical sciences, the initiative aims to scale up scientific research through the development of foundation models for scientific applications and through AI-driven research using AI agents. In the life sciences, we will develop foundation models built on large-scale data acquired through experimental automation, including genome language models for medical applications and models of cellular response and animal behavior. This presentation introduces the program’s overview and status.

Anshu Dubey (ANL): Toward Next-Generation Ecosystems for Scientific Computing: Workshop Summary

The computational science community is at a turning point. With the arrival of highly capable AI models, the traditional ways we compute, code, and collaborate are no longer adequate. Addressing the complexity of this moment requires more than incremental progress; it necessitates strategic risk-taking and foundational shifts in approach. We conducted a Workshop on Next-Generation Ecosystems for Scientific Computing from April 29 to May 1, 2025, that brought together more than 40 experts from high-performance computing (HPC), AI, computational science, software engineering, applied mathematics, and the social sciences to chart a path toward more powerful, sustainable, and collaborative scientific software ecosystems. In this presentation I will provide a summary of the post-workshop report (https://doi.org/10.48550/arXiv.2510.03413).

Arvind Ramanathan (ANL): Agentic Systems for Biological Systems Design

Intrinsically disordered proteins (IDPs) represent challenging therapeutic targets due to their conformational heterogeneity, yet they play critical roles in numerous diseases. We present StructBioReasoner, a scalable multi-agent system for autonomous IDP-targeting biologics design that addresses this challenge through a tournament-based reasoning framework. Specialized agents focused on structural stability, evolutionary conservation, energetic optimization, and rational design principles compete to generate and refine engineering hypotheses, enabling parallel evaluation and natural distribution of computational work. Each agent integrates domain knowledge with computational tools including AlphaFold-based structure prediction, molecular dynamics simulation, and physics-based stability prediction, autonomously reasoning about tool selection and coordinating execution on HPC infrastructure. We demonstrate three key contributions: a multi-agent architecture specifically designed for IDP ensemble properties, a scalable tournament framework for efficient parallel hypothesis generation, and validation through case studies showing autonomous identification of stabilizing mutations matching literature-validated strategies. This work establishes a foundation for autonomous discovery of IDP-targeting therapeutics on emerging exascale platforms.
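As a rough illustration of the tournament pattern described above (not StructBioReasoner's actual code), the sketch below has competing agents propose scored hypotheses that advance through pairwise elimination; the random scoring function is a stand-in for the structure-prediction, MD, and stability tools the real agents invoke.

```python
# Hedged sketch of tournament-based hypothesis selection among agents.
# Scores are random stand-ins for AlphaFold/MD/stability evaluations.
import random

AGENTS = ["stability", "conservation", "energetics", "rational-design"]

def propose(agent: str) -> dict:
    # A real agent would reason over tools; here we fabricate a scored idea.
    return {"agent": agent,
            "design": f"{agent}-variant-{random.randint(1, 99)}",
            "score": random.random()}

def tournament(pool: list[dict]) -> dict:
    """Pairwise single elimination; assumes len(pool) is a power of two."""
    while len(pool) > 1:
        random.shuffle(pool)
        pool = [max(pair, key=lambda c: c["score"])
                for pair in zip(pool[::2], pool[1::2])]
    return pool[0]

pool = [propose(a) for a in AGENTS for _ in range(4)]  # 16 hypotheses
print(tournament(pool))
```

Each round halves the candidate pool and every match is independent, which is why this structure parallelizes naturally across HPC resources.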

Kyoung-Sook Kim (AIST): Evaluation of Geospatial Foundation Models

A Geospatial Foundation Model (GFM) is not merely an integration of large language models (LLMs) with geographic information systems (GIS); it is an AI model pre-trained on massive geospatial data that can be used generally, or readily adapted, across a wide range of geospatial tasks (e.g., flood mapping, multi-temporal crop segmentation, and land use/land cover classification). Even though we expect GFMs to bring geospatial intelligence to all devices engaged in geo-referenced activities, they still face many challenges in scalability, adaptability, robustness, and safety. This talk addresses the state of the art in GFMs and the challenge of integrating multi-scale, multi-source data. Finally, we discuss future directions for evaluating GFMs and creating a benchmark framework to support model interpretability and fairness.

Mohamed Wahib (RIKEN): ORBIT-2: Scaling Exascale Vision Foundation Models for Weather and Climate Downscaling

Sparse observations and coarse-resolution climate models limit effective regional decision-making, underscoring the need for robust downscaling. However, existing AI methods struggle with generalization across variables and geographies and are constrained by the quadratic complexity of Vision Transformer (ViT) self-attention. We introduce ORBIT-2, a scalable foundation model for global, hyper-resolution climate downscaling. ORBIT-2 incorporates two key innovations: (1) Residual Slim ViT (Reslim), a lightweight architecture with residual learning and Bayesian regularization for efficient, robust prediction; and (2) TILES, a tile-wise sequence scaling algorithm that reduces self-attention complexity from quadratic to linear, enabling long-sequence processing and massive parallelism. ORBIT-2 scales to 10 billion parameters across 65,536 GPUs, achieving up to 4.1 ExaFLOPS sustained throughput and 74–98% strong scaling efficiency. It supports downscaling to 0.9 km global resolution and processes sequences up to 4.2 billion tokens. On 7 km resolution benchmarks, ORBIT-2 achieves high accuracy, with L2 scores in the range 0.98–0.99 against observation data.
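The published details of TILES are more involved, but the core complexity argument can be sketched simply: if each token attends only within a fixed-size tile, attention cost drops from O(N²) to O(N·T) for tile size T, i.e., linear in sequence length. The toy version below (assumed shapes, no cross-tile exchange) illustrates only that argument; it does not reproduce the ORBIT-2 algorithm.

```python
# Toy tile-local attention: O(N*T) instead of O(N^2) for fixed tile size T.
# Not the TILES algorithm; cross-tile information flow is omitted here.
import numpy as np

def tile_attention(q, k, v, tile=256):
    n, d = q.shape                      # assumes n is divisible by tile
    out = np.empty_like(v)
    for s in range(0, n, tile):
        qs, ks, vs = q[s:s+tile], k[s:s+tile], v[s:s+tile]
        logits = qs @ ks.T / np.sqrt(d)              # (tile, tile), never (n, n)
        w = np.exp(logits - logits.max(-1, keepdims=True))
        w /= w.sum(-1, keepdims=True)                # softmax within the tile
        out[s:s+tile] = w @ vs
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((1024, 64), dtype=np.float32)
print(tile_attention(x, x, x).shape)                 # (1024, 64)
```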

Pekka Manninen (CSC): LUMI AI Factory – for Science and Business

As part of the AI Innovation Package of the European Union, the EuroHPC Joint Undertaking is currently planning a set of AI-oriented supercomputers in Europe, deployed within the 19 announced “AI Factories”. In this talk, we will discuss the largest of these, the LUMI AI Factory, realized as a collaboration of six European countries. Its AI-optimized supercomputing infrastructure, called LUMI-AI, will be located in Kajaani, Finland, and installed in 2027. LUMI-AI will be one of the most powerful and advanced quantum-accelerated supercomputing systems in the world at the time of its completion. We will cover the concept and services of the LUMI AI Factory as well as the technical vision of the LUMI-AI infrastructure, focusing on how to build a converged infrastructure serving both AI and HPC workloads.

Javier Aula-Blasco (BSC): When Language Models Learn from Themselves: Synthetic Data and Scientific Knowledge

Synthetic data is playing an increasingly central role in the development of language-based AI systems, offering new possibilities for scale, efficiency, and coverage. At the same time, the data ecosystems on which these systems depend, particularly the public internet and the scientific literature, are undergoing rapid transformation due to the widespread use of generative AI tools. This talk explores the current state of synthetic data in language AI and science, examining how emerging practices in data generation, reuse, and consumption are reshaping both model training and knowledge production. Without focusing on specific implementations, the talk highlights structural tensions and open questions related to data quality, diversity, and long-term sustainability. Through selected examples from language technology and scientific workflows, the presentation invites reflection on how synthetic data is changing the relationship between human knowledge and AI, and why these changes deserve careful attention from researchers, practitioners, and policymakers.

Anurag Acharya (PNNL): Agentic Large Language Model Copilots for Scientific Workflows

Large Language Models (LLMs) are increasingly studied as scientific assistants, yet their practical use remains constrained by hallucinations, weak grounding in scientific data, and limited ability to interact reliably with computational tools and workflows. This talk presents an agentic, human-in-the-loop LLM copilot methodology that addresses these limitations through explicit task decomposition, retrieval-augmented reasoning over structured scientific corpora, and controlled tool-based code generation and execution. Rather than treating the LLM as a monolithic decision-maker, the approach uses a supervisor-driven, graph-based orchestration of specialized agents with typed tool interfaces, metadata-aware retrieval, and sandboxed execution environments, enabling separation of linguistic reasoning from scientific computation. The methodology is illustrated through two use cases: a catalysis copilot that integrates literature retrieval, simulation, data analysis, uncertainty quantification, and hypothesis generation, and a beamline science copilot that supports retrieval over publications and instrument documentation alongside on-demand analysis and visualization of experimental data. Together, these examples demonstrate how grounded, agentic LLM designs provide a principled NLP-centric framework for building reliable scientific copilots that naturally interface with high-performance and facility-scale computing environments.
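The supervisor-plus-typed-tools pattern can be sketched minimally as below. Every name here (ToolSpec, retrieve_literature, run_simulation, the hard-coded plan) is a hypothetical illustration of the architecture's separation between linguistic planning and tool execution, not the copilots' actual API.

```python
# Hedged sketch: a supervisor dispatching typed tool agents.
# All names and the fixed plan are hypothetical; a real system would have an
# LLM produce the plan and run tools in sandboxed environments.
from dataclasses import dataclass
from typing import Callable

@dataclass
class ToolSpec:
    name: str
    description: str
    fn: Callable[[str], str]          # typed interface: query in, result out

def retrieve_literature(query: str) -> str:
    return f"[top-k passages for: {query}]"              # retrieval stub

def run_simulation(spec: str) -> str:
    return f"[sandboxed simulation output for: {spec}]"  # tool-execution stub

TOOLS = {t.name: t for t in (
    ToolSpec("retrieve", "ground claims in the corpus", retrieve_literature),
    ToolSpec("simulate", "execute code in a sandbox", run_simulation),
)}

def supervisor(task: str) -> list[str]:
    """Decompose a task and route each step to the matching tool agent."""
    plan = [("retrieve", task), ("simulate", f"uncertainty analysis for {task}")]
    return [TOOLS[name].fn(arg) for name, arg in plan]

print(supervisor("catalyst screening for CO2 reduction"))
```

Keeping the tool interfaces typed and the execution sandboxed is what lets the linguistic reasoning stay cleanly separated from the scientific computation it triggers.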

Khaled Ibrahim (LBNL): Agentic AI vs ML-based Autotuning: A Comparative Study for Scientific HPC Code Refactoring

Modern High-Performance Computing (HPC) applications necessitate code optimizations for enhanced performance and energy efficiency on contemporary CPU and GPU architectures. While conventional Machine Learning (ML) autotuning methods have successfully navigated high-dimensional optimization spaces, they often incur significant evaluation costs and overlook program semantics. The emergence of Large Language Models (LLMs) and agentic AI systems presents a promising avenue for addressing specific optimization challenges. This presentation investigates the comparative and complementary roles of agentic AI systems versus traditional ML autotuning techniques. To explore this, we conduct a comparative analysis between a traditional ML-based optimization approach and an agentic AI system, assessing their respective strengths and limitations when applied to the code refactoring of a real-time Dyson expansion code. Furthermore, we introduce LoopGen-AI, a new agentic AI system powered by three distinct large language models: GPT, Claude, and Gemini.
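To give a feel for the search space both approaches must navigate, the sketch below enumerates the six loop orders of a naive matrix multiply and times each. An ML autotuner would learn to predict the best order rather than measure all of them, and an agentic system would reason about it from the code; pure Python also masks most of the cache-locality effects that make ordering matter in compiled HPC kernels, so treat this only as an illustration of the search space, not of the talk's Dyson-expansion workload.

```python
# Enumerating loop orders for a naive matmul: the autotuning search space.
# In C/Fortran the order changes cache behavior dramatically; the Python
# interpreter hides most of that, so this only illustrates the mechanics.
import itertools
import time

N = 64

def matmul(order: str) -> None:
    A = [[1.0] * N for _ in range(N)]
    B = [[1.0] * N for _ in range(N)]
    C = [[0.0] * N for _ in range(N)]
    idx = dict.fromkeys("ijk", 0)
    for a in range(N):
        idx[order[0]] = a
        for b in range(N):
            idx[order[1]] = b
            for c in range(N):
                idx[order[2]] = c
                i, j, k = idx["i"], idx["j"], idx["k"]
                C[i][j] += A[i][k] * B[k][j]

for perm in itertools.permutations("ijk"):
    order = "".join(perm)
    t0 = time.perf_counter()
    matmul(order)
    print(order, f"{time.perf_counter() - t0:.3f}s")
```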

Thierry Pellegrino (AWS): Breaking Barriers in Science: The AI and Hybrid Computing Evolution

The convergence of AI and hybrid computing is revolutionizing scientific discovery by removing computational constraints that have limited research for decades. Cloud-based high-performance computing enabled Harvard Medical School to screen billions of molecules in hours and Moderna to develop a COVID-19 vaccine in just 11 months, while DTN’s AI-accelerated weather prediction system delivers life-saving forecasts in minutes instead of hours. Pioneering initiatives like the Pawsey Supercomputing Centre’s quantum-classical hybrid computing project are now tackling optimization and molecular modeling problems beyond the reach of classical supercomputers. This revolution democratizes access to world-class computational resources, ensuring that imagination—not infrastructure—becomes the only limit to scientific breakthrough.

Ian Foster (Argonne): System Requirements for Scalable Agentic AI

Large language models are increasingly embedded in agentic AI systems: persistent, stateful processes that plan, act, and adapt over many inference steps. This talk argues that scaling such systems exposes new infrastructure requirements beyond those addressed by model-centric approaches. Drawing on early deployments, I highlight key challenges in execution control, resource governance, and state management, and outline opportunities for the Trillion Parameter Consortium to treat agentic workloads as first-class HPC applications.

Jens Domke (RIKEN): RiVault: Recent and Future Developments of RIKEN’s Secure AI Infrastructure for Scientific Computing and an AI Jam Recap

We are building RiVault, RIKEN’s approach to hosting a secure, open-source AI infrastructure tailored for our scientific and industrial use. We showcase how frontier open models, agentic AI, and RAG pipelines can be deployed locally and integrated tightly with HPC systems and simulation apps, while preserving privacy and performance. We will also share our current progress on setting up a mini-RiVault model-serving stack for applications like the SPring-8 light source. Lastly, we highlight how RiVault was used during the ‘Japan Scientist AI Jam Session’ in December.
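As a toy illustration of the retrieval step in such a RAG pipeline (RiVault's actual serving stack, embedder, and vector store are not described here, so every component below is an assumption), the sketch embeds documents with a bag-of-words stand-in, ranks them by cosine similarity, and assembles a grounded prompt.

```python
# Toy RAG retrieval: bag-of-words "embeddings", cosine ranking, prompt assembly.
# Every component is a stand-in; a real deployment would use a proper embedding
# model, a vector store, and a locally served open model.
import math
from collections import Counter

DOCS = [
    "beamline calibration notes for the imaging endstation ...",
    "HPC batch job submission and queue policy guide ...",
    "post-processing workflow for diffraction data ...",
]

def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def rag_prompt(query: str, k: int = 2) -> str:
    q = embed(query)
    top = sorted(DOCS, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]
    return "Context:\n" + "\n".join(top) + f"\n\nQuestion: {query}"

print(rag_prompt("how do I submit a batch job?"))
```

Keeping retrieval and serving entirely on local infrastructure is what allows such a pipeline to preserve the privacy guarantees the abstract emphasizes.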

Takahiro Katagiri (Nagoya U.): Evaluating Agentic AI for HPC: Insights from the HPC-GENIE Project

Agentic AI has emerged as a promising paradigm for automating complex decision-making processes in High-Performance Computing (HPC). However, its effectiveness and impact remain insufficiently evaluated in terms of performance and productivity. This talk presents insights from the HPC-GENIE Project at Nagoya University, which investigates the design, deployment, and evaluation of agentic AI systems in practical HPC environments. HPC-GENIE explores how autonomous AI agents can support key HPC workflows, including performance tuning, execution orchestration, and the iterative optimization of HPC applications. Rather than treating agentic AI as a black-box assistant, the project emphasizes measurable outcomes, with a particular focus on performance efficiency and the automation of performance tuning. We have developed a prototype agentic AI system, named VibeCodeHPC, which is composed of multiple collaborative agents representing software development roles such as project managers, system engineers, and programmers. Through representative use cases on production HPC systems, this talk highlights both the opportunities and limitations of agentic AI in current HPC practice. The results provide concrete evidence of when and how agentic approaches deliver value, as well as situations in which carefully designed HPC workload scenarios are required for effective evaluation. We conclude by outlining open challenges and proposing directions toward standardized evaluation frameworks for agentic AI in future HPC systems.

Daria Soboleva (Cerebras): Mixture of Experts at Scale on Cerebras Hardware

Scaling Mixture of Experts (MoE) to trillion-parameter levels is historically difficult because all-to-all communication consumes most of the execution time on GPU clusters. In this talk, we show how Cerebras hardware eliminates this bottleneck. By using Weight Streaming and Batch Tiling on Attention (BTA), we enable highly sparse models to achieve the efficiency of dense architectures at scale.
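For background, the sketch below shows why GPU-cluster MoE is communication-bound: a router assigns each token to an expert, and because each device holds only some experts, the grouped tokens must be exchanged all-to-all before and after expert compute. This is generic top-1 routing for illustration, not Cerebras' Weight Streaming or BTA implementation.

```python
# Generic top-1 MoE routing; real systems use top-k with gating weights.
# On a GPU cluster the grouping below becomes an all-to-all token exchange,
# which is the bottleneck the talk's approach is designed to avoid.
import numpy as np

rng = np.random.default_rng(0)
tokens  = rng.standard_normal((8, 16))               # (num_tokens, d_model)
experts = [rng.standard_normal((16, 16)) for _ in range(4)]
gate_w  = rng.standard_normal((16, 4))

logits = tokens @ gate_w            # router scores: (num_tokens, num_experts)
top1   = logits.argmax(axis=-1)     # each token picks one expert

out = np.zeros_like(tokens)
for e, W in enumerate(experts):
    mask = top1 == e                # group tokens by destination expert;
    out[mask] = tokens[mask] @ W    # across devices this is the all-to-all step
print(out.shape)                    # (8, 16)
```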