Frontiers in Generative AI for HPC Science and Engineering: Foundations, Challenges, and Opportunities


Realizing the promise of large-scale foundation models for scientific discovery—enabling self-driving laboratories, novel hypothesis generation, and more—requires unprecedented computational scale and multidisciplinary data preparation. While only a few organizations can train trillion-parameter models from scratch, advances in training, fine-tuning, and open-source models have expanded accessibility. Simultaneously, breakthroughs in reasoning models and data quality are reducing training costs and improving the performance of smaller AI models. As AI advances in general tasks, the scientific community is refining methods to evaluate and enhance scientific reasoning, a critical challenge for trustworthy AI in science. This workshop, catalyzed by the Trillion Parameter Consortium (TPC), will highlight advances in scalable and efficient training, alignment, inference, rigorous evaluation of skill, safety, and reliability, domain-specific generative AI, and multi-modal data fusion. The SC25 iteration drew 26 submissions, of which 12 were selected through double-blind peer review for presentation at SC25.

This workshop will highlight and, ideally, identify new collaboration opportunities through the international Trillion Parameter Consortium (TPC). The workshop features papers on all aspects of generative AI for science and engineering, with presentations from academics, national laboratories, HPC centers, industry, institutes, and leaders from funding agencies. The workshop will also introduce the structure and strategies of the TPC, with an overview of multiple collaborative initiatives advanced through quarterly hackathons, in which new collaborators can contribute to, and benefit from, joining the consortium.

Workshop Goals, Relevance to the SC Community, and Expected Outcomes

The science community has been on a steep learning curve during the past several years as it works to harness generative AI. Initial research focused on models for individual disciplines (e.g., UniverseTBD for astronomy, MAIRA-1 for radiological image analysis) while also exploring models aspiring to multi-disciplinary use (e.g., AuroraGPT, OLMo 2). More recently, models from industry sources such as OpenAI, Mistral, Meta, Google DeepMind, Anthropic, AWS, and others have introduced powerful reasoning capabilities, with varying opportunities for collaboration with the science community. New model architectures, moving beyond traditional Large Language Models to approaches such as Mixture of Experts and agentic flows, have dramatically improved the general-purpose performance of AI models, while the science community continues to evaluate and pursue improvements in scientific reasoning capabilities. These rapid advancements represent not merely incremental improvements, but a paradigm shift that may transform scientific inquiry by fostering unprecedented synergy between specialized domain expertise and emerging AI capabilities. This potential has motivated substantial community-wide efforts, including multiple working groups within the Trillion Parameter Consortium (e.g., scientific data preparation, training and inference performance optimization, scientific skills and reasoning evaluation, and discipline-specific groups ranging from biology to materials science). Increased partnerships with the AI industry have also emerged in 2025, such as the “1500 Scientist AI Jam Session” organized in February 2025 by nine U.S. Department of Energy laboratories.

High-Performance Computing continues to be at the center of these endeavors. The enormous cost of computation for model training, as well as for inference in large user communities, limits the number of organizations that can realistically build, train, and offer access to large-scale models. These trends offer incredible opportunities for collaboration, both to accelerate progress toward optimal tools and methods and to reduce duplication of effort, such as in the preparation of new scientific data sources. Moreover, many new challenges are coming to the fore with generative AI, including new ways of thinking about data sharing and attribution, about licensing artifacts, and about the importance of responsible development of safe, ethical AI models. An overarching need in AI today is openness: open data, open source code for tools and workflows, and careful thought as to how, when, and whether to open the models themselves. Such openness is critical to progress in every area related to generative AI.

The enormity of these challenges, and of the resources needed for data preparation, pre-training new models, and responsibly preparing them for downstream applications, has meant that progress is largely concentrated in industry, where there is limited, or in some cases no, visibility into the artifacts (models, data sets) or the processes used to create them. This underscores the need for collaboration in the open science community—central to the motivation behind creating the international Trillion Parameter Consortium (TPC). The workshop aspires to stimulate new thinking, attract scientists to some of the emerging challenges associated with generative AI, and potentially catalyze the formation of new topical collaborative working groups that can be supported by TPC. Building on this rapidly evolving landscape, the TPC workshop is poised to serve as a vital nexus for cross-disciplinary dialogue and innovation. We anticipate that by fostering transparent discussions and collaborative initiatives, participants will not only refine their understanding of the evolving challenges in generative AI but also forge actionable partnerships and form new working groups. These outcomes aim to bridge the gap between industry-led advancements and open scientific research, ultimately shaping standardized practices for data sharing, ethical model development, and resource-efficient training.

The workshop will also introduce six new international collaborative initiatives that are being developed and advanced through quarterly TPC workshops.

Review Process

Workshop reviewers followed SC’s double-blind review process and conflict-of-interest management systems (preventing review assignments to, or input from, reviewers with individual or institutional conflicts). Of 26 submissions, 12 were accepted for presentation at the workshop and inclusion in the SC25 proceedings.


Workshop Agenda

Sunday, November 16, 2025
Room 241, America’s Center Convention Complex

9:00-10:00 Morning Session One (Moderators: Javier Aula-Blasco, BSC, Spain and Charlie Catlett, Argonne/UChicago, USA)

  • Workshop Welcome and Overview (Javier Aula-Blasco, Charlie Catlett)
  • Evaluation of Test-Time Compute Constraints on Safety and Skill of Large Reasoning Models (Balaji, Chen, Thakur, Cappello, Madireddy)
  • Batch Tiling on Attention: Efficient Mixture of Experts Training on Wafer-Scale Processors (Soboleva, Goffinet, Zeng, Ragate, Albuz, Vassilieva)

10:00-10:30 Coffee Break

10:30-12:30 Morning Session Two (Moderators: Per Oster, CSC, Finland and Ian Foster, Argonne/UChicago, USA)

  • Automated MCQA Benchmarking at Scale: Evaluating Reasoning Traces as Retrieval Sources for Domain Adaptation of Small Language Models (Gokdemir, Getty, Underwood, Madireddy, Cappello, Ramanathan, Foster, Stevens)
  • Agentic AI vs ML-based Autotuning: A Comparative Study for Loop Reordering Optimization (Rosas, Ibrahim)
  • GridMind: LLMs-Powered Agents for Power System Analysis and Operations (Jin, Kim, Kwon)
  • Frameworks for Large Language Model Serving in HPC Environments (Marwaha, Zhou, Day, Dabhokar, Kindratenko)

12:30-14:00 Lunch Break

14:00-15:00 Afternoon Session One (Moderator: Jens Domke, RIKEN, Japan)

  • Exploring Distributed Vector Databases Performance on HPC Platforms: A Study with Qdrant (Ockerman, Gueroudji, Oh, Underwood, Chia, Chard, Ross, Venkataraman)
  • EQSIM Agent: A Conversational AI for Interactive Exploration of Large-scale Earthquake Simulation Data (Tang, McCallen)
  • Beyond End-to-End: Understanding the Limits of LLMs in Scientific Problem Solving (Liu, Di, Getty, Mallick, Underwood, Jin)

15:00-15:30 Coffee Break

15:30-17:30 Afternoon Session Two (Moderators: Valerie Taylor, Argonne/UChicago, USA and Rio Yokota, RIKEN, Japan)

  • BioR5: A Three-Layer Architecture for Biological Reasoning in Scientific AI (Ding, Brettin, Stevens)
  • ChatEED: An agentic retrieval assistant for accelerator operators (Reed, Bisegni, Shrestha, Huang, Ratner)
  • LABMATE: Language Model Based Multi-Agent System to Accelerate Catalysis Experiments (Acharya, Sharma, Parker, Vega, Ashraf, Isenberg, Strube, Rallo)
  • An Update on TPC-Coordinated Global Collaborative Projects (multiple TPC leaders)

Workshop Co-Organizers

  • Javier Aula-Blasco, Barcelona Supercomputing Center, Spain
  • Charlie Catlett, Argonne National Laboratory and the University of Chicago, USA
  • Jens Domke, RIKEN Center for Computational Sciences, Japan
  • Ian Foster, Argonne National Laboratory and the University of Chicago, USA
  • Kyoungsook Kim, AIST, Japan
  • Per Öster, CSC IT Center for Science, Finland
  • Satoshi Matsuoka, RIKEN Center for Computational Sciences, Japan
  • Laura Morselli, CINECA, Italy
  • Rick Stevens, Argonne National Laboratory and the University of Chicago, USA
  • Makoto Taiji, RIKEN Center for Computational Sciences, Japan
  • Valerie Taylor, Argonne National Laboratory and the University of Chicago, USA
  • Miguel Vazquez, Barcelona Supercomputing Center, Spain
  • Mohamed Wahib, RIKEN Center for Computational Sciences, Japan
  • Rio Yokota, Institute for Science Tokyo, Japan

Proceedings and Dissemination

All accepted papers will be included in the SC’25 Workshops Proceedings. Presentations will be made available via download links embedded in the agenda above.


Workshop Call For Papers

This workshop sought papers, to be included in the SC’25 proceedings, describing recent or ongoing work on strategies for accelerating and improving the development of large-scale scientific AI models. Topics included (but were not limited to):

  • Agentic workflows and architectures, particularly for implementation of scientific assistants or self-driven multi-component systems such as laboratories, manufacturing processes, or software development.
  • Scale-up infrastructure for large-scale inference services, ranging from at-scale experiments and broad communities of interactive users and applications to “AI Factories for Science.”
  • AI for scientific code development and optimization on novel hardware architectures.
  • Integration of AI into scientific tools for scientific experiments and software development, e.g., Python notebooks, scientific visualization tools, or more general scientific environments such as MATLAB and Mathematica.
  • Domain-specific vs. general-scientific foundation models and how domain-specific models might augment and extend advanced reasoning models.
  • Prototyping and evaluating reasoning models integrated with HPC, databases, scientific experiments, and domain-specific foundation models.
  • Extending reasoning capabilities in domains that are less axiomatic than mathematics or physics, such as in biology or economics.
  • Using LLMs and other AI model architectures to extract and build workflows and tools for entire scientific domains.
  • Implementing and evaluating new techniques such as federated learning or model distillation.
  • Integrating AI in HPC with AI-at-the-Edge and smaller and/or specialized platforms and models.
  • AI for design and optimization of scientific infrastructure, ranging from automated laboratories and factories to HPC infrastructure and workflows to intelligent sensor networks.
  • Data Preparation and Management: Strategies for training data preparation, including deduplication, non-text information integration, corpus management, and multilingual data handling. Emphasis on developing efficient training pipelines and performance optimization.
  • Evaluation of AI Models: Techniques for evaluating the skills, safety, and trustworthiness of scientific AI models. Focus on benchmarks specific to scientific discovery and methods for assessing model performance in scientific contexts.
  • Model Architectures and Performance: Exploration of evolving architectures for large-scale scientific AI models, including transformers, mixture-of-experts, and state-space models. Discussion on optimal architectures for Exascale platforms and the best frameworks for trillion-parameter models.
  • Training Pipelines and Workflows: Innovative strategies for developing streamlined data curation pipelines and pre-training methodologies. Focus on incorporating domain-specific knowledge to enhance the performance and applicability of large language models (LLMs) in scientific research.
  • AI Model Skills and Trust Evaluation: Development of benchmarks for evaluating the skills, trustworthiness, and safety of large foundation models in scientific contexts. Emphasis on multilingual capabilities, uncertainty quantification, and robustness evaluation.
  • Scientific Applications of AI Models: Development and evaluation of foundation models for domains such as chemistry, biology, climate science, and high-energy physics. Focus on creating shared datasets and strategies for model sharing and collaboration across diverse scientific applications.
  • Expanding the Scale and Breadth of the AI Workforce: Broadening the range of voices and problem-solving approaches in the community, with an emphasis on international collaboration to accelerate AI research and development.
  • AI Hardware Acceleration and Software Stack Strategies: Exploration of alternative hardware and software approaches to tackle the scaling challenges of AI workloads. Discussion on the computational demands and system architecture of GPU accelerators and potential solutions.
  • AI for Scientific Software Use Cases: Experiences and insights on using generative AI for scientific code generation, particularly in high-performance computing (HPC). Focus on improving training models and educating software developers to harness generative AI effectively.

The workshop reviews prioritized papers that reflect the TPC objective of accelerating progress through international and multi-institutional collaborations.

SUBMISSION GUIDELINES: Submissions must follow the SC’25 paper guidelines with respect to formatting; however, the main text of a submitted paper is limited to five content pages, including all figures and tables. References and optional technical appendices, with additional results, figures, and graphs, do not count as content pages, and there is no page limit for technical appendices. Reviewers are not required to review appendices. Reviews for this workshop are double-blind. All submissions will be reviewed by the program committee, with the following schedule (deadlines at 11:59pm anywhere on Earth on the specified date):

  • June 2, 2025: Call for Abstracts
  • June 16, 2025: Submission open (a link for submissions will be provided on this date)
  • September 5, 2025: Notification
  • September 22, 2025: Final Camera-ready papers due

To submit a paper, please visit https://submissions.supercomputing.org, log into your SC account (or create one), and select “submitter” as your role.