
Hackathon: Architectures, Evaluation, and Life Sciences AI Challenge Problems
- 5-7 March 2025
- Location: RIKEN R-CCS, Kobe, Japan
- The hackathon will begin at 9am on March 5 and conclude by 4:30pm on March 7.
- Hackathon Outcomes Report planning and review (optional): March 8. Because many participants will still be in Kobe due to travel schedules, there will be an optional breakfast meeting to begin integrating the hackathon outcomes report. It will be held from 08:30 to 10:00 on Saturday, March 8, in the Portopia Hotel breakfast area.
Registration has closed – 65 participants are registered from Asia-Pacific (35), North America (23), and Europe (7).
Registration closed February 25 at 11:59pm (anywhere on earth).
Download maps, agenda, and room assignments.
The Trillion Parameter Consortium (TPC) convenes the community to identify and pursue collaborations that will accelerate progress on responsibly and safely developing large-scale AI models for scientific discovery, while expanding and diversifying the scientific AI community itself. In June 2024, at the European Kick-Off workshop in Barcelona, the groups shown below began to develop plans for collaborative projects. Further progress by TPC-facilitated collaborations and participating scientists was highlighted at the SC'24 TPC Workshop, attended by some 200 participants in November 2024.
This hackathon will convene multiple TPC working groups for hands-on work, ranging from key decisions about responsibilities and approaches to running tests and benchmarks. Substantive outcomes are expected, as at the Fall 2024 TPC Hackathon hosted by Argonne National Laboratory and The University of Chicago in October 2024, whose results are documented in its report.
Participation
Multiple working groups have formed during the past 18 months of TPC boot-up, developing collaborative plans ranging from data-sharing strategies to optimizing training pipelines and developing scientific skills evaluation frameworks. These self-assembled teams plan to meet, using this open call for participants to identify and invite people who can help accelerate progress. The meetings will be hands-on and in person, with limited opportunities for remote participation at the discretion of the group co-organizers.
Program
The hackathon will begin with a full-day tutorial, followed by two days of breakouts with agendas outlined below. The tutorial is optional and is also open to others as a stand-alone event (but registration is still necessary).
Day 1 (March 5)
The tutorial will begin at 9:00 am in the Auditorium and will conclude at 5:00 pm. (See syllabus below)
Day 2 (March 6)
- 09:00-09:30 Welcome and Opening Orientation (Auditorium)
- 09:30-10:30 Hackathon Session 1
- 10:30-11:00 Coffee Break
- 11:00-12:00 Hackathon Session 2
- 12:00-13:00 Lunch and Group Photograph
- 13:00-15:00 Hackathon Session 3
- 15:00-15:30 Coffee Break and Fugaku Tour
- 15:30-17:00 Hackathon Session 4
- 18:00-20:00 Reception at Toriko in the Tokyu REI Hotel, Sannomiya
Day 3 (March 7)
- 09:00-10:30 Hackathon Session 5
- 10:30-11:00 Coffee Break
- 11:00-12:00 Hackathon Session 6
- 12:00-13:00 Lunch Break
- 13:00-15:00 Hackathon – Compile Outcomes and Next Steps
- 15:00-15:30 Coffee Break
- 15:30-16:30 Closing Plenary with Breakout Reports and Discussion (Auditorium)
Optional Evaluation Discussion on Saturday, March 8 from 08:30-10:00 in the Portopia hotel breakfast area.
Day 1 Tutorial on Leveraging and Evaluating LLMs as Scientific Assistants
Objectives: Provide participants with the basics of using LLMs as scientific assistants and of evaluating them in this role. This tutorial will be conducted by leaders from the EVAL team (Franck Cappello, Neil Getty, and Sandeep Madireddy from Argonne, along with Javier Aula-Blasco from BSC).
Syllabus
- 09:00-09:30 Welcome and Introduction
- 09:30-10:20 Deep Dive into LLM Anatomy
- 10:30-11:00 Break
- 11:00-12:00 LLM Use Cases for Science and Engineering
- 12:00-12:30 Prompting and Performance
- 12:30-13:30 Lunch
- 13:30-14:10 Basics of LLM Evaluation
- 14:10-14:30 LLMs as Judges
- 14:30-15:00 Intermediate Benchmarking
- 15:00-15:30 Break
- 15:30-15:50 Intermediate Reasoning Evaluation Techniques
- 15:50-16:55 Hands-on Exercises
- 16:55-17:00 Wrap up
- Adjourn
Days 2-3: Working Group Breakouts
Three breakouts will run simultaneously on days 2-3:
- Scientific Skills, Safety, and Trust Evaluation (EVAL)
- Model Architecture and Performance Evaluation (MAPE)
- Life Sciences Challenge Problems (BIO)
Agendas for each breakout are shown below.
Scientific Skills, Safety, and Trust Evaluation (EVAL)
Co-Leaders: Franck Cappello (Argonne), Javier Aula-Blasco (BSC), Neil Getty (Argonne), and Sandeep Madireddy (Argonne). This group focuses on developing methods to evaluate LLM skills and safety in scientific applications. The work centers on scientific benchmarks and on the challenges of ensuring AI models' trustworthiness, safety, and accuracy in real-world scientific use.
- Day 2: Strategies for scientific challenge problem generation and evaluation
Objectives: Get participants started using LLMs as scientific assistants for complex scientific tasks (challenges) and evaluating them for this use, particularly in a multilingual context. The session will begin with a demo of the task. Participants will then form groups of 3, each group working on one challenge in English and one in Japanese (or another language). Interactions will be evaluated by humans according to predefined rubrics. The 3 members of each group will first evaluate their interactions separately and then reach agreement.
- Day 3: LLM-as-judge hackathon leveraging Day 2 interactions and focusing on improving the current judging prompts.
Objectives: Consider different ways of optimizing the use of generalist LLMs as judges of LLMs tested on multilingual, multi-turn scientific challenges, with a focus on manual vs. automatic optimization.
Model Architecture and Performance Evaluation (MAPE)
Co-Leaders: Rio Yokota (Institute of Science Tokyo), Mohamed Wahib (RIKEN), and Murali Emani (Argonne). This group focuses on evaluating and optimizing AI model architectures for large-scale scientific applications. In its earlier work, the group divided into subgroups, each tasked with evaluating one of pretraining, inference, micro-benchmarks, or vision transformers, and conducted performance profiling on exascale platforms to better understand model performance and identify potential bottlenecks.
The MAPE working group will focus on the following topics:
- Optimizing/benchmarking the performance of inference
- Insights and best practices from DeepSeek
- Efficient hardware and algorithms to handle KV cache workload
- What is the minimum model/dataset size that can replicate the reasoning capability of o1?
- How does quantization/distillation/compression/sparsification affect test time scaling?
- Dense vs. sparse (MoE) models
- Distributed inference across multiple nodes and framework support (vLLM, TensorRT-LLM, etc.)
Life Sciences Challenge Problems (BIO)
Co-Leaders: Makoto Taiji (RIKEN), Miguel Vazquez (BSC), Arvind Ramanathan (Argonne), and Rick Stevens (Argonne). This group will explore the development, evaluation, and curation of challenge problems designed to evaluate scientific reasoning skills of AI models in the life sciences.
- Day 2: Foundations – defining challenges, leveraging biology web resources with AI, and forming teams on sub-topics. Challenge Problem Development – building and testing AI models, tool use, and prototype development
- Day 3: Continue Challenge Problem Development, wrap-up, and planning next steps
Organizing Committee
- Javier Aula-Blasco, Barcelona Supercomputing Center (Spain)
- Charlie Catlett, Argonne National Laboratory and University of Chicago (USA)
- Murali Emani, Argonne National Laboratory (USA)
- Ian Foster, Argonne National Laboratory and University of Chicago (USA)
- Neil Getty, Argonne National Laboratory (USA)
- Neeraj Kumar, Pacific Northwest National Laboratory (USA)
- Sandeep Madireddy, Argonne National Laboratory (USA)
- Satoshi Matsuoka, RIKEN Center for Computational Science (Japan)
- Arvind Ramanathan, Argonne National Laboratory (USA)
- Rick Stevens, Argonne National Laboratory and University of Chicago (USA)
- Makoto Taiji, RIKEN (Japan)
- Valerie Taylor, Argonne National Laboratory and University of Chicago (USA)
- Robert Underwood, Argonne National Laboratory (USA)
- Miguel Vazquez, Barcelona Supercomputing Center (Spain)
- Mohamed Wahib, RIKEN (Japan)
- Rio Yokota, Institute of Science Tokyo (formerly Tokyo Tech) (Japan)
Logistics
Hotel
- Kobe Portopia Hotel (within walking distance of RIKEN and public transit)
Address: 10-1, 6 Chome, Minatojima Nakamachi, Chuo-ku, Kobe, 650-0046 Japan
(directions)
Meeting Venue: RIKEN R-CCS Kobe Facility
- Address: 7-1-26 Minatojima-minami-machi, Chuo-ku, Kobe, Hyogo 650-0047, Japan
(directions and travel tips to/from Kobe and R-CCS)
Registration
- There is no fee to participate in the hackathon, but registration is required.
Local Travel Tips
From KIX (Kansai International Airport) to the Portopia Hotel, the quickest route is the ferry from KIX to Kobe Airport, followed by 3 stops on the Portliner. An alternative with more frequent departures is the limousine bus to Sannomiya, followed by the Portliner.
If your hotel is near Sannomiya, the limousine bus to Sannomiya is the best option.
The ferry ticket counter is at the northernmost end of KIX Terminal 1 (after exiting customs, turn right). A free shuttle bus departs from right outside the counter 10-15 min before each boat departure. The counter accepts most forms of payment. At Kobe Airport, you can either walk 5-10 min to the Portliner station next to the terminal or take a very short bus ride.
The limousine bus departs from stop #6 right outside KIX Terminal 1. It helps to add a Suica transit card to your mobile phone ahead of time for public transit payment. If you have a Suica card, physical or on your phone, make sure it holds at least ¥2200 of credit (about $14 or €14); you can then scan it to board. Otherwise, buy a ticket before boarding.

