Ras Efficient Large Scale Language - Detailed Analysis & Overview

RAS: Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM - G. Perrotta
Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM | Jared Casper
Training LLMs at Scale - Deepak Narayanan | Stanford MLSys #83
Gang Scheduling for Llama by Anca Agape and Andre Darabanov
Ultimate Guide To Scaling ML Models - Megatron-LM | ZeRO | DeepSpeed | Mixed Precision
Optimizing Large-Scale Model Training with Ray Compiled Graphs | Ray Summit 2024
Efficient LLM Reranking with Active Learning
RAS: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning - Gustavo Leite
Linearizing Large Language Models
SGLang: An Efficient Open-Source Framework for Large-Scale LLM Serving | Ray Summit 2025
SOSP 2021: Solving Large-Scale Granular Resource Allocation Problems Efficiently with POP
Large Language Models in Five Formulas
RAS: Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM - G. Perrotta

Title:

Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM | Jared Casper

In this talk we present how we trained a 530B parameter

Training LLMs at Scale - Deepak Narayanan | Stanford MLSys #83

Episode 83 of the Stanford MLSys Seminar Series! Training

Gang Scheduling for Llama by Anca Agape and Andre Darabanov

The rapid advancement of AI has necessitated a fundamental shift in infrastructure, moving from homogeneous workloads that fit ...

Ultimate Guide To Scaling ML Models - Megatron-LM | ZeRO | DeepSpeed | Mixed Precision

Sign up for AssemblyAI's speech API using my link ...

Optimizing Large-Scale Model Training with Ray Compiled Graphs | Ray Summit 2024

At Ray Summit 2024, Anyscale's Yunxuan Xiao and Amjad Almahairi delve into advanced techniques for training

Efficient LLM Reranking with Active Learning

In this AI Research Roundup episode, Alex discusses the paper: 'Active Learners as

RAS: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning - Gustavo Leite

Title: Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning Authors: Lianmin Zheng, Zhuohan Li, ...

Linearizing Large Language Models

Linear transformers offer a subquadratic-time alternative to softmax attention, but face scaling issues. SUPRA proposes uptraining ...

SGLang: An Efficient Open-Source Framework for Large-Scale LLM Serving | Ray Summit 2025

At Ray Summit 2025, Ying Sheng from SGLang and Qiaolin Yu from Anyscale share how SGLang has become a ...

SOSP 2021: Solving Large-Scale Granular Resource Allocation Problems Efficiently with POP

Authors: Deepak Narayanan (Stanford University), Fiodar Kazhamiaka (Stanford University), Firas Abuzaid (Stanford University), ...

Large Language Models in Five Formulas

Tutorial on building intuition about LLMs. Slides: https://link.excalidraw.com/p/readonly/aBWlNjEckdUlrszwwo6V or ...

Scaling and accelerating LLM trainings

Speaker: Andrea Pilzer, Ph.D. (NVIDIA AI Technology Center, Italy) Slides: https://hpc.fau.de/files/2026/01/20260116_FAU.pdf ...