Ras Efficient Large Scale Language

RAS: Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM - G. Perrotta

Title:

Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM | Jared Casper

In this talk we present how we trained a 530B parameter

Training LLMs at Scale - Deepak Narayanan | Stanford MLSys #83

Episode 83 of the Stanford MLSys Seminar Series! Training

Gang Scheduling for Llama by Anca Agape and Andre Darabanov

The rapid advancement of AI has necessitated a fundamental shift in infrastructure, moving from homogeneous workloads that fit ...

Ultimate Guide To Scaling ML Models - Megatron-LM | ZeRO | DeepSpeed | Mixed Precision

Optimizing Large-Scale Model Training with Ray Compiled Graphs | Ray Summit 2024

At Ray Summit 2024, Anyscale's Yunxuan Xiao and Amjad Almahairi delve into advanced techniques for training

Efficient LLM Reranking with Active Learning

In this AI Research Roundup episode, Alex discusses the paper: 'Active Learners as

RAS: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning - Gustavo Leite

Title: Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning Authors: Lianmin Zheng, Zhuohan Li, ...

Linearizing Large Language Models

Linear transformers offer a subquadratic-time alternative to softmax attention, but face scaling issues. SUPRA proposes uptraining ...

SGLang: An Efficient Open-Source Framework for Large-Scale LLM Serving | Ray Summit 2025

At Ray Summit 2025, Ying Sheng from SGLang and Qiaolin Yu from Anyscale share how SGLang has become a ...

SOSP 2021: Solving Large-Scale Granular Resource Allocation Problems Efficiently with POP

Authors: Deepak Narayanan (Stanford University), Fiodar Kazhamiaka (Stanford University), Firas Abuzaid (Stanford University), ...

Large Language Models in Five Formulas

Tutorial on building intuition about LLMs. Slides: https://link.excalidraw.com/p/readonly/aBWlNjEckdUlrszwwo6V or ...

Scaling and accelerating LLM trainings

Speaker: Andrea Pilzer, Ph.D. (NVIDIA AI Technology Center, Italy) Slides: https://hpc.fau.de/files/2026/01/20260116_FAU.pdf ...