Media Summary: Speaker: Alexandre Lacoste, Sr. Staff Research Scientist at ServiceNow. Lacoste talks about his team's process for

Benchmarking And Scaling Web Agents - Detailed Analysis & Overview

Benchmarking and Scaling Web Agents with LLMs and VLMs

Speaker: Alexandre Lacoste, Sr. Staff Research Scientist at ServiceNow. Lacoste talks about his team's process for

Agent Evaluation & Benchmarks - Agentic AI MOOC 2025 Lecture 4 Summary

This lecture discusses the critical shift from evaluating static LLMs to complex AI

#284 BrowseComp: Benchmark for Browsing Agents

BrowseComp is a simple yet challenging

Building for Scale Benchmarking and Performance Tuning of AI Cluster Networks

As AI workloads push the limits of modern infrastructure, testing and emulation have become essential to building ...

The hard truth about AI agent benchmarks

Everyone wants to compare AI

Autoscaling your AI agent under load

This video demonstrates how to effectively autoscale your AI

WideSearch: New Benchmark for LLM Agents

In this AI Research Roundup episode, Alex discusses the paper: 'WideSearch:

GUI-360: Dataset & Benchmark for Desktop Agents

In this AI Research Roundup episode, Alex discusses the paper: 'GUI-360: A Comprehensive Dataset and

Benchmarking AI Sales Agents: How WorkDone’s “AgentChallenge” Hit 90 % Accuracy

Large‑language models are easy to grade; real AI

DV-World: New Benchmark for Data Viz LLM Agents

In this AI Research Roundup episode, Alex discusses the paper: 'DV-World:

How I Actually Used AI Agents to Build a Benchmark

My old AI planning

WideSearch: Benchmarking Agentic Broad Info-Seeking

WideSearch:

The BrowserGym Ecosystem for Web Agent Research

The BrowserGym Ecosystem for