Media Summary: From medical image translation that can fool doctors, to LLM agents that can be tricked into taking dangerous actions in real ... Presenter(s): Jose Leitao, Principal Network Engineer, Meta; Guille Cobo, Senior Production Engineer, Meta; Jose Leitao, Principal ... Today's episode takes us inside three very different frontiers of ...

This AI Benchmark Changes Patching Forever (BackportBench): Detailed Analysis & Overview

This AI Benchmark Changes Patching Forever (BackportBench)
Beyond Text: Benchmarking Real-World Failure Modes in AI Agents and Medical Synthesis
Every LLM Scored 0% on this AI Benchmark
How Benchmarks Are Ruining AI Quality
AI Is Crushing Patch Windows—and Security Teams Can’t Keep Up Manually
How I Actually Used AI Agents to Build a Benchmark
Can Generative AI Fix Bugs? Inside the Benchmarking Effort
High Performance at Scale and Reliability by Design: Operating META's largest AI clusters
Auto patching vulnerabilities with generative AI
From Benchmarks to Production: Evaluating, Diagnosing, and Scaling Agentic AI
AI Benchmarks Are Lying to You? I Tested 8 Models
This AI Benchmark Changes Patching Forever (BackportBench)

Ready to revolutionize automated ...

Beyond Text: Benchmarking Real-World Failure Modes in AI Agents and Medical Synthesis

From medical image translation that can fool doctors, to LLM agents that can be tricked into taking dangerous actions in real ...

Every LLM Scored 0% on this AI Benchmark

What is Program bench?

How Benchmarks Are Ruining AI Quality

Benchmarks ...

AI Is Crushing Patch Windows—and Security Teams Can’t Keep Up Manually

AI ...

How I Actually Used AI Agents to Build a Benchmark

My old ...

Can Generative AI Fix Bugs? Inside the Benchmarking Effort

This video explores whether generative ...

High Performance at Scale and Reliability by Design: Operating META's largest AI clusters

Presenter(s): Jose Leitao, Principal Network Engineer, Meta; Guille Cobo, Senior Production Engineer, Meta; Jose Leitao, Principal ...

Auto patching vulnerabilities with generative AI

Get the free 30-day ...

From Benchmarks to Production: Evaluating, Diagnosing, and Scaling Agentic AI

Today's episode takes us inside three very different frontiers of ...

AI Benchmarks Are Lying to You? I Tested 8 Models

Synthetic ...