Course · YouTube Library

AI Engineer — Evals & Observability

AI Engineer · 6 lessons · 5h 14m

Feedback-driven prompt optimization

1. Build a Prompt Learning Loop - SallyAnn DeLucia & Fuad Ali, Arize · intermediate · 52m

Measuring AI agent developer productivity

2. How METR measures Long Tasks and Experienced Open Source Dev Productivity - Joel Becker, METR · intermediate · 1h 16m

LLM evaluator calibration and optimization

3. Judge the Judge: Building LLM Evaluators That Actually Work with GEPA - Mahmoud Mabrouk, Agenta AI · intermediate · 41m

Benchmarking LLM real-world limitations

4. What Do Models Still Suck At? - Peter Gostev, Arena.ai, BullshitBench · advanced · 20m

Building effective LLM agent eval platforms

5. Why building eval platforms is hard - Phil Hetzel, Braintrust · intermediate · 26m

Observability for production AI systems

6. Shipping complex AI applications - Braintrust & Trainline · intermediate · 1h 39m
