Course · YouTube Library

Anthropic — Research Papers

Anthropic · 19 lessons · 11h 26m

Mechanistic interpretability and AI safety

  1. What is interpretability? · advanced · 4m

AI personality and alignment methods

  2. What should an AI's personality be? · advanced · 38m

Scaling LLM interpretability

  3. Scaling interpretability · advanced · 53m

AI policy, safety, interpretability, future

  4. AI, policy, and the weird sci-fi future with Anthropic’s Jack Clark · intermediate · 38m

AI usage analysis for safety & impact

  5. What do people use AI models for? · intermediate · 47m

LLM alignment faking and safety implications

  6. Alignment faking in large language models · advanced · 1h 30m

Challenges in AI alignment and interpretability

  7. How difficult is AI alignment? | Anthropic Research Salon · advanced · 28m

AI safety and jailbreak prevention

  8. Defending against AI jailbreaks · intermediate · 1h 15m

AI safety, alignment, and control

  9. Controlling powerful AI · advanced · 51m

AI interpretability and internal thought processes

  10. Tracing the thoughts of a large language model · intermediate · 3m

AI consciousness and ethics

  11. Could AI models be conscious? · intermediate · 44m

AI ethics and societal impact

  12. The Societal Impacts of AI · intermediate · 8m

Safety research on emotional-support use of AI

  13. Affective Use of AI · beginner · 12m

Understanding LLM internal mechanisms

  14. Interpretability: Understanding how AI models think · advanced · 59m

AI cybercrime and future safety threats

  15. Threat Intelligence: How Anthropic stops AI cybercrime · intermediate · 37m

Reward hacking and AI alignment

  16. What is AI "reward hacking"—and why do we worry about it? · advanced · 52m

AI ethics, identity, and welfare

  17. Anthropic’s philosopher answers your questions · intermediate · 36m

AI model alignment and safety

  18. What is sycophancy in AI models? · intermediate · 6m

AI functional emotions & interpretability

  19. When AIs act emotional · intermediate · 5m