
Anthropic — Research Papers

Anthropic · 19 lessons · 11h 26m

Explaining mechanisms and AI safety

1. What is interpretability? · advanced · 4m

AI personality and alignment methods

2. What should an AI's personality be? · advanced · 38m

Scaling LLM interpretability

3. Scaling interpretability · advanced · 53m

AI policy, safety, interpretability, future

4. AI, policy, and the weird sci-fi future with Anthropic’s Jack Clark · intermediate · 38m

AI usage analysis for safety & impact

5. What do people use AI models for? · intermediate · 47m

LLM alignment faking and safety implications

6. Alignment faking in large language models · advanced · 1h 30m

Challenges in AI alignment and interpretability

7. How difficult is AI alignment? | Anthropic Research Salon · advanced · 28m

AI safety and jailbreak prevention

8. Defending against AI jailbreaks · intermediate · 1h 15m

AI safety, alignment, and control

9. Controlling powerful AI · advanced · 51m

AI interpretability and internal thought processes

10. Tracing the thoughts of a large language model · intermediate · 3m

AI consciousness and ethics

11. Could AI models be conscious? · intermediate · 44m

AI ethics and societal impact

12. The Societal Impacts of AI · intermediate · 8m

Safety research on emotional-support use of AI

13. Affective Use of AI · beginner · 12m

Understanding LLM internal mechanisms

14. Interpretability: Understanding how AI models think · advanced · 59m

AI cybercrime and future safety threats

15. Threat Intelligence: How Anthropic stops AI cybercrime · intermediate · 37m

Reward hacking and AI alignment

16. What is AI "reward hacking"—and why do we worry about it? · advanced · 52m

AI ethics, identity, and welfare

17. Anthropic’s philosopher answers your questions · intermediate · 36m

AI model alignment and safety

18. What is sycophancy in AI models? · intermediate · 6m

AI functional emotions & interpretability

19. When AIs act emotional · intermediate · 5m
