
Anthropic — Research Papers

Anthropic · 19 lessons · 11h 26m

Mechanistic interpretability and AI safety

1. What is interpretability? · Advanced · 4m

AI personality and alignment methods

2. What should an AI's personality be? · Advanced · 38m

Scaling LLM interpretability

3. Scaling interpretability · Advanced · 53m

AI policy, safety, interpretability, future

4. AI, policy, and the weird sci-fi future with Anthropic’s Jack Clark · Intermediate · 38m

AI usage analysis for safety & impact

5. What do people use AI models for? · Intermediate · 47m

LLM alignment faking and safety implications

6. Alignment faking in large language models · Advanced · 1h 30m

Challenges in AI alignment and interpretability

7. How difficult is AI alignment? | Anthropic Research Salon · Advanced · 28m

AI safety and jailbreak prevention

8. Defending against AI jailbreaks · Intermediate · 1h 15m

AI safety, alignment, and control

9. Controlling powerful AI · Advanced · 51m

AI interpretability and internal thought processes

10. Tracing the thoughts of a large language model · Intermediate · 3m

AI consciousness and ethics

11. Could AI models be conscious? · Intermediate · 44m

AI ethics and societal impact

12. The Societal Impacts of AI · Intermediate · 8m

Safety research on AI use for emotional support

13. Affective Use of AI · Beginner · 12m

Understanding LLM internal mechanisms

14. Interpretability: Understanding how AI models think · Advanced · 59m

AI cybercrime and future safety threats

15. Threat Intelligence: How Anthropic stops AI cybercrime · Intermediate · 37m

Reward hacking and AI alignment

16. What is AI "reward hacking"—and why do we worry about it? · Advanced · 52m

AI ethics, identity, and welfare

17. Anthropic’s philosopher answers your questions · Intermediate · 36m

AI model alignment and safety

18. What is sycophancy in AI models? · Intermediate · 6m

AI functional emotions & interpretability

19. When AIs act emotional · Intermediate · 5m
