Anthropic — Research Papers

Anthropic · 19 lessons · 11h 26m

Mechanistic interpretability and AI safety

  1. What is interpretability? · Advanced · 4m

AI personality and alignment methods

  2. What should an AI's personality be? · Advanced · 38m

Scaling LLM interpretability

  3. Scaling interpretability · Advanced · 53m

AI policy, safety, interpretability, future

  4. AI, policy, and the weird sci-fi future with Anthropic’s Jack Clark · Intermediate · 38m

AI usage analysis for safety & impact

  5. What do people use AI models for? · Intermediate · 47m

LLM alignment faking and safety implications

  6. Alignment faking in large language models · Advanced · 1h 30m

Challenges in AI alignment and interpretability

  7. How difficult is AI alignment? | Anthropic Research Salon · Advanced · 28m

AI safety and jailbreak prevention

  8. Defending against AI jailbreaks · Intermediate · 1h 15m

AI safety, alignment, and control

  9. Controlling powerful AI · Advanced · 51m

AI interpretability and internal thought processes

  10. Tracing the thoughts of a large language model · Intermediate · 3m

AI consciousness and ethics

  11. Could AI models be conscious? · Intermediate · 44m

AI ethics and societal impact

  12. The Societal Impacts of AI · Intermediate · 8m

AI emotional support use safety research

  13. Affective Use of AI · Basic · 12m

Understanding LLM internal mechanisms

  14. Interpretability: Understanding how AI models think · Advanced · 59m

AI cybercrime and future safety threats

  15. Threat Intelligence: How Anthropic stops AI cybercrime · Intermediate · 37m

Reward hacking and AI alignment

  16. What is AI "reward hacking"—and why do we worry about it? · Advanced · 52m

AI ethics, identity, and welfare

  17. Anthropic’s philosopher answers your questions · Intermediate · 36m

AI model alignment and safety

  18. What is sycophancy in AI models? · Intermediate · 6m

AI functional emotions & interpretability

  19. When AIs act emotional · Intermediate · 5m