Resources

Similar Resources

https://ailabwatch.org/
https://aisafetyclaims.org/
https://futureoflife.org/wp-content/uploads/2025/07/FLI-AI-Safety-Index-Report-Summer-2025.pdf
https://www.seoul-tracker.org/
https://futureoflife.org/document/fli-governance-scorecard-and-safety-standards-policy/
https://www.lcfi.ac.uk/news-events/news/ai-safety-policies
https://scale.com/leaderboard/mask
https://docs.google.com/document/d/1KknXf11a-DQuxvcephn6tJ_Bh07JPGZ__m6ss3HAuHg

Sources

 

[1] https://alignment.anthropic.com/2024/anthropic-fellows-program/

 

[2] https://threadreaderapp.com/thread/1885400181069537549.html

 

[3] https://deepmind.google/discover/blog/taking-a-responsible-path-to-agi/

 

[4] https://www.anthropic.com/news/announcing-our-updated-responsible-scaling-policy

 

[5] https://eleosai.org/post/eleos-commends-anthropic-model-welfare-efforts/

 

[6] https://github.com/paradigms-of-intelligence

 

[7] https://www.transformernews.ai/p/anthropic-ai-welfare-researcher

 

[8] https://www.dwarkesh.com/p/dario-amodei

 

[9] https://arxiv.org/abs/2311.08576

 

[10] https://futureoflife.org/wp-content/uploads/2024/12/AI-Safety-Index-2024-Full-Report-27-May-25.pdf

 

[11] https://x.ai/documents/2025.02.20-RMF-Draft.pdf

 

[12] https://www.theguardian.com/technology/2025/may/14/elon-musk-grok-white-genocide

 

[13] https://www.theguardian.com/technology/2025/may/18/musks-ai-bot-grok-blames-its-holocaust-scepticism-on-programming-error

 

[14] https://eleosai.org/post/experts-who-say-that-ai-welfare-is-a-serious-near-term-possibility/

 

[15] https://www-cdn.anthropic.com/dc4cb293c77da3ca5e3398bdeef75ee17b42b73f.pdf

 

[16] https://www.anthropic.com/news/strategic-warning-for-ai-risk-progress-and-insights-from-our-frontier-red-team

 

[17] https://www.lesswrong.com/posts/FG54euEAesRkSZuJN/ryan_greenblatt-s-shortform?commentId=duteJTXboAyQmvHfb

 

[18] https://www.cnbc.com/2024/05/17/openai-superalignment-sutskever-leike.html

 

[19] https://openai.com/index/updating-our-preparedness-framework/

 

[20] https://www.openaifiles.org/

 

[21] https://openai.com/charter/

 

[22] https://openai.com/index/our-approach-to-alignment-research/

 

[23] https://www.forbes.com/sites/larsdaniel/2025/02/01/deepseek-data-leak-exposes--1000000-sensitive-records/

 

[24] https://www.rfa.org/english/china/2025/04/24/china-deep-seek-south-korea-user-data/

 

[25] https://www.reuters.com/world/china/italy-regulator-opens-probe-into-chinas-deepseek-2025-06-16/

 

[26] https://www.theguardian.com/technology/2025/jan/28/we-tried-out-deepseek-it-works-well-until-we-asked-it-about-tiananmen-square-and-taiwan

 

[27] https://www.anthropic.com/news/paris-ai-summit

 

[28] https://www.anthropic.com/news/third-party-testing

 

[29] https://www.lesswrong.com/posts/xhKr5KtvdJRssMeJ3/anthropic-s-core-views-on-ai-safety?commentId=cguvrCn9NZehzdt4r

 

[30] https://www.anthropic.com/research/mapping-mind-language-model

 

[31] https://www.anthropic.com/research/tracing-thoughts-language-model

 

[32] https://www.anthropic.com/transparency

 

[33] https://www.anthropic.com/research/reasoning-models-dont-say-think

 

[34] https://arxiv.org/abs/2411.00986

 

[35] https://www.darioamodei.com/post/the-urgency-of-interpretability

 

[36] https://www.anthropic.com/research

 

[37] https://alignment.anthropic.com/

 

[38] https://www.anthropic.com/company

 

[39] https://www-cdn.anthropic.com/4263b940cabb546aa0e3283f35b686f4f3b2ff47.pdf

 

[40] https://www.anthropic.com/news/the-long-term-benefit-trust

 

[41] https://www.techrxiv.org/users/799951/articles/1181157-political-bias-in-ai-language-models-a-comparative-analysis-of-chatgpt-4-perplexity-google-gemini-and-claude

 

[42] https://blogs.cfainstitute.org/investor/2025/05/14/ai-bias-by-design-what-the-claude-prompt-leak-reveals-for-investment-professionals/

 

[43] https://apnews.com/article/grok-4-elon-musk-xai-colossus-14d575fb490c2b679ed3111a1c83f857

 

[44] https://www.thetimes.com/business-money/technology/article/nations-must-work-together-to-harness-ai-king-tells-landmark-summit-hsrgjw0gq?region=global

 

[45] https://futureoflife.org/open-letter/pause-giant-ai-experiments/

 

[46] https://www.forbes.com/sites/ericmack/2015/01/15/elon-musk-puts-down-10-million-to-fight-skynet/

 

[47] https://arxiv.org/pdf/2201.11903

 

[48] https://cloud.google.com/use-cases/open-source-ai

 

[49] https://www.popularmechanics.com/science/a63633889/deepseek-open-weight/

 

[50] https://www.rfa.org/mandarin/zhengzhi/2025/01/28/wy-deepseek-china-chatgpt-security-challenge-communist/

 

[51] https://www.ft.com/content/f896c4d9-bab7-40a2-9e67-4058093ce250

 

[52] https://cdn.openai.com/global-affairs/ostp-rfi/ec680b75-d539-4653-b297-8bcf6e5f7686/openai-response-ostp-nsf-rfi-notice-request-for-information-on-the-development-of-an-artificial-intelligence-ai-action-plan.pdf

 

[53] https://x.com/ilyasut/status/1491554478243258368?lang=en

 

[54] https://experiencemachines.substack.com/p/ilya-sutskevers-test-for-ai-consciousness

 

[55] https://www.rosiecampbell.xyz/p/the-ai-bambi-effect

 

[56] https://x.com/jachiam0/status/1840996091665531295

 

[57] https://www.theverge.com/2023/3/15/23640180/openai-gpt-4-launch-closed-research-ilya-sutskever-interview

 

[58] https://openai.com/our-structure/

 

[59] https://openai.com/index/superalignment-fast-grants/

 

[60] https://openai.com/index/openai-scholars/

 

[61] https://academy.openai.com/public/videos/3-steps-to-ai-literacy-ai-ethics-policy-and-safety-2025-06-30

 

[62] https://openai.com/index/update-on-safety-and-security-practices/

 

[63] https://openai.com/index/language-models-can-explain-neurons-in-language-models/

 

[64] https://help.openai.com/en/articles/8313359-is-chatgpt-biased

 

[65] https://openai.com/index/frontier-risk-and-preparedness/

 

[66] https://www.lesswrong.com/posts/dqd54wpEfjKJsJBk6/xai-s-grok-4-has-no-meaningful-safety-guardrails

 

[67] https://techcrunch.com/2025/07/16/openai-and-anthropic-researchers-decry-reckless-safety-culture-at-elon-musks-xai/

 

[68] https://job-boards.greenhouse.io/xai/jobs/4777102007

 

[69] https://docs.x.ai/docs/overview

 

[70] https://www.nyventurehub.com/2024/12/02/from-algorithms-to-altruism-risks-and-rewards-of-xais-benefit-corporation-strategy/

 

[71] https://www.techradar.com/pro/security/doge-employee-with-sensitive-database-access-leaks-private-xai-api-key

 

[72] https://docs.x.ai/docs/guides/reasoning

 

[73] https://www.wired.com/story/grok-ai-privacy-opt-out/

 

[74] https://cyberscoop.com/grok4-security-flaws-prompts-splxai-research/

 

[75] https://www.nist.gov/news-events/news/2024/08/us-ai-safety-institute-signs-agreements-regarding-ai-safety-research

 

[76] https://www.computerweekly.com/news/366619238/Government-renames-AI-Safety-Institute-and-teams-up-with-Anthropic

 

[77] https://blogs.cisco.com/security/evaluating-security-risk-in-deepseek-and-other-frontier-reasoning-models

 

[78] https://cdn.prod.website-files.com/6690a78074d86ca0ad978007/679bc2e71b48e423c0ff7e60_1%20RedTeaming_DeepSeek_Jan29_2025%20(1).pdf

 

[79] https://arxiv.org/pdf/2501.12948

 

[80] https://arxiv.org/pdf/2412.19437

 

[81] https://arxiv.org/abs/2502.01225

 

[82] https://www.nytimes.com/2025/01/23/technology/deepseek-china-ai-chips.html

 

[83] https://arxiv.org/pdf/2504.01849

 

[84] https://www.cnn.com/2025/06/04/tech/google-deepmind-ceo-ai-risks-jobs

 

[85] https://arxiv.org/abs/2501.13011

 

[86] https://deepmind.google/discover/blog/gemma-scope-helping-the-safety-community-shed-light-on-the-inner-workings-of-language-models/

 

[87] https://www.youtube.com/playlist?list=PLw9kjlF6lD5UqaZvMTbhJB8sV-yuXu5eW

 

[88] https://ai.google/responsibility/principles/

 

[89] https://ai.google/static/documents/ai-responsibility-update-published-february-2025.pdf

 

[90] https://www.alignmentforum.org/posts/4uXCAJNuPKtKBsi28/negative-results-for-saes-on-downstream-tasks

 

[91] https://deepmind.google/discover/blog/evaluating-potential-cybersecurity-threats-of-advanced-ai/

 

[92] https://arxiv.org/abs/2503.11917

 

[93] https://storage.googleapis.com/deepmind-media/DeepMind.com/Blog/updating-the-frontier-safety-framework/Frontier%20Safety%20Framework%202.0%20(1).pdf

 

[94] https://modelcards.withgoogle.com/model-cards

 

[95] https://cloud.google.com/transform/2025-and-the-next-chapters-of-ai

 

[96] https://cloud.google.com/resources/content/future-of-ai-report

 

[97] https://news.stanford.edu/stories/2025/05/ai-models-llms-chatgpt-claude-gemini-partisan-bias-research-study

 

[98] https://www.nbcnews.com/tech/tech-news/google-making-changes-gemini-ai-portrayed-people-color-inaccurately-rcna140007

 

[99] https://www.codastory.com/newsletters/the-gaffes-and-biases-of-google-gemini/

 

[100] https://www.matsprogram.org/

 

[101] https://deepmindsafetyresearch.medium.com/agi-safety-and-alignment-at-google-deepmind-a-summary-of-recent-work-8e600aca582a

 

[102] https://www.anthropic.com/research/end-subset-conversations