A curated list of scientific and interdisciplinary research on AI existential risks, especially in the era of large models.
☆15 · Jul 26, 2024 · Updated last year
Alternatives and similar repositories for awesome-ai-existential-risk
Users who are interested in awesome-ai-existential-risk are comparing it to the libraries listed below.
- This is an agent (including contextual prompts) that queries your CSV ☆10 · Jun 8, 2023 · Updated 3 years ago
- ReasoningShield: Safety Detection over Reasoning Traces of Large Reasoning Models ☆26 · Sep 27, 2025 · Updated 9 months ago
- ☆34 · Oct 14, 2025 · Updated 8 months ago
- Official implementation for "ALI-Agent: Assessing LLMs' Alignment with Human Values via Agent-based Evaluation" ☆21 · Jan 31, 2026 · Updated 5 months ago
- A from-scratch, multi-difficulty-level tutorial on how PyTorch, TensorFlow, JAX, etc. work ☆13 · Jun 12, 2026 · Updated 2 weeks ago
- Fudan Baize Large Model Safety Benchmark (Summer 2024 edition) ☆51 · Jul 31, 2024 · Updated last year
- A package that achieves 95%+ transfer attack success rate against GPT-4 ☆26 · Oct 24, 2024 · Updated last year
- ☆16 · Dec 31, 2024 · Updated last year
- Implementing scalable LLMs in pure JAX (no third-party libraries) ☆50 · Jun 11, 2026 · Updated 2 weeks ago
- Custom Triton kernels for training Karpathy's nanoGPT. ☆19 · Oct 21, 2024 · Updated last year
- Code to generate NeuralExecs (prompt injection for LLMs) ☆27 · Oct 5, 2025 · Updated 8 months ago
- Accepted by IJCAI-24 Survey Track ☆232 · Aug 25, 2024 · Updated last year
- Evaluating Safety of Autonomous Agents in Mobile Device Control (AAAI 2026 AI Alignment Track) ☆33 · Jan 28, 2026 · Updated 5 months ago
- Camel-Coder: Collaborative task completion with multiple agents. Role-based prompts, intervention mechanism, and thoughtful suggestions ☆35 · Jul 3, 2023 · Updated 2 years ago
- Website for HKU NLP group (under construction) ☆14 · Mar 20, 2026 · Updated 3 months ago
- Benchmarking Mobile Device Control Agents across Diverse Configurations (ICLR 2024 workshop GenAI4DM spotlight presentation; CoLLAs 2025) ☆35 · Jul 21, 2025 · Updated 11 months ago
- A training framework for large-scale language models based on Megatron-Core, the COOM Training Framework is designed to efficiently handl… ☆27 · Nov 14, 2025 · Updated 7 months ago
- [ACL 2024] Defending Large Language Models Against Jailbreaking Attacks Through Goal Prioritization ☆29 · Jul 9, 2024 · Updated last year
- ☆190 · Oct 31, 2025 · Updated 8 months ago
- ECSO (Make MLLMs safe with neither training nor any external models!) (https://arxiv.org/abs/2403.09572) ☆36 · Nov 2, 2024 · Updated last year
- The First to Know: How Token Distributions Reveal Hidden Knowledge in Large Vision-Language Models? ☆43 · Nov 1, 2024 · Updated last year
- A JavaScript component for elliptic curve cryptography for cryptocurrencies such as Litecoin and Bitcoin ☆18 · Oct 20, 2024 · Updated last year
- ruvnet. ☆217 · Jun 14, 2026 · Updated 2 weeks ago
- TrustAgent: Towards Safe and Trustworthy LLM-based Agents ☆59 · Feb 7, 2025 · Updated last year
- We want to build open-source solutions and standards for using AI to solve mental health challenges. The goal is to apply DPI knowledge a… ☆27 · Jun 13, 2025 · Updated last year
- Apple Watch Fire TV remote ☆13 · Oct 28, 2018 · Updated 7 years ago
- Inference-time alignment for harmlessness through cross-model guidance (ACL 2024). Code + MM-Harmful Bench. ☆38 · Oct 2, 2024 · Updated last year
- Accepted by ECCV 2024 ☆212 · Oct 15, 2024 · Updated last year
- Track your metrics (motivation, happiness, relationships) and find correlations in your behaviors ☆83 · Mar 8, 2015 · Updated 11 years ago
- An overview of GRPO & DeepSeek-R1 Training with Open Source GRPO Model Fine Tuning ☆37 · May 18, 2025 · Updated last year
- Collection of browser userscripts 🐒 ☆18 · Jun 17, 2026 · Updated 2 weeks ago
- ☆39 · Apr 5, 2024 · Updated 2 years ago
- Official implementation of [USENIX Sec'25] StruQ: Defending Against Prompt Injection with Structured Queries ☆76 · Nov 10, 2025 · Updated 7 months ago
- An unofficial implementation of AutoDAN attack on LLMs (arXiv:2310.15140) ☆45 · Feb 8, 2024 · Updated 2 years ago
- ☆49 · Feb 23, 2025 · Updated last year
- A PyTorch implementation of universal adversarial perturbation (UAP) that is easier to understand and implement. ☆53 · Mar 3, 2022 · Updated 4 years ago
- [ICLR 2024 Spotlight 🔥] - [Best Paper Award SoCal NLP 2023 🏆] - Jailbreak in pieces: Compositional Adversarial Attacks on Multi-Modal… ☆87 · Jun 6, 2024 · Updated 2 years ago
- ☆53 · Jan 30, 2024 · Updated 2 years ago
- ☆82 · Dec 19, 2024 · Updated last year