AIR-Bench 2024 is a safety benchmark that aligns with emerging government regulations and company policies
☆28 Aug 14, 2024 Updated last year
Alternatives and similar repositories for air-bench-2024
Users who are interested in air-bench-2024 are comparing it to the repositories listed below:
- Code and data associated with CoMPosT: Characterizing and Evaluating Caricature in LLM Simulations ☆11 Oct 13, 2023 Updated 2 years ago
- ☆24 Mar 1, 2025 Updated last year
- ☆24 Dec 2, 2023 Updated 2 years ago
- Aioli: A unified optimization framework for language model data mixing ☆32 Jan 17, 2025 Updated last year
- ☆32 Jul 8, 2024 Updated last year
- ☆37 Oct 2, 2024 Updated last year
- Self-evaluating RAG application on LangCheck docs ☆11 Sep 10, 2025 Updated 6 months ago
- Demo repository showcasing how to use reusable workflows to build artifact attestations ☆14 Feb 16, 2026 Updated 3 weeks ago
- ☆10 Sep 29, 2023 Updated 2 years ago
- ☆37 Apr 26, 2021 Updated 4 years ago
- ☆57 May 21, 2025 Updated 9 months ago
- We conduct a preregistered experiment to investigate whether fact checks provided by a large language model can serve as an effective mis… ☆13 Dec 14, 2024 Updated last year
- gammcor code ☆11 Sep 25, 2025 Updated 5 months ago
- A Benchmark for Evaluating Safety and Trustworthiness in Web Agents for Enterprise Scenarios ☆19 Updated this week
- ☆11 Jan 25, 2021 Updated 5 years ago
- IonQ iQuHACK 2024 Remote Challenge ☆11 Feb 3, 2024 Updated 2 years ago
- [SIGIR 2025] Benchmarking Recommendation, Classification, and Tracing Based on Hugging Face Knowledge Graph ☆16 Jun 6, 2025 Updated 9 months ago
- [EMNLP 2023] MIRACLE: Towards Personalized Dialogue Generation with Latent-Space Multiple Personal Attribute Control ☆12 Nov 11, 2023 Updated 2 years ago
- A framework for few-shot evaluation of autoregressive language models. ☆12 Jul 14, 2025 Updated 7 months ago
- PSI-MOD ontology for modified and unmodified amino acid residues ☆14 Jan 8, 2026 Updated 2 months ago
- MV-RAG combines retrieval with multi-view generation to create accurate 3D-consistent visuals. By retrieving reference images and text, i… ☆24 Nov 29, 2025 Updated 3 months ago
- Optimizing Anytime Reasoning via Budget Relative Policy Optimization ☆52 Jul 15, 2025 Updated 7 months ago
- Foundation Model for Probabilistic Electricity Price Forecasting ☆19 Sep 29, 2025 Updated 5 months ago
- Simple getting started procedure for SciCat ☆11 Updated this week
- Paper dataset for "Factored Verification: Detecting and Reducing Hallucination in Summaries of Academic Papers" ☆12 Oct 20, 2024 Updated last year
- ☆21 Updated this week
- A Python client library for accessing IQM quantum computers ☆12 Mar 26, 2025 Updated 11 months ago
- Workshop that will take you from Graph Neural Networks (GNNs) to Transformers, architectures which have led to numerous breakthrough achi… ☆13 Sep 11, 2023 Updated 2 years ago
- ⚖️ Code for the paper "Ethical Adversaries: Towards Mitigating Unfairness with Adversarial Machine Learning". ☆11 Dec 8, 2022 Updated 3 years ago
- Official code of "The Automated but Risky Game: Modeling Agent-to-Agent Negotiations and Transactions in Consumer Markets" ☆23 Sep 20, 2025 Updated 5 months ago
- IBM iQuHACK 2024 In-Person Challenge ☆13 Feb 6, 2024 Updated 2 years ago
- Implementation of the Pairformer model used in AlphaFold 3 ☆14 Mar 2, 2026 Updated last week
- NeRF - Neural Radiance Fields in MATLAB ☆10 Jan 17, 2024 Updated 2 years ago
- ☆15 Updated this week
- ☆18 Jul 3, 2025 Updated 8 months ago
- A dataset plugin for climetlab for the dataset maelstrom-a1 ☆12 Oct 25, 2023 Updated 2 years ago
- ☆14 Apr 29, 2024 Updated last year
- [AAAI 2024] DataElixir: Purifying Poisoned Dataset to Mitigate Backdoor Attacks via Diffusion Models ☆12 Dec 5, 2024 Updated last year
- ☆13 May 10, 2025 Updated 10 months ago