Public Inflection Benchmarks
☆68 · Mar 6, 2024 · Updated last year
Alternatives and similar repositories for Inflection-Benchmarks
Users who are interested in Inflection-Benchmarks are comparing it to the libraries listed below.
- ☆12 · Mar 11, 2022 · Updated 3 years ago
- triton ver of gqa flash attn, based on the tutorial ☆12 · Aug 4, 2024 · Updated last year
- ☆13 · Jun 4, 2024 · Updated last year
- ☆10 · Feb 12, 2020 · Updated 6 years ago
- ☆18 · Jul 10, 2024 · Updated last year
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆61 · Aug 30, 2024 · Updated last year
- ☆33 · Jul 31, 2024 · Updated last year
- Code for the paper "Rethinking Benchmark and Contamination for Language Models with Rephrased Samples" ☆316 · Dec 20, 2023 · Updated 2 years ago
- ☆17 · Apr 7, 2025 · Updated 10 months ago
- ☆99 · Jul 25, 2023 · Updated 2 years ago
- Package to optimize Adversarial Attacks against (Large) Language Models with Varied Objectives ☆70 · Feb 22, 2024 · Updated 2 years ago
- Chinese processing ☆36 · Jan 29, 2014 · Updated 12 years ago
- UQ: Assessing Language Models on Unsolved Questions ☆30 · Aug 26, 2025 · Updated 6 months ago
- [ACL2024 Findings]DMoERM: Recipes of Mixture-of-Experts for Effective Reward Modeling ☆18 · Jun 6, 2024 · Updated last year
- TACL 2025: Investigating Adversarial Trigger Transfer in Large Language Models ☆19 · Aug 17, 2025 · Updated 6 months ago
- ☆16 · Jul 23, 2024 · Updated last year
- ☆41 · Jun 19, 2024 · Updated last year
- An official implementation of Style-Talker for Spoken Dialogue Generation ☆23 · Jan 12, 2025 · Updated last year
- Execution-Layer Security (ELS) for AI agents: policy-enforced shell with audit. ☆41 · Updated this week
- Official implementation of TBA for async LLM post-training. ☆29 · Nov 5, 2025 · Updated 4 months ago
- ☆868 · Dec 8, 2023 · Updated 2 years ago
- A port of the weiroll vm in huff ☆27 · Jan 18, 2023 · Updated 3 years ago
- [ICLR 2025] DSBench: How Far are Data Science Agents from Becoming Data Science Experts? ☆106 · Aug 17, 2025 · Updated 6 months ago
- [ICLR/AAAI 2026] Open-Source LLM-Based Data Analysis Agents ☆67 · Jan 26, 2026 · Updated last month
- ☆87 · Jul 30, 2024 · Updated last year
- Language Understanding Augmentation Toolkit for Robustness Testing ☆20 · Jan 22, 2023 · Updated 3 years ago
- NexAU (AU for Agent Universe), a general-purpose agent framework for building intelligent agents with tool capabilities. ☆49 · Updated this week
- Scaling Data-Constrained Language Models ☆342 · Jun 28, 2025 · Updated 8 months ago
- A simple toolkit for benchmarking LLMs on mathematical reasoning tasks. 🧮✨ ☆273 · Apr 26, 2024 · Updated last year
- Momentum Decoding: Open-ended Text Generation as Graph Exploration ☆19 · Jan 27, 2023 · Updated 3 years ago
- Harness for running and evaluating AI agents against RL environments ☆120 · Updated this week
- 🐙 OctoPack: Instruction Tuning Code Large Language Models ☆478 · Feb 5, 2025 · Updated last year
- Automatic Integration for Neural Spatio-Temporal Point Process models (AI-STPP) is a new paradigm for exact, efficient, non-parametric inf… ☆25 · Oct 14, 2024 · Updated last year
- ☆19 · Mar 15, 2017 · Updated 8 years ago
- DialogueCSE: Dialogue-based Contrastive Learning of Sentence Embeddings ☆19 · Nov 24, 2021 · Updated 4 years ago
- ☆34 · Feb 11, 2025 · Updated last year
- ☆53 · Jul 25, 2023 · Updated 2 years ago
- ☆28 · May 29, 2024 · Updated last year
- Codebase for Context-aware Meta-learned Loss Scaling (CaMeLS). https://arxiv.org/abs/2305.15076. ☆25 · Jan 23, 2024 · Updated 2 years ago