☆49Apr 4, 2025Updated 11 months ago
Alternatives and similar repositories for super-benchmark
Users that are interested in super-benchmark are comparing it to the libraries listed below
Sorting:
- ☆25Nov 19, 2025Updated 3 months ago
- Code for "[COLM'25] RepoST: Scalable Repository-Level Coding Environment Construction with Sandbox Testing"☆23Mar 18, 2025Updated 11 months ago
- A Framework for Evaluating AI Agent Safety in Realistic Environments☆30Oct 2, 2025Updated 5 months ago
- ☆22Mar 23, 2025Updated 11 months ago
- ☆14Apr 14, 2025Updated 10 months ago
- HyPe: Better Pre-trained Language Model Fine-tuning with Hidden Representation Perturbation [ACL 2023]☆14Jul 11, 2023Updated 2 years ago
- To Think or Not to Think: Exploring the Unthinking Vulnerability in Large Reasoning Models☆33May 21, 2025Updated 9 months ago
- ☆10Feb 6, 2025Updated last year
- NeurIPS 2024: SciFIBench: Benchmarking Large Multimodal Models for Scientific Figure Interpretation☆13May 24, 2025Updated 9 months ago
- Project for SNARE benchmark☆11Jun 5, 2024Updated last year
- AIRS-Bench: an AI Research Science benchmark for quantifying the end-to-end AI research abilities of LLM agents☆62Updated this week
- This is an implementation of the paper "Are We Done with Object-Centric Learning?"☆12Sep 11, 2025Updated 5 months ago
- The official github repo for MixEval-X, the first any-to-any, real-world benchmark.☆16Feb 15, 2025Updated last year
- HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models☆53Nov 26, 2024Updated last year
- TensorFlow code for our ECCV'24 Workshop paper "LightAvatar: Efficient Head Avatar as Dynamic NeLF"☆30Nov 7, 2024Updated last year
- [COLING 2025] Official repo of paper: "Not Aligned" is Not "Malicious": Being Careful about Hallucinations of Large Language Models' Jail…☆12Jul 26, 2024Updated last year
- Efficient Pre-training of Masked Language Model via Concept-based Curriculum Masking☆13Feb 5, 2023Updated 3 years ago
- The Good, The Bad, and The Greedy: Evaluation of LLMs Should Not Ignore Non-Determinism☆30Jul 17, 2024Updated last year
- Implementation code for ACL2024:Advancing Parameter Efficiency in Fine-tuning via Representation Editing☆15Apr 20, 2024Updated last year
- XmodelLM☆38Nov 19, 2024Updated last year
- https://footprints.baulab.info☆17Oct 4, 2024Updated last year
- Control LLM☆22Apr 6, 2025Updated 10 months ago
- ☆33Jul 9, 2025Updated 7 months ago
- Official Repository for Paper "BaichuanSEED: Sharing the Potential of ExtensivE Data Collection and Deduplication by Introducing a Compet…☆18Aug 28, 2024Updated last year
- ☆24May 13, 2025Updated 9 months ago
- A Comprehensive Dataset for Advanced Image Generation and Editing}☆31Oct 2, 2025Updated 5 months ago
- A django app allowing users to follow anything.☆19Feb 9, 2026Updated 3 weeks ago
- Codes for "NAST: A Non-Autoregressive Generator with Word Alignment for Unsupervised Text Style Transfer" (ACL 2021 findings)☆15Nov 3, 2021Updated 4 years ago
- Baselines for all tasks from Long Code Arena benchmarks 🏟️☆39Mar 30, 2025Updated 11 months ago
- [ACL'25 Oral] What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective☆76Jun 25, 2025Updated 8 months ago
- WorldSense benchmark for grounded reasoning in language models☆24Nov 28, 2023Updated 2 years ago
- Official Implementation of UA^{2}-Agent and other baseline algorithms of "Towards Unified Alignment Between Agents, Humans, and Environme…☆19Nov 12, 2024Updated last year
- This repository aims to collect Transformer-based sound event detection (SED) algorithms.☆93Feb 10, 2026Updated 3 weeks ago
- Official repository of paper "Context-DPO: Aligning Language Models for Context-Faithfulness"☆21Feb 17, 2025Updated last year
- The first dense retrieval model that can be prompted like an LM☆90May 8, 2025Updated 9 months ago
- A scalable automated alignment method for large language models. Resources for "Aligning Large Language Models via Self-Steering Optimiza…☆20Nov 21, 2024Updated last year
- ☆32Oct 13, 2025Updated 4 months ago
- ☆46Jun 11, 2025Updated 8 months ago
- Code, Data and Red Teaming for ZeroBench☆54Dec 23, 2025Updated 2 months ago