microsoft / MMLU-CF
A Contamination-free Multi-task Language Understanding Benchmark [Official, ACL 2025]
☆117 Updated last month
Alternatives and similar repositories for MMLU-CF
Users that are interested in MMLU-CF are comparing it to the libraries listed below
- Hybrid Latent Reasoning via Reinforcement Learning ☆135 Updated last month
- The repository for the paper titled "Leopard: A Vision Language Model For Text-Rich Multi-Image Tasks" ☆157 Updated 6 months ago
- SemiEvol: Semi-supervised Fine-tuning for LLM Adaptation ☆55 Updated 2 months ago
- Collecting personality-indicative data for role-playing agents. ☆23 Updated 4 months ago
- [EMNLP 2024] DA-Code: Agent Data Science Code Generation Benchmark for Large Language Models ☆70 Updated last month
- Official code of "StreamBP: Memory-Efficient Exact Backpropagation for Long Sequence Training of LLMs". ☆69 Updated 3 weeks ago
- SQL-o1: A Self-Reward Heuristic Dynamic Search Method for Text-to-SQL ☆185 Updated last month
- A general AI agent framework that can be adapted to various tasks and environments. ☆100 Updated 5 months ago
- Official implementation of RARE: Retrieval-Augmented Reasoning Modeling ☆184 Updated last month
- [ACL 2025 Findings] MegaAgent: A Large-Scale Autonomous LLM-based Multi-Agent System Without Predefined SOPs https://arxiv.org/abs/2408.0… ☆77 Updated last month
- Search and Refine During Think: Autonomous Retrieval‑Augmented Reasoning of LLMs ☆83 Updated 2 weeks ago
- RobustFT: Robust Supervised Fine-tuning for Large Language Models under Noisy Response ☆42 Updated 6 months ago
- Two Heads are Better Than One: Test-time Scaling of Multi-agent Collaborative Reasoning ☆75 Updated 2 months ago
- LLM Benchmark for Code ☆30 Updated 11 months ago
- Image and video Tokenizer/VAE selection guide, text and face reconstruction evaluation. ☆95 Updated last month
- MPLSandbox is an out-of-the-box multi-programming language sandbox designed to provide unified and comprehensive feedback from compiler a… ☆178 Updated 3 months ago
- ☆44 Updated 2 months ago
- (NeurIPS 2024) Official PyTorch implementation of LOVA3 ☆89 Updated 3 months ago
- A Contextual RAG Bot Framework ☆80 Updated 8 months ago
- [AAAI 2025] Code for paper: Enhancing Multimodal Large Language Models Complex Reasoning via Similarity Computation ☆3 Updated 6 months ago
- [NeurIPS'24] Leveraging Hallucinations to Reduce Manual Prompt Dependency in Promptable Segmentation ☆59 Updated 7 months ago
- [ECCV 2024] Efficient Inference of Vision Instruction-Following Models with Elastic Cache ☆42 Updated 11 months ago
- [BIRD-INTERACT] Re-imagines Text-to-SQL evaluation via lens of dynamic interactions. ☆110 Updated last week
- Official Implementation of AttentionShift: Iteratively Estimated Part-based Attention Map for Pointly Supervised Instance Segmentation ☆158 Updated 8 months ago
- Official Code of Logits-Based-Finetuning ☆86 Updated last month
- [ICCV 2025] Decouple and Track: Benchmarking and Improving Video Diffusion Transformers for Motion Transfer ☆121 Updated last week
- SORSA: Singular Values and Orthonormal Regularized Singular Vectors Adaptation of Large Language Models ☆40 Updated 4 months ago
- Framework exploring ergonomic, lightweight multi-agent orchestration. ☆118 Updated last week
- Domain-Controlled Prompt Learning (AAAI2024) ☆90 Updated 7 months ago
- Reverse Chain-of-Thought Problem Generation for Geometric Reasoning in Large Multimodal Models ☆176 Updated 8 months ago