GAIR-NLP/AIME-Preview

GAIR-NLP / AIME-Preview

☆84

Alternatives and similar repositories for AIME-Preview

Users who are interested in AIME-Preview are comparing it to the repositories listed below.

yale-nlp / InstruSum
☆23 · Feb 26, 2024 · Updated 2 years ago
GAIR-NLP / OlympicArena
[NeurIPS 2024] OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI
☆106 · Mar 6, 2025 · Updated last year
kyegomez / Reka-Torch
Implementation of the model: "Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models" in PyTorch
☆29 · Updated this week
KbsdJames / omni-math-rule
The rule-based evaluation subset and code implementation of Omni-MATH
☆28 · Dec 23, 2024 · Updated last year
eth-sri / matharena
Evaluation of LLMs on latest math competitions
☆273 · Jun 23, 2026 · Updated last month
Michaelvll / llm-ie-benchmarks
A collection of reproducible inference engine benchmarks
☆38 · Apr 22, 2025 · Updated last year
GAIR-NLP / DataEvolve
☆31 · Mar 15, 2026 · Updated 4 months ago
GAIR-NLP / self-improvement-reversal
☆13 · Jul 14, 2024 · Updated 2 years ago
longrongyang / STGC
Solving Token Gradient Conflict in Mixture-of-Experts for Large Vision-Language Model
☆13 · Feb 11, 2025 · Updated last year
liziniu / cold_start_rl
Code for Blog Post: Can Better Cold-Start Strategies Improve RL Training for LLMs?
☆20 · Mar 9, 2025 · Updated last year
GAIR-NLP / BeHonest
BeHonest: Benchmarking Honesty in Large Language Models
☆35 · Aug 15, 2024 · Updated last year
BryceZhuo / HybridNorm
The official implementation of HybridNorm: Towards Stable and Efficient Transformer Training via Hybrid Normalization
☆19 · Mar 7, 2025 · Updated last year
GAIR-NLP / LIMO
[COLM 2025] LIMO: Less is More for Reasoning
☆1,080 · Jul 30, 2025 · Updated 11 months ago
SPIRAL-MED / Ophiuchus
☆41 · Jan 14, 2025 · Updated last year
shawnricecake / search-llm
[NeurIPS 2024] Search for Efficient LLMs
☆16 · Jan 16, 2025 · Updated last year
brendanhogan / completion_tree_view
☆15 · Apr 26, 2025 · Updated last year
GAIR-NLP / ReasonEval
[AAAI 2025 oral] Evaluating Mathematical Reasoning Beyond Accuracy
☆80 · Oct 9, 2025 · Updated 9 months ago
GAIR-NLP / Med
[ICML 2026] What Does Vision Tool-Use Reinforcement Learning Really Learn? Disentangling Tool-Induced and Intrinsic Effects for Crop-and-…
☆22 · May 15, 2026 · Updated 2 months ago
zhitao-wang / Sequential-Neural-Information-Diffusion-Model-with-Structure-Attention
Code for A Sequential Neural Information Diffusion Model with Structure Attention (CIKM 2018)
☆18 · Jan 4, 2019 · Updated 7 years ago
InfiXAI / InfiGUIAgent
☆74 · May 23, 2025 · Updated last year
Egg-Hu / SMI
[ICML 2024] Sparse Model Inversion: Efficient Inversion of Vision Transformers with Less Hallucination
☆14 · Apr 29, 2025 · Updated last year
guy120494 / SUMO
☆15 · Feb 5, 2026 · Updated 5 months ago
LLM360 / TxT360
☆25 · Dec 18, 2024 · Updated last year
T-Lab-CUHKSZ / G2RPO-A
[ACL 2026] G2RPO-A: Guided Group Relative Policy Optimization with Adaptive Guidance
☆16 · May 20, 2026 · Updated 2 months ago
dmis-lab / Outlier-Safe-Pre-Training
[ACL 2025] Outlier-Safe Pre-Training for Robust 4-Bit Quantization of Large Language Models
☆39 · Nov 4, 2025 · Updated 8 months ago
GAIR-NLP / LIMR
☆221 · Feb 20, 2025 · Updated last year
yiqingxyq / RepoST
Code for "[COLM'25] RepoST: Scalable Repository-Level Coding Environment Construction with Sandbox Testing"
☆24 · Mar 18, 2025 · Updated last year
GAIR-NLP / MathPile
[NeurIPS D&B 2024] Generative AI for Math: MathPile
☆418 · Apr 4, 2025 · Updated last year
iesl / CSFCube
A Test Collection of Computer Science Papers for Faceted Query by Example
☆23 · Nov 28, 2021 · Updated 4 years ago
facebookresearch / SecAlign
Repo for the research paper "SecAlign: Defending Against Prompt Injection with Preference Optimization"
☆98 · Jul 2, 2026 · Updated 3 weeks ago
BAI-LAB / MoE-CL
[WWW 2026 Oral] MoE-CL: Self-Evolving LLMs via Continual Instruction Tuning
☆21 · Dec 1, 2025 · Updated 7 months ago
ZurichNLP / understanding-mbr
☆17 · Apr 28, 2022 · Updated 4 years ago
idavidrein / gpqa
GPQA: A Graduate-Level Google-Proof Q&A Benchmark
☆523 · Sep 30, 2024 · Updated last year
beichenzbc / BoostStep
official code for "BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning"
☆37 · Jan 21, 2025 · Updated last year
KbsdJames / Omni-MATH
The official repository of the Omni-MATH benchmark.
☆94 · Dec 22, 2024 · Updated last year
GAIR-NLP / lm-open-science-evaluation
Reproducible and flexible LLM evaluations for scientific reasoning.
☆29 · Jul 23, 2025 · Updated last year
AI45Lab / Flames
Flames is a highly adversarial benchmark in Chinese for LLM's harmlessness evaluation developed by Shanghai AI Lab and Fudan NLP Group.
☆68 · May 21, 2024 · Updated 2 years ago
guijinSON / MM-Eval
Official implementation for "MM-Eval: A Multilingual Meta-Evaluation Benchmark for LLM-as-a-Judge and Reward Models"
☆20 · Oct 26, 2024 · Updated last year
mlfoundations / evalchemy
Automatic evals for LLMs
☆601 · Feb 24, 2026 · Updated 5 months ago