facebookresearch / BigOBench

BigOBench assesses the ability of Large Language Models (LLMs) to comprehend the time and space computational complexity of input or generated code.

☆35

Alternatives and similar repositories for BigOBench

Users who are interested in BigOBench are comparing it to the libraries listed below.


ContextualAI / CLAIR_and_APO
Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment
☆60 · Updated 10 months ago
sunblaze-ucb / reasoning_ladder
☆33 · Updated 2 months ago
sail-sg / SkyLadder
The official repository for SkyLadder: Better and Faster Pretraining via Context Window Scheduling
☆33 · Updated 4 months ago
MLE-Dojo / MLE-Dojo
☆55 · Updated 3 weeks ago
AlexCuadron / ThinkingAgent
Systematic evaluation framework that automatically rates overthinking behavior in large language models.
☆91 · Updated 2 months ago
LAMDASZ-ML / Self-Backtracking
☆47 · Updated 5 months ago
convergence-ai / lm2
Official repository for the LM2 paper
☆41 · Updated 5 months ago
complex-reasoning / RPG
The official implementation of Regularized Policy Gradient (RPG) (https://arxiv.org/abs/2505.17508)
☆35 · Updated last week
OpenMOSS / Lorsa
☆23 · Updated last month
bigcode-project / astraios
Astraios: Parameter-Efficient Instruction Tuning Code Language Models
☆58 · Updated last year
shangshang-wang / Resa
Resa: Transparent Reasoning Models via SAEs
☆40 · Updated last month
amazon-science / llm-code-preference
Training and Benchmarking LLMs for Code Preference.
☆33 · Updated 8 months ago
cmu-l3 / neurips2024-inference-tutorial-code
NeurIPS 2024 tutorial on LLM Inference
☆45 · Updated 7 months ago
menhguin / minp_paper
Code implementation, evaluations, documentation, links, and resources for the Min P paper
☆38 · Updated 4 months ago
efficientscaling / Z1
Repo for "Z1: Efficient Test-time Scaling with Code"
☆63 · Updated 3 months ago
yidingjiang / ado
This repository contains code for Adaptive Data Optimization.
☆25 · Updated 7 months ago
SparkJiao / StructTest
☆19 · Updated 4 months ago
open-compass / GPassK
[ACL 2025] Are Your LLMs Capable of Stable Reasoning?
☆27 · Updated 4 months ago
john-hewitt / implicit-ins
Codebase for Instruction Following without Instruction Tuning
☆35 · Updated 9 months ago
Shalev-Lifshitz / MultiAgentVerification
Multi-Agent Verification: Scaling Test-Time Compute with Multiple Verifiers
☆19 · Updated 4 months ago
RobertCsordas / moeut
☆82 · Updated 11 months ago
sunblaze-ucb / math_ood
☆34 · Updated 3 weeks ago
dinobby / MAGDi
The code implementation of MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models…
☆35 · Updated last year
SalesforceAIResearch / swecomm
☆27 · Updated 6 months ago
ahans30 / goldfish-loss
[NeurIPS 2024] Goldfish Loss: Mitigating Memorization in Generative LLMs
☆90 · Updated 8 months ago
shenao-zhang / SELM
The official implementation of Self-Exploring Language Models (SELM)
☆64 · Updated last year
katiekang1998 / reasoning_generalization
☆33 · Updated 6 months ago
VITA-Group / WeLore
From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients. Ajay Jaiswal, Lu Yin, Zhenyu Zhang, Shiwei Liu,…
☆47 · Updated 3 months ago
mukhal / ThinkPRM
Process Reward Models That Think
☆46 · Updated 2 weeks ago
formll / resolving-scaling-law-discrepancies
☆20 · Updated last year