centerforaisafety / hle
Humanity's Last Exam
☆1,098 Updated last month
Alternatives and similar repositories for hle
Users who are interested in hle are comparing it to the repositories listed below.
- ☆1,187 Updated 2 months ago
- LiveBench: A Challenging, Contamination-Free LLM Benchmark ☆873 Updated this week
- Releases from OpenAI Preparedness ☆860 Updated 3 weeks ago
- ☆2,335 Updated 2 weeks ago
- ☆477 Updated 2 months ago
- This repo contains the dataset and code for the paper "SWE-Lancer: Can Frontier LLMs Earn $1 Million from Real-World Freelance Software E… ☆1,436 Updated 2 months ago
- open source interpretability platform 🧠 ☆398 Updated this week
- GPQA: A Graduate-Level Google-Proof Q&A Benchmark ☆408 Updated 11 months ago
- Renderer for the harmony response format to be used with gpt-oss ☆3,774 Updated last month
- ☆1,233 Updated this week
- A benchmark for LLMs on complicated tasks in the terminal ☆691 Updated this week
- An AI agent system for solving International Mathematical Olympiad (IMO) problems using Google's Gemini, OpenAI, and XAI APIs. ☆750 Updated 3 weeks ago
- MLE-bench is a benchmark for measuring how well AI agents perform at machine learning engineering ☆957 Updated this week
- Textbook on reinforcement learning from human feedback ☆1,221 Updated this week
- Training Large Language Model to Reason in a Continuous Latent Space ☆1,265 Updated last month
- Testing baseline LLMs performance across various models ☆307 Updated last month
- Pretraining and inference code for a large-scale depth-recurrent language model ☆827 Updated last week
- Dream 7B, a large diffusion language model ☆970 Updated 3 weeks ago
- Atom of Thoughts for Markov LLM Test-Time Scaling ☆586 Updated 3 months ago
- ☆226 Updated 2 months ago
- Arena-Hard-Auto: An automatic LLM benchmark. ☆925 Updated 2 months ago
- Sky-T1: Train your own O1 preview model within $450 ☆3,327 Updated 2 months ago
- Self-Adapting Language Models ☆785 Updated last month
- [COLM 2025] LIMO: Less is More for Reasoning ☆1,018 Updated last month
- Large Concept Models: Language modeling in a sentence representation space ☆2,280 Updated 7 months ago
- Darwin Gödel Machine: Open-Ended Evolution of Self-Improving Agents ☆1,656 Updated last month
- Official repository for the paper "LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code"