Humanity's Last Exam
☆1,544 Feb 20, 2026 Updated 3 months ago
Alternatives and similar repositories for hle
Users who are interested in hle are comparing it to the repositories listed below.
- Exploration: using technology to aid people who lack both the ability to speak and fine motor control. ☆21 Oct 24, 2024 Updated last year
- LiveBench: A Challenging, Contamination-Free LLM Benchmark ☆1,175 May 19, 2026 Updated last week
- An experimental desktop client for using Claude Desktop's MCP with Novelcrafter codices. ☆11 Dec 3, 2024 Updated last year
- ☆4,492 Apr 22, 2026 Updated last month
- open-llms-next-web: a web demo for open-source large language models, similar to chatgpt-next-web, with support for offline open-source LLMs and PEFT models ☆18 May 13, 2024 Updated 2 years ago
- GPQA: A Graduate-Level Google-Proof Q&A Benchmark ☆501 Sep 30, 2024 Updated last year
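- An enhanced version based on chatgpt-next-web, with an admin backend, knowledge-base integration, and more. Midjourney image generation will continue to be integrated as needed; stable-diffusion is already integrated; supports OSS, dall-e-3, gpt-4-vision-preview, whisper, tts, and gpt-4-a… ☆38 May 4, 2024 Updated 2 years ago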
- A benchmark that challenges language models to code solutions for scientific problems ☆198 May 18, 2026 Updated last week
- ☆1,144 Jan 10, 2026 Updated 4 months ago
- SWE-bench: Can Language Models Resolve Real-world Github Issues? ☆5,006 Apr 1, 2026 Updated last month
- A comprehensive survey of business use cases of AI that help businesses thrive in the digital economy ☆13 Oct 7, 2020 Updated 5 years ago
- Official repository for the paper "LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code" ☆873 Jul 16, 2025 Updated 10 months ago
- Arena-Hard-Auto: An automatic LLM benchmark. ☆1,023 Jun 21, 2025 Updated 11 months ago
- ☆14 Apr 26, 2025 Updated last year
- A retriever specific to AUTOSAR documents, based on LLM and RAG. ☆16 Nov 12, 2024 Updated last year
- By leveraging Bocha AI Search API, your AI applications can now access high-quality, up-to-date knowledge from billions of web pages and… ☆21 Feb 9, 2025 Updated last year
- s1: Simple test-time scaling ☆6,655 Jun 25, 2025 Updated 11 months ago
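- This project organizes the open-source MOSS SFT data and converts it into the mnbvc multi-turn dialogue format. MOSS-003 covers three aspects (helpfulness, honesty, and harmlessness), with 3.53 million samples in total; MOSS-003 includes finer-grained helpfulness category labels, broader harmlessness data, and more dialogue turns, with 6.3 million samples in total. ☆13 Dec 3, 2023 Updated 2 years ago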
- The rule-based evaluation subset and code implementation of Omni-MATH ☆27 Dec 23, 2024 Updated last year
- Fully open reproduction of DeepSeek-R1 ☆26,020 Apr 2, 2026 Updated last month
- A repository for the BCIO ontology ☆43 May 13, 2026 Updated 2 weeks ago
- A new dataset of difficult graduate-level applied mathematics problems; evaluations demonstrate that leading LLMs currently exhibit low a… ☆28 Feb 14, 2025 Updated last year
- The free energy principle ☆19 Feb 16, 2025 Updated last year
- ☆13 Jul 14, 2024 Updated last year
- Datasets and functions for the Handbook of Educational Measurement and Psychometrics using R. ☆24 Apr 2, 2021 Updated 5 years ago
- AllenAI's post-training codebase ☆3,729 Updated this week
- Course Materials for Bayesian Psychometric Modeling ☆15 May 14, 2019 Updated 7 years ago
- Minimal reproduction of DeepSeek R1-Zero ☆13,104 Feb 27, 2026 Updated 3 months ago
- Simple RL training for reasoning ☆3,859 Dec 23, 2025 Updated 5 months ago
- ☆187 Apr 30, 2025 Updated last year
- A framework for few-shot evaluation of language models. ☆12,678 May 11, 2026 Updated 2 weeks ago
- An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO & REINFORCE++ & VLM & TIS & vLLM & Ray & Asy… ☆9,548 Updated this week
- MLE-bench is a benchmark for measuring how well AI agents perform at machine learning engineering ☆1,537 Apr 24, 2026 Updated last month
- yet another chatbot ☆34 Oct 9, 2025 Updated 7 months ago
- NextChat mcp server collection ☆30 Jan 14, 2025 Updated last year
- This GUI aims to simplify the process of converting GGUF files to llamafile format by providing an intuitive and convenient way for users… ☆14 Jan 2, 2026 Updated 4 months ago
- Benchmarking Benchmark Leakage in Large Language Models ☆60 May 20, 2024 Updated 2 years ago
- [COLM 2025] LIMO: Less is More for Reasoning ☆1,077 Jul 30, 2025 Updated 9 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆80,418 May 19, 2026 Updated last week