Humanity's Last Exam
☆1,481Feb 20, 2026Updated last month
Alternatives and similar repositories for hle
Users that are interested in hle are comparing it to the libraries listed below.
- Exploration: using technology to aid people who lack both the ability to speak and fine motor control.☆22Oct 24, 2024Updated last year
- LiveBench: A Challenging, Contamination-Free LLM Benchmark☆1,126Apr 9, 2026Updated last week
- ☆4,436Jul 31, 2025Updated 8 months ago
- An experimental desktop client for using Claude Desktop's MCP with Novelcrafter codices.☆10Dec 3, 2024Updated last year
- open-llms-next-web, an open-source large language model web demo similar to chatgpt-next-web, supporting offline open-source LLMs and PEFT models☆18May 13, 2024Updated last year
- GPQA: A Graduate-Level Google-Proof Q&A Benchmark☆484Sep 30, 2024Updated last year
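- An enhanced version based on chatgpt-next-web, with an admin backend, knowledge base integration, and more. Midjourney image generation will continue to be integrated as needed; stable-diffusion is integrated; supports OSS, dall-e-3, gpt-4-vision-preview, whisper, tts, and gpt-4-a…☆38May 4, 2024Updated last year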
- A benchmark that challenges language models to code solutions for scientific problems☆186Apr 6, 2026Updated last week
- ☆1,126Jan 10, 2026Updated 3 months ago
- SWE-bench: Can Language Models Resolve Real-world Github Issues?☆4,676Apr 1, 2026Updated 2 weeks ago
- A comprehensive survey of business use cases of AI that help businesses thrive in the digital economy☆13Oct 7, 2020Updated 5 years ago
- Official repository for the paper "LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code"☆841Jul 16, 2025Updated 9 months ago
- Arena-Hard-Auto: An automatic LLM benchmark.☆1,015Jun 21, 2025Updated 9 months ago
- ☆14Apr 26, 2025Updated 11 months ago
- This is an AUTOSAR document-specific retriever based on LLM and RAG.☆16Nov 12, 2024Updated last year
- By leveraging Bocha AI Search API, your AI applications can now access high-quality, up-to-date knowledge from billions of web pages and…☆21Feb 9, 2025Updated last year
- s1: Simple test-time scaling☆6,640Jun 25, 2025Updated 9 months ago
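- This project mainly organizes the open-source MOSS SFT data and converts it into the mnbvc multi-turn dialogue format. MOSS-003 covers three aspects (helpfulness, faithfulness, and harmlessness), with 3.53 million samples in total; MOSS-003 includes finer-grained helpfulness category labels, broader harmlessness data, and longer dialogues, with 6.3 million samples in total,☆12Dec 3, 2023Updated 2 years ago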
- ☆60Apr 2, 2025Updated last year
- The rule-based evaluation subset and code implementation of Omni-MATH☆27Dec 23, 2024Updated last year
- Fully open reproduction of DeepSeek-R1☆25,973Apr 2, 2026Updated 2 weeks ago
- A repository for the BCIO ontology☆42Mar 23, 2026Updated 3 weeks ago
- A new dataset of difficult graduate-level applied mathematics problems; evaluations demonstrate that leading LLMs currently exhibit low a…☆28Feb 14, 2025Updated last year
- The free energy principle☆19Feb 16, 2025Updated last year
- ☆13Jul 14, 2024Updated last year
- Datasets and functions for the Handbook of Educational Measurement and Psychometrics using R.☆24Apr 2, 2021Updated 5 years ago
- AllenAI's post-training codebase☆3,683Updated this week
- Minimal reproduction of DeepSeek R1-Zero☆13,038Feb 27, 2026Updated last month
- Course Materials for Bayesian Psychometric Modeling☆15May 14, 2019Updated 6 years ago
- Simple RL training for reasoning☆3,846Dec 23, 2025Updated 3 months ago
- ☆186Apr 30, 2025Updated 11 months ago
- A framework for few-shot evaluation of language models.☆12,138Apr 8, 2026Updated last week
- An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO & REINFORCE++ & VLM & TIS & vLLM & Ray & Asy…☆9,340Updated this week
- yet another chatbot☆34Oct 9, 2025Updated 6 months ago
- NextChat mcp server collection☆30Jan 14, 2025Updated last year
- This GUI aims to simplify the process of converting GGUF files to llamafile format by providing an intuitive and convenient way for users…☆14Jan 2, 2026Updated 3 months ago
- MLE-bench is a benchmark for measuring how well AI agents perform at machine learning engineering☆1,453Mar 20, 2026Updated 3 weeks ago
- Benchmarking Benchmark Leakage in Large Language Models☆60May 20, 2024Updated last year
- [COLM 2025] LIMO: Less is More for Reasoning☆1,071Jul 30, 2025Updated 8 months ago