Humanity's Last Exam
☆1,521Feb 20, 2026Updated 2 months ago
Alternatives and similar repositories for hle
Users who are interested in hle are comparing it to the repositories listed below.
- Exploration: using technology to aid people who lack both the ability to speak and fine motor control.☆21Oct 24, 2024Updated last year
- LiveBench: A Challenging, Contamination-Free LLM Benchmark☆1,160Updated this week
- ☆4,471Apr 22, 2026Updated 2 weeks ago
- An experimental desktop client for using Claude Desktop's MCP with Novelcrafter codices.☆11Dec 3, 2024Updated last year
- open-llms-next-web, an open-source large language model web demo similar to chatgpt-next-web, supporting offline open-source LLMs and PEFT models☆18May 13, 2024Updated last year
- GPQA: A Graduate-Level Google-Proof Q&A Benchmark☆498Sep 30, 2024Updated last year
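- An enhanced version based on chatgpt-next-web, with an admin backend, knowledge base integration, and more. Midjourney image generation will continue to be integrated as needed; stable-diffusion is integrated, with support for oss, dall-e-3, gpt-4-vision-preview, whisper, tts, and gpt-4-a…☆38May 4, 2024Updated 2 years ago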
- A benchmark that challenges language models to code solutions for scientific problems☆194Apr 27, 2026Updated last week
- ☆1,135Jan 10, 2026Updated 3 months ago
- SWE-bench: Can Language Models Resolve Real-world Github Issues?☆4,831Apr 1, 2026Updated last month
- A comprehensive survey of business use cases of AI that help businesses thrive in the digital economy☆13Oct 7, 2020Updated 5 years ago
- Official repository for the paper "LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code"☆857Jul 16, 2025Updated 9 months ago
- Arena-Hard-Auto: An automatic LLM benchmark.☆1,016Jun 21, 2025Updated 10 months ago
- ☆14Apr 26, 2025Updated last year
- An AUTOSAR document-specific retriever based on LLM and RAG.☆16Nov 12, 2024Updated last year
- By leveraging Bocha AI Search API, your AI applications can now access high-quality, up-to-date knowledge from billions of web pages and…☆21Feb 9, 2025Updated last year
- s1: Simple test-time scaling☆6,650Jun 25, 2025Updated 10 months ago
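- This project organizes the open-source MOSS SFT data and converts it into the mnbvc multi-turn dialogue format. MOSS-003 covers three aspects (helpfulness, faithfulness, and harmlessness), with 3.53 million samples in total; MOSS-003 includes finer-grained helpfulness category labels, broader harmlessness data, and more dialogue turns, with 6.3 million samples in total.☆13Dec 3, 2023Updated 2 years ago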
- ☆60Apr 2, 2025Updated last year
- The rule-based evaluation subset and code implementation of Omni-MATH☆27Dec 23, 2024Updated last year
- Fully open reproduction of DeepSeek-R1☆26,013Apr 2, 2026Updated last month
- A repository for the BCIO ontology☆43Updated this week
- A new dataset of difficult graduate-level applied mathematics problems; evaluations demonstrate that leading LLMs currently exhibit low a…☆28Feb 14, 2025Updated last year
- The free energy principle☆19Feb 16, 2025Updated last year
- ☆13Jul 14, 2024Updated last year
- Datasets and functions for the Handbook of Educational Measurement and Psychometrics using R.☆24Apr 2, 2021Updated 5 years ago
- AllenAI's post-training codebase☆3,708Updated this week
- Minimal reproduction of DeepSeek R1-Zero☆13,079Feb 27, 2026Updated 2 months ago
- Course Materials for Bayesian Psychometric Modeling☆15May 14, 2019Updated 6 years ago
- Simple RL training for reasoning☆3,851Dec 23, 2025Updated 4 months ago
- ☆185Apr 30, 2025Updated last year
- A framework for few-shot evaluation of language models.☆12,411Updated this week
- An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO & REINFORCE++ & VLM & TIS & vLLM & Ray & Asy…☆9,441Updated this week
- MLE-bench is a benchmark for measuring how well AI agents perform at machine learning engineering☆1,520Apr 24, 2026Updated last week
- yet another chatbot☆34Oct 9, 2025Updated 6 months ago
- NextChat mcp server collection☆30Jan 14, 2025Updated last year
- This GUI aims to simplify the process of converting GGUF files to llamafile format by providing an intuitive and convenient way for users…☆14Jan 2, 2026Updated 4 months ago
- Benchmarking Benchmark Leakage in Large Language Models☆60May 20, 2024Updated last year
- [COLM 2025] LIMO: Less is More for Reasoning☆1,073Jul 30, 2025Updated 9 months ago