Using LLMs to evaluate the MMLU dataset.
☆42Mar 8, 2024Updated 2 years ago
Alternatives and similar repositories for llm_evaluation_4_mmlu
Users that are interested in llm_evaluation_4_mmlu are comparing it to the libraries listed below.
- Measuring Massive Multitask Language Understanding | ICLR 2021☆15May 28, 2023Updated 3 years ago
- [NeurIPS 2024] Fight Back Against Jailbreaking via Prompt Adversarial Tuning☆11Oct 29, 2024Updated last year
- [CVPR 2026] FocusUI: Efficient UI Grounding via Position-Preserving Visual Token Selection☆35Jun 7, 2026Updated last week
- [arxiv: 2512.19673] Bottom-up Policy Optimization: Your Language Model Policy Secretly Contains Internal Policies☆60Feb 6, 2026Updated 4 months ago
- ☆45Nov 1, 2022Updated 3 years ago
- This is the official repo for "Differentiable Model Scaling using Differentiable Topk"☆12May 16, 2024Updated 2 years ago
- Mamba support for transformer lens☆20Sep 17, 2024Updated last year
- ☆20May 28, 2025Updated last year
- Official Implementation of Avoiding spurious correlations via logit correction☆17May 6, 2023Updated 3 years ago
- Multi-dimensional analysis of orthogonal safety directions in LLM alignment☆22Jun 12, 2026Updated last week
- Official Implementation of NIPS 2022 paper Pre-activation Distributions Expose Backdoor Neurons☆15Jan 13, 2023Updated 3 years ago
- Text generation using language models with multiple exit heads☆16Sep 18, 2025Updated 9 months ago
- Trains Sparse Autoencoders based on outputs from language models☆11Oct 7, 2024Updated last year
- PyTorch implementation of One-Class Convolutional Neural Network; further optimization to follow.☆13Oct 27, 2022Updated 3 years ago
- ToolBridge: An Open-Source Dataset to Equip LLMs with External Tool Capabilities☆15Feb 11, 2025Updated last year
- A website for accessing many models through an API (deepseek, Qwen, Hunyuan, etc.)☆16Jul 12, 2025Updated 11 months ago
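- Reference solutions for the OJ (online judge) of the Data and Algorithms course, Department of Electronic Engineering, Tsinghua University, Fall 2022 semester☆10Jun 18, 2023Updated 3 years ago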
- Official implementation of Latent-SFT: teaching LLMs to reason with vocabulary-space latent chains.☆51May 18, 2026Updated last month
- ☆22Dec 23, 2024Updated last year
- ☆15Jan 11, 2019Updated 7 years ago
- SG-Bench: Evaluating LLM Safety Generalization Across Diverse Tasks and Prompt Types☆26Nov 29, 2024Updated last year
- ☆19Jan 3, 2025Updated last year
- 2022 Loongson Cup individual competition: single-issue, 110M (with icache)☆52Aug 22, 2022Updated 3 years ago
- Official implementation of "Reasoning by Superposition: A Theoretical Perspective on Chain of Continuous Thought" (NeurIPS 2025)☆40Oct 8, 2025Updated 8 months ago
- ☆11Apr 2, 2024Updated 2 years ago
- multi-bit language model watermarking (NAACL 24)☆19Sep 20, 2024Updated last year
- Can LLMs Predict Their Own Failures? Self-Awareness via Internal Circuits☆45Jan 8, 2026Updated 5 months ago
- HANNA, a large annotated dataset of Human-ANnotated NArratives for ASG evaluation.☆37Oct 15, 2024Updated last year
- GLFriend is a little function based on the AppleScript.☆21Aug 27, 2024Updated last year
- [ICLR 2026 🔥] Dr.LLM: Dynamic Layer Routing in LLMs☆53Apr 24, 2026Updated last month
- ☆18Mar 15, 2021Updated 5 years ago
- [ICML2025] Official Repo for Paper "Optimizing Temperature for Language Models with Multi-Sample Inference"☆23Feb 16, 2025Updated last year
- Co1aSQL - a database management system☆14Apr 2, 2024Updated 2 years ago
- Repository for awesome spatial/visual reasoning MLLMs. (focus more on embodied applications)☆70Jun 26, 2025Updated 11 months ago
- ☆15Nov 18, 2025Updated 7 months ago
- multicast learning in network programming course☆10Oct 30, 2020Updated 5 years ago
- An Ultra-Long Output Reinforcement Learning Approach☆23Jul 31, 2025Updated 10 months ago
- [ACL 2026] Feedback-Driven Tool-Use Improvements in Large Language Models via Automated Build Environments☆52Apr 6, 2026Updated 2 months ago
- ☆73Apr 1, 2026Updated 2 months ago