Using an LLM to evaluate the MMLU dataset.
☆42Mar 8, 2024Updated 2 years ago
Alternatives and similar repositories for llm_evaluation_4_mmlu
Users that are interested in llm_evaluation_4_mmlu are comparing it to the libraries listed below.
- Code for our paper "Decomposing The Dark Matter of Sparse Autoencoders"☆23Feb 6, 2025Updated last year
- [NeurIPS 2024] Fight Back Against Jailbreaking via Prompt Adversarial Tuning☆11Oct 29, 2024Updated last year
- Fairer Preferences Elicit Improved Human-Aligned Large Language Model Judgments (Zhou et al., EMNLP 2024)☆14Oct 3, 2024Updated last year
- DevKit for SoccerNet Team Action Spotting Challenge 2025☆18Aug 26, 2025Updated 7 months ago
- [arxiv: 2512.19673] Bottom-up Policy Optimization: Your Language Model Policy Secretly Contains Internal Policies☆60Feb 6, 2026Updated last month
- Chinese Generation Evaluation☆13Aug 14, 2023Updated 2 years ago
- ☆43Nov 1, 2022Updated 3 years ago
- This is the official repo for "Differentiable Model Scaling using Differentiable Topk"☆12May 16, 2024Updated last year
- ☆20May 28, 2025Updated 10 months ago
- Official Implementation of Avoiding spurious correlations via logit correction☆17May 6, 2023Updated 2 years ago
- Multi-dimensional analysis of orthogonal safety directions in LLM alignment☆21Mar 20, 2025Updated last year
- [NAACL'25 🏆 SAC Award] Official code for "Advancing MoE Efficiency: A Collaboration-Constrained Routing (C2R) Strategy for Better Expert…☆16Feb 4, 2025Updated last year
- A website for accessing many models through an API (DeepSeek, Qwen, Hunyuan, etc.)☆16Jul 12, 2025Updated 8 months ago
- Personal Chinese translation of the Yahboom (亚博智能) Jetson Orin NX course materials and documentation☆17Nov 7, 2024Updated last year
- A Paper List for Geo-localization Research☆16Sep 2, 2024Updated last year
- Official implementation of "Diffusion Language Models Know the Answer Before Decoding"☆49Sep 8, 2025Updated 6 months ago
- ToolBridge: An Open-Source Dataset to Equip LLMs with External Tool Capabilities☆15Feb 11, 2025Updated last year
- Reinforcement Learning Toolkit for RWKV.(v6,v7,ARWKV) Distillation,SFT,RLHF(DPO,ORPO), infinite context training, Aligning. Exploring the…☆63Sep 19, 2025Updated 6 months ago
- JPEG Compression RTL implementation☆11Aug 19, 2017Updated 8 years ago
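- Reference solutions for the OJ problems of the Data and Algorithms course, Department of Electronic Engineering, Tsinghua University, Fall 2022 semester☆10Jun 18, 2023Updated 2 years ago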
- ☆15Dec 26, 2017Updated 8 years ago
- Adapter board exposing SATA M.2 SSD on FMC board-to-board connector☆15Aug 7, 2023Updated 2 years ago
- Improved version of http://web.mit.edu/6.111/volume2/www/f2018/tools/sd_controller.v☆13Dec 6, 2021Updated 4 years ago
- ☆12Jul 20, 2022Updated 3 years ago
- [NIPS 25'] Evaluation code of paper "KRIS-Bench: Benchmarking Next-Level Intelligent Image Editing Models"☆40Oct 19, 2025Updated 5 months ago
- Time series data contribution via influence functions☆17Jan 18, 2025Updated last year
- 🤖ConvRe🤯: An Investigation of LLMs’ Inefficacy in Understanding Converse Relations (EMNLP 2023)☆24Oct 10, 2023Updated 2 years ago
- 2022 Loongson Cup individual competition entry: single-issue, 110M (with icache)☆48Aug 22, 2022Updated 3 years ago
- A JPEG-LS plugin for the Python Pillow library☆16Dec 31, 2023Updated 2 years ago
- ☆19Jan 3, 2025Updated last year
- Code and dataset for the paper "Text2City: One-Stage Text-Driven Urban Layout Regeneration"☆14Jun 27, 2024Updated last year
- Open Source SSD Controller. NVMe and Lightstor variants☆17May 21, 2014Updated 11 years ago
- Official implementation of "Reasoning by Superposition: A Theoretical Perspective on Chain of Continuous Thought" (NeurIPS 2025)☆38Oct 8, 2025Updated 5 months ago
- Code of ImageNet training and evaluation for the paper: RENAS: Reinforced Evolutionary Neural Architecture Search☆20May 15, 2019Updated 6 years ago
- ☆22Mar 19, 2024Updated 2 years ago
- Third-prize entry in the 2022 Loongson Cup individual competition☆14Oct 11, 2023Updated 2 years ago
- HANNA, a large annotated dataset of Human-ANnotated NArratives for ASG evaluation.☆35Oct 15, 2024Updated last year
- Introduction to the workflow for deploying accelerators with Vitis☆12Jan 10, 2025Updated last year
- Code of EMNLP 2025 paper 'UltraIF: Advancing Instruction Following from the Wild'.☆21Apr 3, 2025Updated 11 months ago