SihyeongPark / Awesome-LLM-Benchmark
Awesome-LLM-Benchmark: List of benchmarks for Large-Language Models
☆9 Updated 2 years ago
Alternatives and similar repositories for Awesome-LLM-Benchmark
Users who are interested in Awesome-LLM-Benchmark are comparing it to the libraries listed below.
- Implementation of Dualformer ☆17 Updated 3 months ago
- A Data Source for Reasoning Embodied Agents ☆19 Updated last year
- On The Planning Abilities of OpenAI's o1 Models: Feasibility, Optimality, and Generalizability ☆39 Updated 2 months ago
- MPI Code Generation through Domain-Specific Language Models ☆14 Updated 7 months ago
- ☆21 Updated 3 months ago
- Plancraft is a Minecraft environment and agent suite to test planning capabilities in LLMs ☆15 Updated 2 weeks ago
- Lottery Ticket Adaptation ☆39 Updated 7 months ago
- The code implementation of MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models… ☆34 Updated last year
- A testbed for agents and environments that can automatically improve models through data generation. ☆24 Updated 3 months ago
- Implementation of the LDP module block in PyTorch and Zeta from the paper: "MobileVLM: A Fast, Strong and Open Vision Language Assistant … ☆16 Updated last year
- Reasoning Agentic Retrieval-Augmented Generation for Industry Challenges ☆14 Updated last month
- ☆27 Updated 2 years ago
- Experimental scripts for researching data adaptive learning rate scheduling. ☆23 Updated last year
- LLM Dynamic Planner - Combining LLM with PDDL Planners to solve an embodied task ☆43 Updated 5 months ago
- Official Code Repository for EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents (COLM 2024) ☆34 Updated 11 months ago
- Resa: Transparent Reasoning Models via SAEs ☆36 Updated 2 weeks ago
- This is the implementation of CounterCurate, the data curation pipeline of both physical and semantic counterfactual image-caption pairs. ☆18 Updated last year
- ☆28 Updated last year
- Byte-sized text games for code generation tasks on virtual environments ☆19 Updated 11 months ago
- Official Repository for Task-Circuit Quantization ☆20 Updated 3 weeks ago
- Multi-Agent Verification: Scaling Test-Time Compute with Multiple Verifiers