SihyeongPark / Awesome-LLM-Benchmark
Awesome-LLM-Benchmark: List of benchmarks for Large-Language Models
☆9Updated 2 years ago
Alternatives and similar repositories for Awesome-LLM-Benchmark
Users who are interested in Awesome-LLM-Benchmark are comparing it to the repositories listed below.
- A Data Source for Reasoning Embodied Agents☆19Updated last year
- Lottery Ticket Adaptation☆39Updated 7 months ago
- A testbed for agents and environments that can automatically improve models through data generation.☆24Updated 4 months ago
- LLM Dynamic Planner - Combining LLM with PDDL Planners to solve an embodied task☆44Updated 6 months ago
- SCREWS: A Modular Framework for Reasoning with Revisions☆27Updated last year
- ☆20Updated 4 months ago
- ☆27Updated 2 weeks ago
- GPT as Knowledge Worker (or if you really want, GPT Sorta' Takes the CPA Exam)☆12Updated 2 years ago
- ☆21Updated 8 months ago
- Plancraft is a Minecraft environment and agent suite to test planning capabilities in LLMs☆15Updated last week
- Experimental scripts for researching data adaptive learning rate scheduling.☆23Updated last year
- PyTorch Implementation of the paper "MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training"☆24Updated 2 weeks ago
- Implementation of Dualformer☆18Updated 4 months ago
- [EMNLP 2024] Tree of Problems: Improving structured problem solving with compositionality☆19Updated 4 months ago
- Repository for Skill Set Optimization☆14Updated 11 months ago
- Learning to Retrieve by Trying - Source code for Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval☆39Updated 8 months ago
- The tool to read/get/extract and write/change/modify BIOS/UEFI settings from Linux terminal.☆6Updated last year
- this is for fun, ain't it grand!☆20Updated 2 months ago
- The code implementation of MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models…☆35Updated last year
- PyTorch implementation of HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models☆28Updated last year
- ☆44Updated last month
- Implementation of the LDP module block in PyTorch and Zeta from the paper: "MobileVLM: A Fast, Strong and Open Vision Language Assistant …☆16Updated last year
- ☆13Updated 4 months ago
- [NAACL 2024 Findings] Evaluation suite for the systematic evaluation of instruction selection methods.☆22Updated last year
- A benchmark for evaluating situated inductive reasoning☆16Updated 6 months ago
- Multi-Agent Verification: Scaling Test-Time Compute with Multiple Verifiers☆19Updated 4 months ago
- NeurIPS 2023 - Cappy: Outperforming and Boosting Large Multi-Task LMs with a Small Scorer☆43Updated last year
- Fork of Flame repo for training of some new stuff in development☆14Updated last week
- Official Code Release for "Training a Generally Curious Agent"☆28Updated last month
- Official repo of the AAAI 2024 paper "Mitigating the Impact of False Negatives in Dense Retrieval with Contrastive Confidence Regularization"☆13Updated last year