SihyeongPark / Awesome-LLM-Benchmark

Awesome-LLM-Benchmark: List of benchmarks for Large Language Models

☆9

Alternatives and similar repositories for Awesome-LLM-Benchmark

Users who are interested in Awesome-LLM-Benchmark are comparing it to the repositories listed below.

facebookresearch / NeuralMemory
A Data Source for Reasoning Embodied Agents
☆19 Updated last year
kiddyboots216 / lottery-ticket-adaptation
Lottery Ticket Adaptation
☆39 Updated 7 months ago
codezakh / DataEnvGym
A testbed for agents and environments that can automatically improve models through data generation.
☆24 Updated 4 months ago
itl-ed / llm-dp
LLM Dynamic Planner - Combining LLM with PDDL Planners to solve an embodied task
☆44 Updated 6 months ago
kumar-shridhar / Screws
SCREWS: A Modular Framework for Reasoning with Revisions
☆27 Updated last year
katzurik / Knowledge_Navigator
☆20 Updated 4 months ago
austrian-code-wizard / c3po
☆27 Updated 2 weeks ago
mjbommar / gpt-as-knowledge-worker
GPT as Knowledge Worker (or if you really want, GPT Sorta' Takes the CPA Exam)
☆12 Updated 2 years ago
clinicalml / realhumaneval
☆21 Updated 8 months ago
gautierdag / plancraft
Plancraft is a Minecraft environment and agent suite to test planning capabilities in LLMs
☆15 Updated last week
facebookresearch / adaptive_scheduling
Experimental scripts for researching data adaptive learning rate scheduling.
☆23 Updated last year
kyegomez / MM1
PyTorch Implementation of the paper "MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training"
☆24 Updated 2 weeks ago
facebookresearch / dualformer
Implementation of Dualformer
☆18 Updated 4 months ago
ArmelRandy / tree-of-problems
[EMNLP 2024] Tree of Problems: Improving structured problem solving with compositionality
☆19 Updated 4 months ago
allenai / sso
Repository for Skill Set Optimization
☆14 Updated 11 months ago
sher222 / LeReT
Learning to Retrieve by Trying - Source code for Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval
☆39 Updated 8 months ago
facebookarchive / uefisettings
A tool to read/get/extract and write/change/modify BIOS/UEFI settings from the Linux terminal.
☆6 Updated last year
IBM / NL2PDDL
this is for fun, ain't it grand!
☆20 Updated 2 months ago
dinobby / MAGDi
The code implementation of MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models…
☆35 Updated last year
DCDmllm / HyperLLaVA
PyTorch implementation of HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models
☆28 Updated last year
NJUNLP / PATS
☆44 Updated last month
kyegomez / MobileVLM
Implementation of the LDP module block in PyTorch and Zeta from the paper: "MobileVLM: A Fast, Strong and Open Vision Language Assistant …
☆16 Updated last year
Improbable-AI / orso
☆13 Updated 4 months ago
princeton-nlp / InstructEval
[NAACL 2024 Findings] Evaluation suite for the systematic evaluation of instruction selection methods.
☆22 Updated last year
XiaojuanTang / Mars
A benchmark for evaluating situated inductive reasoning
☆16 Updated 6 months ago
Shalev-Lifshitz / MultiAgentVerification
Multi-Agent Verification: Scaling Test-Time Compute with Multiple Verifiers
☆19 Updated 4 months ago
tanyuqian / cappy
NeurIPS 2023 - Cappy: Outperforming and Boosting Large Multi-Task LMs with a Small Scorer
☆43 Updated last year
zaydzuhri / flame
Fork of Flame repo for training of some new stuff in development
☆14 Updated last week
tajwarfahim / paprika
Official Code Release for "Training a Generally Curious Agent"
☆28 Updated last month
wangskyGit / passage-sieve
Official repo of the AAAI 2024 paper "Mitigating the Impact of False Negatives in Dense Retrieval with Contrastive Confidence Regularization"
☆13 Updated last year