bjoernpl / lm-evaluation-harness-de

A framework for few-shot evaluation of autoregressive language models.

☆13

Alternatives and similar repositories for lm-evaluation-harness-de

Users who are interested in lm-evaluation-harness-de are comparing it to the libraries listed below.

huggingface / data-is-better-together
Let's build better datasets, together!
☆260 Updated 6 months ago
cognitivecomputations / spectrum
☆128 Updated 3 months ago
huggingface / llm-swarm
Manage scalable open LLM inference endpoints in Slurm clusters
☆265 Updated last year
bjoernpl / GermanBenchmark
A repository containing the code for translating popular LLM benchmarks to German.
☆26 Updated last year
davanstrien / awesome-synthetic-datasets
awesome synthetic (text) datasets
☆289 Updated last week
arcee-ai / EvolKit
EvolKit is an innovative framework designed to automatically enhance the complexity of instructions used for fine-tuning Large Language M…
☆229 Updated 8 months ago
IBM / unitxt
🦄 Unitxt is a Python library for enterprise-grade evaluation of AI performance, offering the world's largest catalog of tools and data …
☆206 Updated this week
arcee-ai / DALM
Domain Adapted Language Modeling Toolkit - E2E RAG
☆324 Updated 8 months ago
Locutusque / TPU-Alignment
Fully fine-tune large models like Mistral, Llama-2-13B, or Qwen-14B completely for free
☆232 Updated 8 months ago
center-for-humans-and-machines / transformer-heads
Toolkit for attaching, training, saving and loading of new heads for transformer models
☆282 Updated 4 months ago
QuixiAI / laserRMT
This is our own implementation of 'Layer Selective Rank Reduction'
☆239 Updated last year
jondurbin / bagel
A bagel, with everything.
☆322 Updated last year
arcee-ai / PruneMe
Automated Identification of Redundant Layer Blocks for Pruning in Large Language Models
☆240 Updated last year
Leeroo-AI / mergoo
A library for easily merging multiple LLM experts and efficiently training the merged LLM.
☆485 Updated 10 months ago
mlabonne / llm-autoeval
Automatically evaluate your LLMs in Google Colab
☆649 Updated last year
embeddings-benchmark / arena
Code for the MTEB Arena
☆21 Updated 2 weeks ago
mixedbread-ai / batched
The Batched API provides a flexible and efficient way to process multiple requests in a batch, with a primary focus on dynamic batching o…
☆139 Updated this week
writer / writing-in-the-margins
☆118 Updated 10 months ago
lamini-ai / Lamini-Memory-Tuning
Banishing LLM Hallucinations Requires Rethinking Generalization
☆276 Updated last year
Preemo-Inc / text-generation-inference
☆199 Updated last year
lightonai / pylate
Late Interaction Models Training & Retrieval
☆481 Updated last week
FastEval / FastEval
Fast & more realistic evaluation of chat language models. Includes leaderboard.
☆187 Updated last year
jxmorris12 / cde
code for training & evaluating Contextual Document Embedding models
☆194 Updated 2 months ago
booydar / babilong
BABILong is a benchmark for LLM evaluation using the needle-in-a-haystack approach.
☆206 Updated 2 months ago
Pints-AI / 1.5-Pints
A compact LLM pretrained in 9 days by using high quality data
☆318 Updated 3 months ago
tcapelle / llm_recipes
A set of scripts and notebooks on LLM finetuning and dataset creation
☆110 Updated 9 months ago
daniel-furman / sft-demos
Lightweight demos for finetuning LLMs. Powered by 🤗 transformers and open-source datasets.
☆77 Updated 8 months ago
JinjieNi / MixEval
The official evaluation suite and dynamic data release for MixEval.
☆242 Updated 8 months ago
sileod / tasksource
Datasets collection and preprocessings framework for NLP extreme multitask learning
☆184 Updated last week
apple / ml-superposition-prompting
☆145 Updated 11 months ago