lilacai / lilac

Curate better data for LLMs

☆934

Related projects:

argilla-io / distilabel
Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verifi…
☆1,396 Updated this week
AnswerDotAI / rerankers
A lightweight, low-dependency, unified API to use all common reranking and cross-encoder models.
☆790 Updated last week
huggingface / datatrove
Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.
☆1,935 Updated last week
carlini / yet-another-applied-llm-benchmark
A benchmark to evaluate language models on questions I've previously asked them to solve.
☆871 Updated this week
srush / MiniChain
A tiny library for coding with large language models.
☆1,205 Updated 2 months ago
abacaj / fine-tune-mistral
Fine-tune mistral-7B on 3090s, a100s, h100s
☆701 Updated 11 months ago
prometheus-eval / prometheus-eval
Evaluate your LLM's response with Prometheus and GPT4 💯
☆745 Updated last week
huggingface / lighteval
LightEval is a lightweight LLM evaluation suite that Hugging Face has been using internally with the recently released LLM data processin…
☆659 Updated this week
datadreamer-dev / DataDreamer
DataDreamer: Prompt. Generate Synthetic Data. Train & Align Models. 🤖💤
☆792 Updated last month
tomaarsen / attention_sinks
Extend existing LLMs way beyond the original training length with constant memory usage, without retraining
☆657 Updated 5 months ago
xhluca / bm25s
Fast lexical search library implementing BM25 in Python using Numpy and Scipy
☆767 Updated this week
mlabonne / llm-autoeval
Automatically evaluate your LLMs in Google Colab
☆511 Updated 4 months ago
predibase / lorax
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
☆2,080 Updated this week
VikParuchuri / textbook_quality
Generate textbook-quality synthetic LLM pretraining data
☆479 Updated 11 months ago
taylorai / galactic
data cleaning and curation for unstructured text
☆326 Updated last month
arthur-ai / bench
A tool for evaluating LLMs
☆376 Updated 4 months ago
IntelLabs / fastRAG
Efficient Retrieval Augmentation and Generation Framework
☆1,255 Updated last week
scaleapi / llm-engine
Scale LLM Engine public repository
☆770 Updated this week
jquesnelle / yarn
YaRN: Efficient Context Window Extension of Large Language Models
☆1,306 Updated 5 months ago
apoorvumang / prompt-lookup-decoding
☆442 Updated 3 weeks ago
predibase / llm_distillation_playbook
Best practices for distilling large language models.
☆370 Updated 7 months ago
philschmid / easyllm
☆429 Updated 8 months ago
illuin-tech / colpali
The code used to train and run inference with the ColPali architecture.
☆502 Updated this week
jondurbin / airoboros
Customizable implementation of the self-instruct paper.
☆1,004 Updated 6 months ago
jxmorris12 / vec2text
utilities for decoding deep representations (like sentence embeddings) back to text
☆691 Updated last week
KarelDO / xmc.dspy
In-Context Learning for eXtreme Multi-Label Classification (XMC) using only a handful of examples.
☆362 Updated 7 months ago
r2d4 / rellm
Exact structure out of any language model completion.
☆497 Updated last year
AnswerDotAI / RAGatouille
Easily use and train state of the art late-interaction retrieval methods (ColBERT) in any RAG pipeline. Designed for modularity and ease-…
☆2,817 Updated 2 weeks ago
gkamradt / LLMTest_NeedleInAHaystack
Doing simple retrieval from LLM models at various context lengths to measure accuracy
☆1,451 Updated last month
modal-labs / llm-finetuning
Guide for fine-tuning Llama/Mistral/CodeLlama models and more
☆521 Updated 3 weeks ago