lilacai / lilac
Curate better data for LLMs
☆934 Updated 6 months ago
Related projects:
- Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verifi… ☆1,396 Updated this week
- A lightweight, low-dependency, unified API to use all common reranking and cross-encoder models. ☆790 Updated last week
- Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks. ☆1,935 Updated last week
- A benchmark to evaluate language models on questions I've previously asked them to solve. ☆871 Updated this week
- A tiny library for coding with large language models. ☆1,205 Updated 2 months ago
- Fine-tune mistral-7B on 3090s, a100s, h100s ☆701 Updated 11 months ago
- Evaluate your LLM's response with Prometheus and GPT4 💯 ☆745 Updated last week
- LightEval is a lightweight LLM evaluation suite that Hugging Face has been using internally with the recently released LLM data processin… ☆659 Updated this week
- DataDreamer: Prompt. Generate Synthetic Data. Train & Align Models. 🤖💤 ☆792 Updated last month
- Extend existing LLMs way beyond the original training length with constant memory usage, without retraining ☆657 Updated 5 months ago
- Fast lexical search library implementing BM25 in Python using Numpy and Scipy ☆767 Updated this week
- Automatically evaluate your LLMs in Google Colab ☆511 Updated 4 months ago
- Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs ☆2,080 Updated this week
- Generate textbook-quality synthetic LLM pretraining data ☆479 Updated 11 months ago
- data cleaning and curation for unstructured text ☆326 Updated last month
- A tool for evaluating LLMs ☆376 Updated 4 months ago
- Efficient Retrieval Augmentation and Generation Framework ☆1,255 Updated last week
- Scale LLM Engine public repository ☆770 Updated this week
- YaRN: Efficient Context Window Extension of Large Language Models ☆1,306 Updated 5 months ago
- ☆442 Updated 3 weeks ago
- Best practices for distilling large language models. ☆370 Updated 7 months ago
- ☆429 Updated 8 months ago
- The code used to train and run inference with the ColPali architecture. ☆502 Updated this week
- Customizable implementation of the self-instruct paper. ☆1,004 Updated 6 months ago
- utilities for decoding deep representations (like sentence embeddings) back to text ☆691 Updated last week
- In-Context Learning for eXtreme Multi-Label Classification (XMC) using only a handful of examples. ☆362 Updated 7 months ago
- Exact structure out of any language model completion. ☆497 Updated last year
- Easily use and train state of the art late-interaction retrieval methods (ColBERT) in any RAG pipeline. Designed for modularity and ease-… ☆2,817 Updated 2 weeks ago
- Doing simple retrieval from LLM models at various context lengths to measure accuracy ☆1,451 Updated last month
- Guide for fine-tuning Llama/Mistral/CodeLlama models and more ☆521 Updated 3 weeks ago