huggingface/data-is-better-together

Let's build better datasets, together!

☆274

Alternatives and similar repositories for data-is-better-together

Users who are interested in data-is-better-together are comparing it to the libraries listed below.

davanstrien / awesome-synthetic-datasets
awesome synthetic (text) datasets
☆335 · Jan 8, 2026 · Updated 6 months ago

argilla-io / distilabel-spin-dibt
Repository containing the SPIN experiments on the DIBT 10k ranked prompts
☆24 · Mar 12, 2024 · Updated 2 years ago

lhoestq / hfjobs
Hugging Face Jobs
☆20 · Jul 11, 2025 · Updated last year

argilla-io / distilabel
Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verifi…
☆3,344 · Updated this week

davanstrien / huggingface-tldr
Experimental tl;dr summaries for datasets on the Hugging Face Hub!
☆10 · Apr 4, 2024 · Updated 2 years ago

shachardon / naturally_occurring_feedback
☆14 · Dec 1, 2025 · Updated 7 months ago

shachardon / share-lm
ShareLM is a Chrome extension that lets you share your open-source conversations
☆15 · Jun 15, 2026 · Updated last month

davidberenstein1957 / dataset-viber
Dataset Viber is your chill repo for data collection, annotation and vibe checks.
☆47 · Sep 5, 2024 · Updated last year

davanstrien / haiku-dpo
Using open source LLMs to build synthetic datasets for direct preference optimization
☆72 · Feb 29, 2024 · Updated 2 years ago

huggingface / dataset-dedupe-estimator
parquet dedupe estimator
☆27 · May 26, 2026 · Updated last month

huggingface / datatrove
Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.
☆3,220 · Updated this week

alvarobartt / bpe.zig
Minimal implementation of a Byte Pair Encoding (BPE) tokenizer in Zig
☆15 · Apr 7, 2025 · Updated last year

argilla-io / synthetic-data-generator
Build datasets using natural language
☆587 · Sep 19, 2025 · Updated 10 months ago

huggingface / lighteval
Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends
☆2,495 · Jun 29, 2026 · Updated 3 weeks ago

severian42 / Vodalus-Expert-LLM-Forge
Dataset Crafting w/ RAG/Wikipedia ground truth and Efficient Fine-Tuning Using MLX and Unsloth. Includes configurable dataset annotation …
☆196 · Jul 21, 2024 · Updated 2 years ago

MoritzLaurer / prompt_templates
A library for working with prompt templates locally or on the Hugging Face Hub.
☆56 · Mar 5, 2025 · Updated last year

wangitu / Ada-Instruct
☆17 · Apr 10, 2024 · Updated 2 years ago

argilla-io / notus
Notus is a collection of fine-tuned LLMs using SFT, DPO, SFT+DPO, and/or any other RLHF techniques, while always keeping a data-first app…
☆168 · Jan 15, 2024 · Updated 2 years ago

interstellarninja / function-calling-eval
A framework for evaluating function calls made by LLMs
☆41 · Jul 23, 2024 · Updated 2 years ago

gradio-app / nbgradio
nbgradio converts Jupyter notebooks with gradio code into static websites with live gradio apps!
☆17 · Oct 15, 2025 · Updated 9 months ago

huggingface / doc-builder
The package used to build the documentation of our Hugging Face repos
☆140 · Jul 13, 2026 · Updated last week

huggingface / nanotron
Minimalistic large language model 3D-parallelism training
☆2,764 · May 26, 2026 · Updated last month

mlabonne / llm-autoeval
Automatically evaluate your LLMs in Google Colab
☆695 · May 7, 2024 · Updated 2 years ago

arcee-ai / mergekit
Tools for merging pretrained large language models.
☆7,260 · Jun 17, 2026 · Updated last month

QuixiAI / laserRMT
This is our own implementation of 'Layer Selective Rank Reduction'
☆240 · May 26, 2024 · Updated 2 years ago

cfahlgren1 / observers
A Lightweight Library for AI Observability
☆255 · Feb 20, 2025 · Updated last year

tomaarsen / attention_sinks
Extend existing LLMs way beyond the original training length with constant memory usage, without retraining
☆735 · Apr 10, 2024 · Updated 2 years ago

cfahlgren1 / hf-data-explorer
Chrome Extension for exploring Hugging Face datasets 🔎
☆48 · Sep 18, 2024 · Updated last year

huggingface / huggingface-inference-toolkit
Hugging Face Inference Toolkit used to serve transformers, sentence-transformers, and diffusers models.
☆95 · Updated this week

huggingface / cosmopedia
☆572 · Nov 20, 2024 · Updated last year

5uru / Median
Median is an open-source flashcard application that leverages the power of spaced repetition and artificial intelligence to transform the…
☆21 · Nov 4, 2024 · Updated last year

prometheus-eval / prometheus-eval
Evaluate your LLM's response with Prometheus and GPT4 💯
☆1,102 · Apr 25, 2025 · Updated last year

huggingface / setfit
Efficient few-shot learning with Sentence Transformers
☆2,777 · May 26, 2026 · Updated last month

huggingface / alignment-handbook
Robust recipes to align language models with human and AI preferences
☆5,643 · May 26, 2026 · Updated last month

argilla-io / argilla
Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets
☆5,048 · Updated this week

QuixiAI / extract-expert
Extract a single expert from a Mixture Of Experts model using slerp interpolation.
☆19 · May 26, 2024 · Updated 2 years ago

jjallaire / inspect-llm-workshop
☆57 · May 28, 2024 · Updated 2 years ago

Tomiinek / Aargh
☆12 · Jan 2, 2024 · Updated 2 years ago

huggingface / llm-swarm
Manage scalable open LLM inference endpoints in Slurm clusters
☆289 · Jul 11, 2024 · Updated 2 years ago