BatsResearch/bonito

A lightweight library for generating synthetic instruction tuning datasets for your data without GPT.

☆831

Alternatives and similar repositories for bonito

Users who are interested in bonito are comparing it to the libraries listed below.

BatsResearch / nayak-aclfindings24-code
☆22 · Jul 16, 2024 · Updated 2 years ago
argilla-io / distilabel
Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verifi…
☆3,346 · Updated this week
BatsResearch / alfred
A system for prompted weak supervision. Alfred is a powerful tool that leverages large language models to accelerate data annotation.
☆58 · Apr 3, 2025 · Updated last year
huggingface / text-clustering
Easily embed, cluster and semantically label text datasets
☆609 · Mar 28, 2024 · Updated 2 years ago
magpie-align / magpie
[ICLR 2025] Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing. Your efficient and high-quality synthetic data …
☆875 · Mar 17, 2025 · Updated last year
arcee-ai / mergekit
Tools for merging pretrained large language models.
☆7,266 · Jun 17, 2026 · Updated last month
Pleias / Various-Finetuning
Set of scripts to finetune LLMs
☆38 · Mar 30, 2024 · Updated 2 years ago
databricks / lilac
Curate better data for LLMs
☆1,072 · Mar 19, 2024 · Updated 2 years ago
axolotl-ai-cloud / axolotl
Go ahead and axolotl questions
☆12,275 · Updated this week
AnswerDotAI / RAGatouille
Easily use and train state of the art late-interaction retrieval methods (ColBERT) in any RAG pipeline. Designed for modularity and ease-…
☆3,944 · May 17, 2025 · Updated last year
BatsResearch / LexC-Gen
Generate synthetic labeled data for extremely low-resource languages using bilingual lexicons.
☆20 · Oct 3, 2024 · Updated last year
huggingface / cosmopedia
☆573 · Nov 20, 2024 · Updated last year
prometheus-eval / prometheus-eval
Evaluate your LLM's response with Prometheus and GPT4 💯
☆1,102 · Apr 25, 2025 · Updated last year
stanfordnlp / pyreft
Stanford NLP Python library for Representation Finetuning (ReFT)
☆1,574 · Mar 5, 2026 · Updated 4 months ago
huggingface / datatrove
Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.
☆3,237 · Jul 22, 2026 · Updated last week
e-p-armstrong / augmentoolkit
Create Custom LLMs
☆1,859 · Jun 27, 2026 · Updated last month
predibase / lorax
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
☆3,821 · May 28, 2026 · Updated 2 months ago
datadreamer-dev / DataDreamer
DataDreamer: Prompt. Generate Synthetic Data. Train & Align Models. 🤖💤
☆1,115 · Feb 2, 2025 · Updated last year
AnswerDotAI / rerankers
A lightweight, low-dependency, unified API to use all common reranking and cross-encoder models.
☆1,625 · Dec 20, 2025 · Updated 7 months ago
VikParuchuri / textbook_quality
Generate textbook-quality synthetic LLM pretraining data
☆508 · Oct 19, 2023 · Updated 2 years ago
dottxt-ai / outlines
Structured Outputs
☆15,419 · Updated this week
huggingface / alignment-handbook
Robust recipes to align language models with human and AI preferences
☆5,651 · May 26, 2026 · Updated 2 months ago
uclaml / SPIN
The official implementation of Self-Play Fine-Tuning (SPIN)
☆1,248 · May 8, 2024 · Updated 2 years ago
mlabonne / llm-datasets
Curated list of datasets and tools for post-training.
☆4,712 · Apr 29, 2026 · Updated 3 months ago
SebastianBodza / EnsembleForecasting
Using multiple LLMs for ensemble Forecasting
☆16 · Jan 17, 2024 · Updated 2 years ago
MeetKai / functionary
Chat language model that can use tools and interpret the results
☆1,595 · Jun 30, 2026 · Updated 3 weeks ago
TencentARC / LLaMA-Pro
[ACL 2024] Progressive LLaMA with Block Expansion.
☆513 · May 20, 2024 · Updated 2 years ago
clinicalml / co-llm
Co-LLM: Learning to Decode Collaboratively with Multiple Language Models
☆128 · May 7, 2024 · Updated 2 years ago
AnswerDotAI / fsdp_qlora
Training LLMs with QLoRA + FSDP
☆1,550 · Nov 9, 2024 · Updated last year
stanfordnlp / dspy
DSPy: The framework for programming—not prompting—language models
☆36,434 · Updated this week
fanqiwan / FuseAI
FuseAI Project
☆600 · Jan 25, 2025 · Updated last year
Lightning-AI / litgpt
20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.
☆13,594 · Jul 20, 2026 · Updated last week
tanaymeh / mamba-train
A single repo with all scripts and utils to train / fine-tune the Mamba model with or without FIM
☆63 · Apr 8, 2024 · Updated 2 years ago
neuml / txtinstruct
📚 Datasets and models for instruction-tuning
☆238 · Sep 23, 2023 · Updated 2 years ago
yizhongw / self-instruct
Aligning pretrained language models with instruction data generated by themselves.
☆4,607 · Mar 27, 2023 · Updated 3 years ago
alvin-r / databonsai
clean & curate your data with LLMs.
☆487 · Jun 26, 2024 · Updated 2 years ago
sanyalsunny111 / LLM-Inheritune
[TMLR 2025] When Attention Collapses: How Degenerate Layers in LLMs Enable Smaller, Stronger Models
☆126 · Mar 6, 2026 · Updated 4 months ago
lavague-ai / LaVague
Large Action Model framework to develop AI Web Agents
☆6,387 · Jan 21, 2025 · Updated last year
567-labs / instructor
structured outputs for llms
☆13,642 · Jul 13, 2026 · Updated 2 weeks ago