meta-llama / synthetic-data-kit
Tool for generating high-quality synthetic datasets
☆1,455Updated 2 months ago
Alternatives and similar repositories for synthetic-data-kit
Users who are interested in synthetic-data-kit are comparing it to the libraries listed below
- An open-source tool for LLM prompt optimization.☆738Updated 2 weeks ago
- UQLM (Uncertainty Quantification for Language Models) is a Python package for UQ-based LLM hallucination detection☆1,094Updated this week
- 🤗 Benchmark Large Language Models Reliably On Your Data☆423Updated 2 weeks ago
- Synthetic data curation for post-training and structured data extraction☆1,594Updated last week
- Build datasets using natural language☆558Updated 3 months ago
- ☆695Updated 8 months ago
- Fast State-of-the-Art Static Embeddings☆1,969Updated 2 weeks ago
- 🎨 NeMo Data Designer: A general library for generating high-quality synthetic data from scratch or based on seed data.☆620Updated this week
- Sharing both practical insights and theoretical knowledge about LLM evaluation that we gathered while managing the Open LLM Leaderboard a…☆2,029Updated last month
- Cache-Augmented Generation: A Simple, Efficient Alternative to RAG☆1,463Updated 7 months ago
- A collection of notebooks/recipes showcasing use cases of open-source models with Together AI.☆1,089Updated this week
- An interface library for RL post training with environments.☆973Updated last week
- Generate High-Quality Synthetics, Train, Measure, and Evaluate in a Single Pipeline☆804Updated this week
- Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends☆2,233Updated last week
- Recipes for shrinking, optimizing, and customizing cutting-edge vision models. 💜☆1,826Updated 2 months ago
- A Self-adaptation Framework🐙 that adapts LLMs for unseen tasks in real-time!☆1,179Updated 11 months ago
- ☆2,138Updated 3 weeks ago
- ColiVara is a suite of services that allows you to store, search, and retrieve documents based on their visual embedding. ColiVara has st…☆1,413Updated 8 months ago
- Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verifi…☆3,039Updated 3 weeks ago
- Implementing the 4 agentic patterns from scratch☆1,658Updated 9 months ago
- Use late-interaction multi-modal models such as ColPali in just a few lines of code.☆837Updated 11 months ago
- Fast Semantic Text Deduplication & Filtering☆863Updated last week
- Large Concept Models: Language modeling in a sentence representation space☆2,324Updated 11 months ago
- Collection of scripts and notebooks for OpenAI's latest GPT OSS models☆494Updated 4 months ago
- Everything about the SmolLM and SmolVLM family of models☆3,539Updated last month
- A framework for comprehensive diagnosis and optimization of agents using simulated, realistic synthetic interactions☆1,158Updated last week
- Optimizing inference proxy for LLMs☆3,266Updated 2 weeks ago
- A toolkit to create an optimal production-ready Retrieval Augmented Generation (RAG) setup for your data☆1,524Updated 7 months ago
- Hypernetworks that adapt LLMs for specific benchmark tasks using only textual task description as the input☆934Updated 7 months ago
- This repository shares end-to-end notebooks on how to use various Weaviate features and integrations!☆932Updated last month