argilla-io / synthetic-data-generator
Build datasets using natural language
⭐469 · Updated this week
Alternatives and similar repositories for synthetic-data-generator
Users interested in synthetic-data-generator are comparing it to the libraries listed below.
- 🤗 Benchmark Large Language Models Reliably On Your Data · ⭐295 · Updated this week
- An Open Source Toolkit For LLM Distillation · ⭐596 · Updated 2 weeks ago
- Recipes for learning, fine-tuning, and adapting ColPali to your multimodal RAG use cases. 👨🏻‍🍳 · ⭐282 · Updated 3 weeks ago
- Vision-Augmented Retrieval and Generation (VARAG) - Vision first RAG Engine · ⭐453 · Updated 4 months ago
- Generate large synthetic data using an LLM · ⭐414 · Updated this week
- EvolKit is an innovative framework designed to automatically enhance the complexity of instructions used for fine-tuning Large Language M… · ⭐217 · Updated 6 months ago
- Tool for generating high quality Synthetic datasets · ⭐797 · Updated this week
- Code for explaining and evaluating late chunking (chunked pooling) · ⭐384 · Updated 4 months ago
- [EMNLP 2024 Demo] TinyAgent: Function Calling at the Edge! · ⭐397 · Updated 8 months ago
- ⭐254 · Updated 5 months ago
- Synthetic data curation for post-training and structured data extraction · ⭐1,324 · Updated this week
- ⭐138 · Updated last month
- Framework for enhancing LLMs for RAG tasks using fine-tuning. · ⭐736 · Updated 2 months ago
- ⭐645 · Updated 2 weeks ago
- Fast Semantic Text Deduplication & Filtering · ⭐662 · Updated 3 weeks ago
- awesome synthetic (text) datasets · ⭐281 · Updated 6 months ago
- A Lightweight Library for AI Observability · ⭐243 · Updated 2 months ago
- Automatically evaluate your LLMs in Google Colab · ⭐622 · Updated last year
- Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends · ⭐1,516 · Updated last week
- A library for easily merging multiple LLM experts and efficiently training the merged LLM. · ⭐475 · Updated 8 months ago
- A reading list on LLM based Synthetic Data Generation 🔥 · ⭐1,265 · Updated 2 months ago
- This is the official repository for Auto-RAG. · ⭐208 · Updated 3 weeks ago
- A flexible, adaptive classification system for dynamic text classification · ⭐165 · Updated last week
- A compact LLM pretrained in 9 days by using high quality data · ⭐313 · Updated last month
- A comprehensive repository of reasoning tasks for LLMs (and beyond) · ⭐439 · Updated 7 months ago
- CodeScientist: An automated scientific discovery system for code-based experiments · ⭐248 · Updated last month
- Banishing LLM Hallucinations Requires Rethinking Generalization · ⭐274 · Updated 10 months ago
- Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verifi… · ⭐2,700 · Updated this week
- Automatic evals for LLMs · ⭐388 · Updated this week
- Ranking LLMs on agentic tasks · ⭐132 · Updated 3 weeks ago