argilla-io / synthetic-data-generator
Build datasets using natural language
⭐ 545 · Updated last month
Alternatives and similar repositories for synthetic-data-generator
Users interested in synthetic-data-generator are comparing it to the libraries listed below.
- 🤗 Benchmark Large Language Models Reliably On Your Data · ⭐ 410 · Updated last month
- Recipes for learning, fine-tuning, and adapting ColPali to your multimodal RAG use cases. 👨🏻‍🍳 · ⭐ 339 · Updated 5 months ago
- Automatically evaluate your LLMs in Google Colab · ⭐ 667 · Updated last year
- A Lightweight Library for AI Observability · ⭐ 251 · Updated 8 months ago
- Training Model Behavior in Agentic Systems · ⭐ 654 · Updated this week
- Automatically annotate papers using LLMs · ⭐ 360 · Updated 6 months ago
- An Open Source Toolkit For LLM Distillation · ⭐ 779 · Updated 4 months ago
- Banishing LLM Hallucinations Requires Rethinking Generalization · ⭐ 275 · Updated last year
- Code for explaining and evaluating late chunking (chunked pooling) · ⭐ 463 · Updated 10 months ago
- awesome synthetic (text) datasets · ⭐ 305 · Updated this week
- ⭐ 688 · Updated 6 months ago
- ⭐ 268 · Updated 4 months ago
- Vision-Augmented Retrieval and Generation (VARAG) - Vision first RAG Engine · ⭐ 486 · Updated 3 months ago
- Collection of scripts and notebooks for OpenAI's latest GPT OSS models · ⭐ 477 · Updated 2 months ago
- [EMNLP 2024 Demo] TinyAgent: Function Calling at the Edge! · ⭐ 456 · Updated last year
- Simple UI for debugging correlations of text embeddings · ⭐ 299 · Updated 5 months ago
- Tutorial for building LLM router · ⭐ 235 · Updated last year
- An open-source tool for LLM prompt optimization. · ⭐ 703 · Updated this week
- Fast Semantic Text Deduplication & Filtering · ⭐ 835 · Updated 3 weeks ago
- EvolKit is an innovative framework designed to automatically enhance the complexity of instructions used for fine-tuning Large Language M… · ⭐ 242 · Updated last year
- Tool for generating high quality Synthetic datasets · ⭐ 1,379 · Updated 2 weeks ago
- ⭐ 158 · Updated 7 months ago
- One click templates for inferencing Language Models · ⭐ 218 · Updated 3 months ago
- Ranking LLMs on agentic tasks · ⭐ 199 · Updated 2 months ago
- Benchmark various LLM Structured Output frameworks: Instructor, Mirascope, Langchain, LlamaIndex, Fructose, Marvin, Outlines, etc on task… · ⭐ 179 · Updated last year
- Vision Document Retrieval (ViDoRe): Benchmark. Evaluation code for the ColPali paper. · ⭐ 249 · Updated 3 months ago
- Solving data for LLMs - Create quality synthetic datasets! · ⭐ 150 · Updated 9 months ago
- Together Open Deep Research · ⭐ 352 · Updated 7 months ago
- From data to vector database effortlessly · ⭐ 85 · Updated 6 months ago
- An Awesome list of curated DSPy resources. · ⭐ 471 · Updated last month