argilla-io / synthetic-data-generator
Build datasets using natural language
★440 Updated 3 weeks ago
Alternatives and similar repositories for synthetic-data-generator:
Users who are interested in synthetic-data-generator are comparing it to the libraries listed below.
- Recipes for learning, fine-tuning, and adapting ColPali to your multimodal RAG use cases. 👨🏻‍🍳 ★264 Updated 3 months ago
- An Open Source Toolkit For LLM Distillation ★562 Updated 2 months ago
- awesome synthetic (text) datasets ★265 Updated 5 months ago
- Framework for enhancing LLMs for RAG tasks using fine-tuning. ★737 Updated last month
- Use late-interaction multi-modal models such as ColPali in just a few lines of code. ★757 Updated 2 months ago
- Vision-Augmented Retrieval and Generation (VARAG) - Vision first RAG Engine ★443 Updated 2 months ago
- Banishing LLM Hallucinations Requires Rethinking Generalization ★273 Updated 8 months ago
- Code for explaining and evaluating late chunking (chunked pooling) ★358 Updated 3 months ago
- Generate large synthetic data using an LLM ★402 Updated this week
- A Lightweight Library for AI Observability ★238 Updated last month
- Structured information extraction from documents ★312 Updated 6 months ago
- Automatically evaluate your LLMs in Google Colab ★613 Updated 10 months ago
- Official codebase for "SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution" ★477 Updated 2 weeks ago
- Code for Husky, an open-source language agent that solves complex, multi-step reasoning tasks. Husky v1 addresses numerical, tabular and … ★340 Updated 9 months ago
- Automatically annotate papers using LLMs ★310 Updated 3 months ago
- EvolKit is an innovative framework designed to automatically enhance the complexity of instructions used for fine-tuning Large Language M… ★208 Updated 5 months ago
- ★215 Updated 3 months ago
- Synthetic data curation for post-training and structured data extraction ★1,097 Updated last week
- Benchmark various LLM Structured Output frameworks: Instructor, Mirascope, Langchain, LlamaIndex, Fructose, Marvin, Outlines, etc on task… ★160 Updated 6 months ago
- Sharing both practical insights and theoretical knowledge about LLM evaluation that we gathered while managing the Open LLM Leaderboard a… ★1,105 Updated 2 months ago
- This is the official repository for Auto-RAG. ★203 Updated 2 months ago
- Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs ★236 Updated this week
- ★255 Updated 3 months ago
- Search-o1: Agentic Search-Enhanced Large Reasoning Models ★759 Updated this week
- A reading list on LLM based Synthetic Data Generation 🔥 ★1,223 Updated last month
- Vision Document Retrieval (ViDoRe): Benchmark. Evaluation code for the ColPali paper. ★189 Updated last week
- Make Llama 3.1 8B talk in Rick Sanchez's style ★76 Updated 2 months ago
- A lightweight, low-dependency, unified API to use all common reranking and cross-encoder models. ★1,359 Updated 2 weeks ago
- ★208 Updated 9 months ago
- [EMNLP 2024 Demo] TinyAgent: Function Calling at the Edge! ★377 Updated 6 months ago