IBM / data-prep-kit
Open source project for data preparation for LLM application builders
★ 264 · Updated this week
Related projects
Alternatives and complementary repositories for data-prep-kit
- 🦄 Unitxt: a Python library for getting data fired up and set for training and evaluation · ★ 159 · Updated this week
- ★ 196 · Updated this week
- Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs · ★ 160 · Updated this week
- IBM development fork of https://github.com/huggingface/text-generation-inference · ★ 57 · Updated last month
- Taxonomy tree that will allow you to create models tuned with your data · ★ 196 · Updated this week
- Granite Snack Cookbook -- easily consumable recipes (Python notebooks) that showcase the capabilities of the Granite models · ★ 36 · Updated this week
- ★ 131 · Updated 3 months ago
- Build Enterprise RAG (Retrieval Augmented Generation) Pipelines to tackle various Generative AI use cases with LLMs by simply plugging co… · ★ 108 · Updated 3 months ago
- Prompt Declaration Language (PDL) is a declarative prompt programming language. · ★ 65 · Updated this week
- Let's build better datasets, together! · ★ 202 · Updated 3 months ago
- awesome synthetic (text) datasets · ★ 239 · Updated last week
- This project showcases an LLMOps pipeline that fine-tunes a small-size LLM model to prepare for the outage of the service LLM. · ★ 288 · Updated 2 months ago
- Build document-native LLM applications · ★ 50 · Updated last month
- RAGElo is a set of tools that helps you select the best RAG-based LLM agents by using an Elo ranker · ★ 105 · Updated last week
- Python library for Synthetic Data Generation · ★ 20 · Updated this week
- Automated knowledge graph creation SDK · ★ 109 · Updated 4 months ago
- Tutorial for building an LLM router · ★ 157 · Updated 3 months ago
- Benchmark various LLM Structured Output frameworks: Instructor, Mirascope, Langchain, LlamaIndex, Fructose, Marvin, Outlines, etc on task… · ★ 129 · Updated last month
- Collection of tuning recipes with HuggingFace SFTTrainer and PyTorch FSDP. · ★ 26 · Updated this week
- TapeAgents is a framework that facilitates all stages of the LLM Agent development lifecycle · ★ 115 · Updated this week
- 🕹️ Performance Comparison of MLOps Engines, Frameworks, and Languages on Mainstream AI Models. · ★ 132 · Updated 3 months ago
- Model, Code & Data for the EMNLP'23 paper "Making Large Language Models Better Data Creators" · ★ 115 · Updated last year
- This is the reproduction repository for my 🤗 Hugging Face blog post on synthetic data · ★ 61 · Updated 8 months ago
- End-to-End LLM Guide · ★ 97 · Updated 4 months ago
- ARAGOG: Advanced RAG Output Grading. Exploring and comparing various Retrieval-Augmented Generation (RAG) techniques on AI research paper… · ★ 97 · Updated 6 months ago
- Manage scalable open LLM inference endpoints in Slurm clusters · ★ 237 · Updated 3 months ago
- codebase release for EMNLP2023 paper publication · ★ 19 · Updated 8 months ago
- Granite Code Cookbook · ★ 13 · Updated this week
- Additional packages (components, document stores and the likes) to extend the capabilities of Haystack version 2.0 and onwards · ★ 112 · Updated this week
- Sharing both practical insights and theoretical knowledge about LLM evaluation that we gathered while managing the Open LLM Leaderboard a… · ★ 764 · Updated this week