IBM / data-prep-kit
Open source project for data preparation for LLM application builders
★300 · Updated this week
Related projects
Alternatives and complementary repositories for data-prep-kit
- Granite Snack Cookbook -- easily consumable recipes (python notebooks) that showcase the capabilities of the Granite models ★64 · Updated last week
- Unitxt: a python library for getting data fired up and set for training and evaluation ★160 · Updated this week
- ★214 · Updated last week
- Prompt Declaration Language (PDL) is a declarative prompt programming language. ★74 · Updated this week
- awesome synthetic (text) datasets ★242 · Updated 3 weeks ago
- GenAIOps on Kubernetes: A collection of reference architectures for running GenAI at scale on Kubernetes using OSS tooling ★128 · Updated 3 weeks ago
- Build Enterprise RAG (Retrieval Augmented Generation) Pipelines to tackle various Generative AI use cases with LLMs by simply plugging co… ★110 · Updated 3 months ago
- Repository to demonstrate Chain of Table reasoning with multiple tables powered by LangGraph ★146 · Updated 7 months ago
- ARAGOG: Advanced RAG Output Grading. Exploring and comparing various Retrieval-Augmented Generation (RAG) techniques on AI research paper… ★96 · Updated 7 months ago
- This package, developed as part of our research detailed in the Chroma Technical Report, provides tools for text chunking and evaluation. … ★160 · Updated last month
- Let's build better datasets, together! ★206 · Updated this week
- Tutorial for building an LLM router ★163 · Updated 4 months ago
- ★131 · Updated 4 months ago
- In-Context Learning for eXtreme Multi-Label Classification (XMC) using only a handful of examples. ★386 · Updated 9 months ago
- FastAPI wrapper around DSPy ★214 · Updated 8 months ago
- Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs ★165 · Updated 2 weeks ago
- RAGElo is a set of tools that helps you select the best RAG-based LLM agents by using an Elo ranker ★106 · Updated 3 weeks ago
- Generate large synthetic data using a local LLM ★212 · Updated last week
- Code for explaining and evaluating late chunking (chunked pooling) ★248 · Updated last month
- Collection of tuning recipes with HuggingFace SFTTrainer and PyTorch FSDP. ★27 · Updated this week
- Framework for enhancing LLMs for RAG tasks using fine-tuning. ★505 · Updated this week
- LLM-driven automated knowledge graph construction from text using DSPy and Neo4j. ★154 · Updated 7 months ago
- Building a chatbot powered by a RAG pipeline to read, summarize, and quote the most relevant papers related to the user query. ★162 · Updated 6 months ago
- Benchmark various LLM Structured Output frameworks: Instructor, Mirascope, Langchain, LlamaIndex, Fructose, Marvin, Outlines, etc on task… ★133 · Updated last month
- OpenTelemetry Instrumentation for AI Observability ★220 · Updated this week
- IBM development fork of https://github.com/huggingface/text-generation-inference ★57 · Updated last month
- ★182 · Updated this week
- Automated knowledge graph creation SDK ★113 · Updated 4 months ago
- Taxonomy tree that will allow you to create models tuned with your data ★200 · Updated this week
- Performance Comparison of MLOps Engines, Frameworks, and Languages on Mainstream AI Models. ★134 · Updated 3 months ago