IBM / data-prep-kit

Open source project for data preparation for LLM application builders

☆300

Related projects

Alternatives and complementary repositories for data-prep-kit

ibm-granite-community / granite-snack-cookbook
Granite Snack Cookbook -- easily consumable recipes (python notebooks) that showcase the capabilities of the Granite models
☆64 Updated last week
IBM / unitxt
🦄 Unitxt: a python library for getting data fired up and set for training and evaluation
☆160 Updated this week
ibm-granite / granite-3.0-language-models
☆214 Updated last week
IBM / prompt-declaration-language
Prompt Declaration Language (PDL) is a declarative prompt programming language.
☆74 Updated this week
davanstrien / awesome-synthetic-datasets
awesome synthetic (text) datasets
☆242 Updated 3 weeks ago
aishwaryaprabhat / goku
GenAIOps on Kubernetes: A collection of reference architectures for running GenAI at scale on Kubernetes using OSS tooling
☆128 Updated 3 weeks ago
ibm-ecosystem-engineering / SuperKnowa
Build Enterprise RAG (Retrieval Augmented Generation) Pipelines to tackle various Generative AI use cases with LLMs by simply plugging co…
☆110 Updated 3 months ago
CYQIQ / MultiCoT
Repository to demonstrate Chain of Table reasoning with multiple tables powered by LangGraph
☆146 Updated 7 months ago
predlico / ARAGOG
ARAGOG: Advanced RAG Output Grading. Exploring and comparing various Retrieval-Augmented Generation (RAG) techniques on AI research paper…
☆96 Updated 7 months ago
brandonstarxel / chunking_evaluation
This package, developed as part of our research detailed in the Chroma Technical Report, provides tools for text chunking and evaluation.…
☆160 Updated last month
huggingface / data-is-better-together
Let's build better datasets, together!
☆206 Updated this week
anyscale / llm-router
Tutorial for building an LLM router
☆163 Updated 4 months ago
apple / ml-superposition-prompting
☆131 Updated 4 months ago
KarelDO / xmc.dspy
In-Context Learning for eXtreme Multi-Label Classification (XMC) using only a handful of examples.
☆386 Updated 9 months ago
diicellman / dspy-rag-fastapi
FastAPI wrapper around DSPy
☆214 Updated 8 months ago
neuralmagic / guidellm
Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs
☆165 Updated 2 weeks ago
zetaalphavector / RAGElo
RAGElo is a set of tools that helps you select the best RAG-based LLM agents by using an Elo ranker
☆106 Updated 3 weeks ago
StacklokLabs / promptwright
Generate large synthetic data using a local LLM
☆212 Updated last week
jina-ai / late-chunking
Code for explaining and evaluating late chunking (chunked pooling)
☆248 Updated last month
foundation-model-stack / fms-hf-tuning
🚀 Collection of tuning recipes with HuggingFace SFTTrainer and PyTorch FSDP.
☆27 Updated this week
IntelLabs / RAG-FiT
Framework for enhancing LLMs for RAG tasks using fine-tuning.
☆505 Updated this week
chrisammon3000 / dspy-neo4j-knowledge-graph
LLM-driven automated knowledge graph construction from text using DSPy and Neo4j.
☆154 Updated 7 months ago
AymenKallala / RAG_Maestro
Building a chatbot powered by a RAG pipeline to read, summarize, and quote the most relevant papers related to the user query.
☆162 Updated 6 months ago
stephenleo / llm-structured-output-benchmarks
Benchmark various LLM Structured Output frameworks: Instructor, Mirascope, Langchain, LlamaIndex, Fructose, Marvin, Outlines, etc on task…
☆133 Updated last month
Arize-ai / openinference
OpenTelemetry Instrumentation for AI Observability
☆220 Updated this week
IBM / text-generation-inference
IBM development fork of https://github.com/huggingface/text-generation-inference
☆57 Updated last month
aurelio-labs / semantic-chunkers
☆182 Updated this week
whyhow-ai / whyhow
Automated knowledge graph creation SDK
☆113 Updated 4 months ago
instructlab / taxonomy
Taxonomy tree that will allow you to create models tuned with your data
☆200 Updated this week
premAI-io / benchmarks
🕹️ Performance Comparison of MLOps Engines, Frameworks, and Languages on Mainstream AI Models.
☆134 Updated 3 months ago