cleanlab / cleanlab-studio

Client interface to Cleanlab Studio and the Trustworthy Language Model

☆32

Alternatives and similar repositories for cleanlab-studio

Users who are interested in cleanlab-studio are comparing it to the libraries listed below.

louisbrulenaudet / ragoon
High level library for batched embeddings generation, blazingly-fast web-based RAG and quantized indexes processing ⚡
☆66 Updated 9 months ago
VectorInstitute / fed-rag
A framework for fine-tuning retrieval-augmented generation (RAG) systems.
☆123 Updated 3 weeks ago
zeno-ml / zeno-hub
AI Evaluation Platform
☆46 Updated 2 months ago
davidberenstein1957 / dataset-viber
Dataset Viber is your chill repo for data collection, annotation and vibe checks.
☆47 Updated 11 months ago
cfahlgren1 / observers
A Lightweight Library for AI Observability
☆250 Updated 5 months ago
parea-ai / parea-sdk-py
Python SDK for experimenting, testing, evaluating & monitoring LLM-powered applications - Parea AI (YC S23)
☆78 Updated 5 months ago
darshil3011 / AutoMetaRAG
Dynamic Metadata based RAG Framework
☆75 Updated last year
fiddler-labs / fiddler-auditor
Fiddler Auditor is a tool to evaluate language models.
☆184 Updated last year
BhabhaAI / dataformer
Solving data for LLMs - Create quality synthetic datasets!
☆150 Updated 6 months ago
deshwalmahesh / PHUDGE
Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without custom rubric, reference answer, absolute…
☆49 Updated last year
davanstrien / data-for-fine-tuning-llms
☆79 Updated last year
discus-labs / discus
A data-centric AI package for ML/AI. Get the best high-quality data for the best results. Discord: https://discord.gg/t6ADqBKrdZ
☆63 Updated last year
kolenaIO / autoarena
Rank LLMs, RAG systems, and prompts using automated head-to-head evaluation
☆105 Updated 7 months ago
S1M0N38 / dspy-arxiv
Explore the use of DSPy for extracting features from PDFs 🔎
☆45 Updated last year
Xalp / ECHO
Official homepage for "Self-Harmonized Chain of Thought" (NAACL 2025)
☆91 Updated 6 months ago
flowaicom / flow-judge
Code for evaluating with Flow-Judge-v0.1 - an open-source, lightweight (3.8B) language model optimized for LLM system evaluations. Crafte…
☆76 Updated 9 months ago
h2oai / enterprise-h2ogpte
Client Code Examples, Use Cases and Benchmarks for Enterprise h2oGPTe RAG-Based GenAI Platform
☆87 Updated last month
cohere-ai / DiskVectorIndex
☆211 Updated last month
argilla-io / argilla-cookbook
Simple examples using Argilla tools to build AI
☆53 Updated 8 months ago
Columbia-NLP-Lab / PAPILLON
Code for our paper PAPILLON: PrivAcy Preservation from Internet-based and Local Language MOdel ENsembles
☆53 Updated 3 months ago
zhudotexe / redel
ReDel is a toolkit for researchers and developers to build, iterate on, and analyze recursive multi-agent systems. (EMNLP 2024 Demo)
☆83 Updated 4 months ago
tg1482 / priomptipy
A Python implementation of priompt - a neat way of managing context from diverse sources for LLM applications.
☆112 Updated 3 weeks ago
weaviate / structured-rag
Experimental Code for StructuredRAG: JSON Response Formatting with Large Language Models
☆111 Updated 3 months ago
IBM / unitxt
🦄 Unitxt is a Python library for enterprise-grade evaluation of AI performance, offering the world's largest catalog of tools and data …
☆206 Updated this week
l4b4r4b4b4 / AIDocks
LLM-Training-API: Including Embeddings & ReRankers, mergekit, LaserRMT
☆27 Updated last year
jina-ai / correlations
Simple UI for debugging correlations of text embeddings
☆288 Updated 2 months ago
topoteretes / awesome-ai-memory
A list of AI memory projects
☆185 Updated 6 months ago
QuixiAI / spectrum
☆129 Updated 4 months ago
apple / ml-superposition-prompting
☆145 Updated last year
rungalileo / hallucination-index
Initiative to evaluate and rank the most popular LLMs across common task types based on their propensity to hallucinate.
☆113 Updated last week