zeno-ml / zeno-evals

Visualize OpenAI Evals with Zeno

☆24

Alternatives and similar repositories for zeno-evals:

Users who are interested in zeno-evals are comparing it to the libraries listed below.

CarperAI / squeakily
A library for squeakily cleaning and filtering language datasets.
☆46 Updated last year
taylorai / onnx_embedding_models
utilities for loading and running text embeddings with onnx
☆44 Updated 6 months ago
1rgs / tokenwiz
A clone of OpenAI's Tokenizer page for HuggingFace Models
☆44 Updated last year
egozverev / Should-It-Be-Executed-Or-Processed
Accompanying code and SEP dataset for the "Can LLMs Separate Instructions From Data? And What Do We Even Mean By That?" paper.
☆46 Updated 8 months ago
cfahlgren1 / hf-data-explorer
Chrome Extension for exploring Hugging Face datasets 🔎
☆49 Updated 5 months ago
virevolai / logos-shift-client
Replace expensive LLM calls with finetunes automatically
☆62 Updated last year
NousResearch / StripedHyenaTrainer
☆60 Updated last year
enjalot / fineweb-modal
Using modal.com to process FineWeb-edu data
☆20 Updated 2 months ago
CarperAI / treasure_trove
☆22 Updated last year
pacman100 / peft-codegen-25
☆24 Updated last year
S1M0N38 / dspy-arxiv
Explore the use of DSPy for extracting features from PDFs 🔎
☆38 Updated 11 months ago
awslabs / extending-the-context-length-of-open-source-llms
☆51 Updated 2 months ago
simonw / llm-anyscale-endpoints
LLM plugin for models hosted by Anyscale Endpoints
☆32 Updated 9 months ago
shoggoth13 / agents-deconstructed
☆57 Updated last year
cg123 / rathe
Tools for formatting large language model prompts.
☆12 Updated last year
Muhtasham / summarization-eval
📝 Reference-Free automatic summarization evaluation with potential hallucination detection
☆101 Updated last year
Arize-ai / LLMTest_NeedleInAHaystack
Doing simple retrieval from LLM models at various context lengths to measure accuracy
☆100 Updated 10 months ago
Alignment-Lab-AI / Our-Projects
A repository of projects and datasets under active development by Alignment Lab AI
☆22 Updated last year
weaviate-tutorials / Hurricane
Writing Blog Posts with Generative Feedback Loops!
☆47 Updated 10 months ago
aymeric-roucher / LongContext_vs_RAG_NeedleInAHaystack
Comparing retrieval abilities from GPT4-Turbo and a RAG system on a toy example for various context lengths
☆35 Updated last year
Hannibal046 / nanoColBERT
Simple replication of [ColBERT-v1](https://arxiv.org/abs/2004.12832).
☆79 Updated 11 months ago
deep-diver / LLM-Pref-Mark-UI
☆37 Updated last year
Watchfulio / dataset-generator
A new way to generate large quantities of high quality synthetic data (on par with GPT-4), with better controllability, at a fraction of …
☆22 Updated 4 months ago
zeno-ml / zeno-hub
AI Evaluation Platform
☆46 Updated this week
cloneofsimo / fim-llama-deepspeed
☆31 Updated last year
Columbia-NLP-Lab / PAPILLON
Code for our paper PAPILLON: PrivAcy Preservation from Internet-based and Local Language MOdel ENsembles
☆21 Updated last month
thomasnormal / fewshot
☆26 Updated 4 months ago
Alignment-Lab-AI / datagen
a pipeline for using api calls to agnostically convert unstructured data into structured training data
☆29 Updated 4 months ago
AblateIt / finetune-study
Comprehensive analysis of the difference in performance of QLora, Lora, and Full Finetunes.
☆82 Updated last year
FL33TW00D / embd
GPU accelerated client-side embeddings for vector search, RAG etc.
☆65 Updated last year