langwatch / langevals
LangEvals aggregates various language model evaluators into a single platform, providing a standard interface for a multitude of scores and LLM guardrails so that you can protect and benchmark your LLMs and pipelines.
☆41 · Updated this week
Related projects
Alternatives and complementary repositories for langevals
- Data Questionnaire Agent Chatbot ☆61 · Updated this week
- A Ruby on Rails-style framework for the DSPy (Demonstrate, Search, Predict) project for language models like GPT, BERT, and Llama. ☆112 · Updated last month
- ReDel is a toolkit for researchers and developers to build, iterate on, and analyze recursive multi-agent systems. (EMNLP 2024 Demo) ☆65 · Updated this week
- Leveraging DSPy for AI-driven task understanding and solution generation, the Self-Discover Framework automates problem-solving through r… ☆57 · Updated 4 months ago
- Writing Blog Posts with Generative Feedback Loops! ☆43 · Updated 8 months ago
- ☆75 · Updated 10 months ago
- Python SDK for experimenting, testing, evaluating & monitoring LLM-powered applications - Parea AI (YC S23) ☆74 · Updated 2 months ago
- Dynamic Metadata based RAG Framework ☆71 · Updated 3 months ago
- A Python package to dynamically load functions for OpenAI Assistant ☆55 · Updated 11 months ago
- auto fine tune of models with synthetic data ☆72 · Updated 9 months ago
- Framework for building, orchestrating and deploying multi-agent systems. Managed by OpenAI Solutions team. Experimental framework. ☆78 · Updated last month
- RAG example using DSPy, Gradio, FastAPI ☆66 · Updated 7 months ago
- A seamless matchmaking application that is programmed with Cohere Command R+, Stanford NLP DSPy framework, Weaviate Vector store and Crew… ☆58 · Updated 7 months ago
- Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without custom rubric, reference answer, absolute… ☆48 · Updated 4 months ago
- Natural Language Interfaces Powered by LLMs ☆91 · Updated 3 months ago
- High level tool use for LLMs ☆34 · Updated 3 months ago
- ☆87 · Updated last year
- Generate Tools and Toolkits from any Python SDK -- no extra code required ☆49 · Updated 2 weeks ago
- ☆57 · Updated last year
- LangChain chat model abstractions for dynamic failover, load balancing, chaos engineering, and more! ☆79 · Updated 9 months ago
- The next evolution of Agents ☆46 · Updated last week
- Structured outputs from DSPy and Jinja2 ☆15 · Updated 3 weeks ago
- Code for evaluating with Flow-Judge-v0.1 - an open-source, lightweight (3.8B) language model optimized for LLM system evaluations. Crafte… ☆53 · Updated 3 weeks ago
- LLM reads a paper and produces a working prototype ☆36 · Updated 2 weeks ago
- Simple Graph Memory for AI applications ☆79 · Updated 4 months ago
- OpenAI GPT hosted Agent Framework for Windows and MacOS ☆36 · Updated 4 months ago
- Simple examples using Argilla tools to build AI ☆42 · Updated this week
- AI real estate agent ☆31 · Updated 9 months ago
- Code interpreter support for o1 ☆32 · Updated 2 months ago
- A framework for evaluating function calls made by LLMs ☆35 · Updated 4 months ago