definitive-io / human-eval-sampling-benchmark
OpenAI's human-eval sampling benchmark
☆13Updated last year
Alternatives and similar repositories for human-eval-sampling-benchmark:
Users who are interested in human-eval-sampling-benchmark are comparing it to the libraries listed below.
- Code Indexer Loop is a Python library for indexing and retrieving source code files through an integrated vector database that's continuo…☆175Updated last year
- auto fine tune of models with synthetic data☆75Updated last year
- Demo of AI chatbot that predicts user message to generate response quickly.☆102Updated last year
- Turn any developer documentation into a GPT☆92Updated 2 months ago
- ☆87Updated 2 months ago
- Announcing Instructor Cloud☆36Updated 8 months ago
- Build robust, production grade function calling assistants that work. Declarative and extensible. Built on top of LangChain ⚡️☆77Updated 11 months ago
- A couple scripts to grab stats from email☆42Updated 8 months ago
- A Spotify playlist agent using CrewAI☆81Updated 11 months ago
- Turn a GitHub repo's contents into a big prompt for long-context models like Claude 3 Opus.☆208Updated 2 months ago
- ☆171Updated 8 months ago
- Useful resources for LLM-based Diarization and Transcription.☆55Updated 6 months ago
- Fluid Database☆114Updated 7 months ago
- A fork of OpenAI Swarm that supports Groq and Anthropic☆120Updated 2 months ago
- 🌸 The open framework for question answering fine-tuning LLMs on private data☆69Updated last year
- ☆29Updated 5 months ago
- An assistant for Slack built with Arcade and LangGraph. Interact with Google Calendar, Mail, GitHub, Search Engines, Firecrawl and more a…☆76Updated last month
- MCP server that enhances Claude's reasoning capabilities by integrating DeepSeek R1's advanced reasoning engine 🤔☆49Updated 3 months ago
- Your automated SWE fleet to get your tickets from the Backlog to Prod!☆96Updated last year
- MarinaBox is a toolkit for creating and managing secure, isolated environments for AI agents☆124Updated 2 months ago
- ☆68Updated 7 months ago
- ☆47Updated last year
- OpenAI's Realtime API minus the enterprise bloat☆45Updated 5 months ago
- ☆45Updated 10 months ago
- Letting Claude Code develop his own MCP tools :)☆99Updated 2 months ago
- converts url content into JSON with a simple prefix☆68Updated last year
- ⛓️ build cognitive systems, pythonic☆336Updated 5 months ago
- A seamless matchmaking application that is programmed with Cohere Command R+, Stanford NLP DSPy framework, Weaviate Vector store and Crew…☆59Updated last year
- ☆4Updated 8 months ago
- they've simulated websites, worlds, and imaginary CLIs... but what if they simulated *you*?☆120Updated last week