HammingHQ / bug-in-the-code-stack
A new benchmark for measuring LLMs' capability to detect bugs in large codebases.
☆29 Updated 7 months ago
Alternatives and similar repositories for bug-in-the-code-stack:
Users who are interested in bug-in-the-code-stack are comparing it to the libraries listed below:
- Just a bunch of benchmark logs for different LLMs ☆116 Updated 5 months ago
- Routing on Random Forest (RoRF) ☆98 Updated 3 months ago
- Official homepage for "Self-Harmonized Chain of Thought" ☆88 Updated last month
- ReDel is a toolkit for researchers and developers to build, iterate on, and analyze recursive multi-agent systems. (EMNLP 2024 Demo) ☆68 Updated 3 weeks ago
- Leveraging DSPy for AI-driven task understanding and solution generation, the Self-Discover Framework automates problem-solving through r… ☆58 Updated 6 months ago
- Synthetic Data for LLM Fine-Tuning ☆107 Updated last year
- Embed anything. ☆28 Updated 7 months ago
- Python SDK for experimenting, testing, evaluating & monitoring LLM-powered applications - Parea AI (YC S23) ☆73 Updated 4 months ago
- ☆107 Updated 3 weeks ago
- MLX port for xjdr's entropix sampler (mimics jax implementation) ☆60 Updated 2 months ago
- RAG example using DSPy, Gradio, FastAPI ☆70 Updated 9 months ago
- Prompt design in Python ☆49 Updated last month
- High level library for batched embeddings generation, blazingly-fast web-based RAG and quantized indexes processing ⚡ ☆62 Updated 2 months ago
- ☆85 Updated 3 months ago
- ☆48 Updated last year
- An automated tool for discovering insights from research paper corpora ☆135 Updated 7 months ago
- Simple Graph Memory for AI applications ☆81 Updated 5 months ago
- A seamless matchmaking application that is programmed with Cohere Command R+, Stanford NLP DSPy framework, Weaviate Vector store and Crew… ☆59 Updated 8 months ago
- Testing speed and accuracy of RAG with, and without Cross Encoder Reranker. ☆48 Updated last year
- Chat Markup Language conversation library ☆55 Updated last year
- A Python library to orchestrate LLMs in a neural network-inspired structure ☆46 Updated 3 months ago
- Track the progress of LLM context utilisation ☆53 Updated 6 months ago
- A framework for orchestrating AI agents using a mermaid graph ☆75 Updated 8 months ago
- ☆88 Updated last year
- look how they massacred my boy ☆63 Updated 3 months ago
- ☆30 Updated 6 months ago
- ☆65 Updated 7 months ago
- 🦾💻🌐 distributed training & serverless inference at scale on RunPod ☆17 Updated 7 months ago
- Function Calling Benchmark & Testing ☆78 Updated 6 months ago
- ☆135 Updated last month