HammingHQ / bug-in-the-code-stack
A new benchmark for measuring LLMs' capability to detect bugs in large codebases.
☆29 Updated 8 months ago
Alternatives and similar repositories for bug-in-the-code-stack:
Users who are interested in bug-in-the-code-stack are comparing it to the libraries listed below.
- Just a bunch of benchmark logs for different LLMs ☆119 Updated 6 months ago
- Chat Markup Language conversation library ☆55 Updated last year
- look how they massacred my boy ☆63 Updated 4 months ago
- Official homepage for "Self-Harmonized Chain of Thought" (NAACL 2025) ☆89 Updated 3 weeks ago
- ☆48 Updated last year
- Routing on Random Forest (RoRF) ☆114 Updated 4 months ago
- MLX port for xjdr's entropix sampler (mimics jax implementation) ☆63 Updated 3 months ago
- ReDel is a toolkit for researchers and developers to build, iterate on, and analyze recursive multi-agent systems. (EMNLP 2024 Demo) ☆71 Updated last month
- Simple examples using Argilla tools to build AI ☆53 Updated 3 months ago
- Leveraging DSPy for AI-driven task understanding and solution generation, the Self-Discover Framework automates problem-solving through r… ☆58 Updated 7 months ago
- Testing paligemma2 finetuning on reasoning dataset ☆18 Updated last month
- High level library for batched embeddings generation, blazingly-fast web-based RAG and quantized indexes processing ⚡ ☆64 Updated 3 months ago
- Python SDK for experimenting, testing, evaluating & monitoring LLM-powered applications - Parea AI (YC S23) ☆76 Updated last week
- ☆111 Updated 2 months ago
- ☆51 Updated 3 months ago
- ☆86 Updated 4 months ago
- ☆48 Updated 3 months ago
- Doing simple retrieval from LLM models at various context lengths to measure accuracy ☆100 Updated 10 months ago
- Code for our paper PAPILLON: PrivAcy Preservation from Internet-based and Local Language MOdel ENsembles ☆21 Updated last month
- Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without custom rubric, reference answer, absolute… ☆48 Updated 7 months ago
- Research repository on interfacing LLMs with Weaviate APIs. Inspired by the Berkeley Gorilla LLM. ☆85 Updated 2 weeks ago
- Code for evaluating with Flow-Judge-v0.1 - an open-source, lightweight (3.8B) language model optimized for LLM system evaluations. Crafte… ☆59 Updated 3 months ago
- an implementation of Self-Extend, to expand the context window via grouped attention ☆118 Updated last year
- Steer LLM outputs towards a certain topic/subject and enhance response capabilities using activation engineering by adding steering vecto… ☆222 Updated this week
- An easy-to-understand framework for LLM samplers that rewind and revise generated tokens ☆130 Updated this week
- ☆20 Updated last year
- An example implementation of RLHF (or, more accurately, RLAIF) built on MLX and HuggingFace. ☆24 Updated 7 months ago