mrconter1 / PullRequestBenchmark
Evaluating LLM performance in PR reviews as an indicator of their capability to create PRs.
☆12Updated last year
Alternatives and similar repositories for PullRequestBenchmark
Users that are interested in PullRequestBenchmark are comparing it to the libraries listed below
- ☆35Updated last year
- A multi-player tournament benchmark that tests LLMs in social reasoning, strategy, and deception. Players engage in public and private co…☆297Updated last month
- Stop messing around with finicky sampling parameters and just use DRµGS!☆360Updated last year
- Mistral7B playing DOOM☆139Updated last year
- Interactive Fiction in the Age of AI☆36Updated last week
- "It Runs Doom." "Zork?" "Yes." Wadzilla converts Doom WAD files into ZIL text format suitable for compilation to an Infocom-style game…☆39Updated last year
- Mistral7B playing DOOM☆29Updated last year
- One-Click RAG Implementation, Simple and Portable☆30Updated 4 months ago
- Editor with LLM generation tree exploration☆83Updated 11 months ago
- ☆163Updated 10 months ago
- Implement recursion using English as the programming language and an LLM as the runtime.☆240Updated 2 years ago
- Further (?) development of the old ftape driver for Linux.☆102Updated 2 months ago
- Experimental LLM Inference UX to aid in creative writing☆128Updated last year
- ☆115Updated last year
- Benchmark that evaluates LLMs using 759 NYT Connections puzzles extended with extra trick words☆193Updated this week
- A simple tool to anonymize LLM prompts.☆66Updated last year
- I was missing turboc, so I wanted to recreate and modernise the color scheme☆17Updated last year
- LLM shell and document interrogator☆14Updated 2 years ago
- The repository provides code for training the SegmentAnything Model (SAM) for predicting frame polygons in comic books☆56Updated last year
- a curated list of data for reasoning ai☆141Updated last year
- Fork of anarki Arc with changes to the news code to support twostopbits.com☆19Updated 3 weeks ago
- ☆258Updated 11 months ago
- ☆57Updated last year
- Tokenflood is a load testing framework for simulating arbitrary loads on instruction-tuned LLMs☆44Updated 3 weeks ago
- A preprint version of our recent research on the capability of frontier AI systems to do self-replication☆59Updated last year
- Arduino-based USB rotary controller for arcade Arkanoid, Tempest, etc.☆76Updated last year
- Keeping my personal experiments separate from the main repo☆69Updated 11 months ago
- Rainbow Net Wi-Fi protocol for connecting old consoles to the Internet.☆89Updated 3 months ago
- this is a TypeScript-based MCP server that implements a simple loom and makes it available for Claude to use.☆21Updated last year
- klmbr - a prompt pre-processing technique to break through the barrier of entropy while generating text with LLMs☆86Updated last year