NLP-Core-Team / RealCode_eval
☆11 · Updated 4 months ago
Alternatives and similar repositories for RealCode_eval
Users interested in RealCode_eval are comparing it to the repositories listed below.
- [FORGE 2025] Graph-based method for end-to-end code completion with repository-level context awareness ☆64 · Updated 10 months ago
- First-of-its-kind AI benchmark for evaluating the protection capabilities of large language model (LLM) guard systems (guardrails and saf… ☆39 · Updated last month
- Unleash the full potential of exascale LLMs on consumer-class GPUs, proven by extensive benchmarks, with no long-term adjustments and min… ☆26 · Updated 8 months ago
- Evalica, your favourite evaluation toolkit ☆38 · Updated 3 weeks ago
- Binding to transformers in ggml ☆63 · Updated this week
- 🔍 Code Search Tools & Experiments ☆12 · Updated 2 weeks ago
- A library for LLM-based agents to navigate large codebases efficiently. ☆16 · Updated 8 months ago
- Pivotal Token Search ☆109 · Updated last week
- A Python framework for building AI agent systems with robust task management in the form of a graph execution engine, inference capabilit… ☆25 · Updated last month
- A simple GitHub Actions script to build a llamafile and upload it to Hugging Face ☆15 · Updated last year
- Distill ChatGPT's coding ability into a small (1B) model ☆30 · Updated last year
- Two automatic code-completion IDE extensions for @JetBrains and @microsoft/vscode based on Transformer-based large language models for so… ☆55 · Updated last year
- Harness used to benchmark aider against SWE Bench benchmarks ☆72 · Updated last year
- Geniusrise: Framework for building geniuses ☆60 · Updated last year
- A novel approach for transformer model introspection that enables saving, compressing, and manipulating internal thought states for advan… ☆22 · Updated 3 months ago
- Nexusflow function call, tool use, and agent benchmarks. ☆25 · Updated 7 months ago
- ☆31 · Updated 9 months ago
- A TLA+ AutoRepair System For Formal Specification with GPT-4 ☆14 · Updated last year
- LLM application tracing based on OpenTelemetry ☆13 · Updated this week
- A text-to-SQL prototype on the Northwind SQLite dataset ☆13 · Updated 9 months ago
- DevQualityEval: An evaluation benchmark 📈 and framework to compare and evolve the quality of code generation of LLMs. ☆178 · Updated 2 months ago
- Automatically pass your functions defined in Python to ChatGPT and have it call them back seamlessly. ☆13 · Updated 2 years ago
- CodeMind is a generic framework for evaluating inductive code reasoning of LLMs. It is equipped with a static analysis component that ena… ☆39 · Updated 3 months ago
- ☆28 · Updated 10 months ago
- Can It Edit? Evaluating the Ability of Large Language Models to Follow Code Editing Instructions ☆44 · Updated 11 months ago
- Trying to deconstruct RWKV in understandable terms ☆14 · Updated 2 years ago
- A tool for analysis of LLM generations. ☆40 · Updated last month
- Interview-based evaluation of LLMs ☆20 · Updated 6 months ago
- CodeSage: Code Representation Learning At Scale (ICLR 2024) ☆109 · Updated 8 months ago
- ☆44 · Updated last year