zorse-project / COBOLEval
Evaluate LLM-generated COBOL
☆42Updated last year
Alternatives and similar repositories for COBOLEval
Users who are interested in COBOLEval are comparing it to the libraries listed below.
- Language Model for Mainframe Modernization☆65Updated last year
- ReLM is a Regular Expression engine for Language Models☆107Updated 2 years ago
- Public repository containing METR's DVC pipeline for eval data analysis☆183Updated last week
- Source code of "How to Correctly do Semantic Backpropagation on Language-based Agentic Systems" 🤖☆76Updated last year
- AskIt: Unified programming interface for programming with LLMs (GPT-3.5, GPT-4, Gemini, Claude, Cohere, Llama 2)☆80Updated last year
- Advanced Reasoning Benchmark Dataset for LLMs☆47Updated 2 years ago
- [FORGE 2025] Graph-based method for end-to-end code completion with context awareness on repository☆71Updated last year
- A library for benchmarking the Long Term Memory and Continual learning capabilities of LLM based agents. With all the tests and code you…☆82Updated last year
- Harness used to benchmark aider against SWE Bench benchmarks☆78Updated last year
- ☆41Updated last year
- Official repo for NAACL 2024 Findings paper "LeTI: Learning to Generate from Textual Interactions."☆66Updated 2 years ago
- Source code for our paper: "SelfGoal: Your Language Agents Already Know How to Achieve High-level Goals".☆69Updated last year
- Data and evaluation scripts for "CodePlan: Repository-level Coding using LLMs and Planning", FSE 2024☆80Updated last year
- LangCode - Improving alignment and reasoning of large language models (LLMs) with natural language embedded program (NLEP).☆48Updated 2 years ago
- Official Repo for CRMArena and CRMArena-Pro☆129Updated 2 months ago
- Experimental Code for StructuredRAG: JSON Response Formatting with Large Language Models☆114Updated 9 months ago
- 🔧 Compare how Agent systems perform on several benchmarks. 📊🚀☆103Updated 5 months ago
- ReDel is a toolkit for researchers and developers to build, iterate on, and analyze recursive multi-agent systems. (EMNLP 2024 Demo)☆90Updated last month
- The Granite Guardian models are designed to detect risks in prompts and responses.☆127Updated 3 months ago
- ☆44Updated 7 months ago
- ☆75Updated 7 months ago
- This repository contains all the code for collecting large scale amounts of code from GitHub.☆110Updated 2 years ago
- ☆13Updated 3 weeks ago
- Google DeepMind's PromptBreeder for automated prompt engineering, implemented in LangChain Expression Language.☆163Updated last year
- ☆38Updated 5 months ago
- Pre-train Static Word Embeddings☆94Updated 4 months ago
- LLM sampling method for enforcing syntax adherence in generated output☆25Updated 2 years ago
- Pre-training code for CrystalCoder 7B LLM☆57Updated last year
- ☆44Updated last year
- ☆23Updated 2 years ago