zorse-project / COBOLEval
Evaluate LLM-generated COBOL
☆41 Updated last year
Alternatives and similar repositories for COBOLEval
Users who are interested in COBOLEval are comparing it to the libraries listed below.
- AskIt: Unified programming interface for programming with LLMs (GPT-3.5, GPT-4, Gemini, Claude, Cohere, Llama 2) ☆79 Updated 11 months ago
- ReLM is a Regular Expression engine for Language Models ☆107 Updated 2 years ago
- Language Model for Mainframe Modernization ☆62 Updated last year
- Leverage your LangChain trace data for fine tuning ☆46 Updated last year
- ☆80 Updated last year
- Public repository containing METR's DVC pipeline for eval data analysis ☆143 Updated 8 months ago
- ☆124 Updated last year
- Source code of "How to Correctly do Semantic Backpropagation on Language-based Agentic Systems" 🤖 ☆76 Updated last year
- Official repo for NAACL 2024 Findings paper "LeTI: Learning to Generate from Textual Interactions." ☆66 Updated 2 years ago
- ☆21 Updated last year
- Automatic Prompt Optimization ☆48 Updated last year
- [ICML 2023] "Outline, Then Details: Syntactically Guided Coarse-To-Fine Code Generation", Wenqing Zheng, S P Sharan, Ajay Kumar Jaiswal, … ☆43 Updated 2 years ago
- A library for benchmarking the Long Term Memory and Continual learning capabilities of LLM based agents. With all the tests and code you… ☆81 Updated 11 months ago
- Flow Chart Image-to-Code Generation ☆35 Updated 2 years ago
- The Granite Guardian models are designed to detect risks in prompts and responses. ☆122 Updated 2 months ago
- Query language for blending SQL and LLMs across structured + unstructured data, with type constraints. ☆121 Updated this week
- XTR/WARP (SIGIR'25) is an extremely fast and accurate retrieval engine based on Stanford's ColBERTv2/PLAID and Google DeepMind's XTR. ☆173 Updated 7 months ago
- ☆104 Updated 4 months ago
- Advanced Reasoning Benchmark Dataset for LLMs ☆47 Updated 2 years ago
- [FORGE 2025] Graph-based method for end-to-end code completion with context awareness on repository ☆68 Updated last year
- ☆93 Updated 2 years ago
- Python SDK for experimenting, testing, evaluating & monitoring LLM-powered applications - Parea AI (YC S23) ☆81 Updated 9 months ago
- ☆67 Updated 5 months ago
- SWE-Bench Pro: Can AI Agents Solve Long-Horizon Software Engineering Tasks? ☆223 Updated 3 weeks ago
- ☆266 Updated 5 months ago
- Official Repo for CRMArena and CRMArena-Pro ☆126 Updated last month
- Dataset Viber is your chill repo for data collection, annotation and vibe checks. ☆45 Updated last year
- ☆12 Updated 2 months ago
- ☆40 Updated 2 years ago
- ☆80 Updated last year