zorse-project / COBOLEval
Evaluate LLM-generated COBOL
☆39 Updated last year
Alternatives and similar repositories for COBOLEval
Users who are interested in COBOLEval are comparing it to the repositories listed below.
- Language Model for Mainframe Modernization ☆57 Updated 11 months ago
- ☆44 Updated last year
- Pre-train Static Word Embeddings ☆85 Updated 2 months ago
- [FORGE 2025] Graph-based method for end-to-end code completion with context awareness on repository ☆64 Updated 11 months ago
- QAlign is a new test-time alignment approach that improves language model performance by using Markov chain Monte Carlo methods. ☆23 Updated 4 months ago
- ☆35 Updated last month
- CodeSage: Code Representation Learning At Scale (ICLR 2024) ☆111 Updated 9 months ago
- Experimental Code for StructuredRAG: JSON Response Formatting with Large Language Models ☆111 Updated 4 months ago
- XTR/WARP (SIGIR'25) is an extremely fast and accurate retrieval engine based on Stanford's ColBERTv2/PLAID and Google DeepMind's XTR. ☆152 Updated 3 months ago
- Code and data for "StructLM: Towards Building Generalist Models for Structured Knowledge Grounding" (COLM 2024) ☆75 Updated 9 months ago
- Python library to use Pleias-RAG models ☆61 Updated 3 months ago
- A library for benchmarking the Long Term Memory and Continual learning capabilities of LLM based agents. With all the tests and code you… ☆76 Updated 7 months ago
- Source code of "How to Correctly do Semantic Backpropagation on Language-based Agentic Systems" 🤖 ☆73 Updated 8 months ago
- AskIt: Unified programming interface for programming with LLMs (GPT-3.5, GPT-4, Gemini, Claude, Cohere, Llama 2) ☆79 Updated 7 months ago
- Pre-training code for CrystalCoder 7B LLM ☆55 Updated last year
- Example implementation of Iteration of Thought - Gives a star if you like the project ☆42 Updated 7 months ago
- Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without custom rubric, reference answer, absolute… ☆49 Updated last year
- ☆20 Updated 9 months ago
- Advanced Reasoning Benchmark Dataset for LLMs ☆47 Updated last year
- Optimizing Causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna ☆55 Updated 6 months ago
- Official repo for NAACL 2024 Findings paper "LeTI: Learning to Generate from Textual Interactions." ☆64 Updated 2 years ago
- 🦄 Unitxt is a Python library for enterprise-grade evaluation of AI performance, offering the world's largest catalog of tools and data … ☆206 Updated this week
- Source code for the collaborative reasoner research project at Meta FAIR. ☆100 Updated 3 months ago
- Official Repo for CRMArena and CRMArena-Pro ☆104 Updated last month
- ☆76 Updated this week
- Chat Markup Language conversation library ☆55 Updated last year
- Writing Blog Posts with Generative Feedback Loops! ☆50 Updated last year
- ☆96 Updated 11 months ago
- Leverage your LangChain trace data for fine tuning ☆44 Updated last year
- NLP with Rust for Python 🦀🐍 ☆64 Updated 2 months ago