Comprehensive LLM evaluation at scale: A production-ready framework for evaluating large language models across multiple benchmarks.
☆38Apr 2, 2026Updated last week
Alternatives and similar repositories for eval-framework
Users that are interested in eval-framework are comparing it to the libraries listed below.
- Complete set of English dialect transformation rules and evaluation code☆16Jun 7, 2024Updated last year
- Can Large Language Models Identify Authorship? (EMNLP 2024 Findings)☆13Feb 4, 2025Updated last year
- Split bib files for anthology bibliography for overleaf☆11Aug 25, 2024Updated last year
- [NeurIPS 2023 - ML for Audio Workshop (Oral)] Zero-shot audio captioning with audio-language model guidance and audio context keywords☆18Nov 30, 2024Updated last year
- ACL 2021 paper "Style is NOT a single variable: Case Studies for Cross-Style Language Understanding" by Dongyeop Kang and Eduard Hovy☆15Jul 19, 2021Updated 4 years ago
- Code for DVD A Diagnostic Dataset for Multi-step Reasoning in Video Grounded Dialogue☆14Oct 12, 2021Updated 4 years ago
- Code for the paper BiST: Bi-directional Spatio-Temporal Reasoning for Video-Grounded Dialogues (EMNLP20)☆11Jun 16, 2025Updated 9 months ago
- DSTC8-AVSD: Sentence generation task for Audio Visual Scene-aware Dialog☆14Jun 10, 2021Updated 4 years ago
- ☆24Sep 26, 2025Updated 6 months ago
- Dialogue Act classification☆18Jan 15, 2024Updated 2 years ago
- [NeurIPS 2023 Spotlight] In-Context Impersonation Reveals Large Language Models' Strengths and Biases☆22Nov 30, 2024Updated last year
- ☆15Aug 20, 2024Updated last year
- CLEVR-X: A Visual Reasoning Dataset for Natural Language Explanations☆29Oct 27, 2023Updated 2 years ago
- Dataset and models for paper "Game-Based Video-Context Dialogue (EMNLP 2018)"☆19Oct 25, 2018Updated 7 years ago
- ☆54Dec 10, 2025Updated 4 months ago
- [ACL 2025 Main] Official Repo for Paper "Measuring Data Diversity for Instruction Tuning: A Systematic Analysis and A Reliable Metric"☆38Feb 10, 2026Updated 2 months ago
- Experiments on including metadata such as URLs, timestamps, website descriptions and HTML tags during pretraining.☆30Jun 12, 2023Updated 2 years ago
- ☆15Aug 13, 2020Updated 5 years ago
- A Collection of Pydantic Models to Abstract IRL☆39Dec 10, 2025Updated 4 months ago
- Tooling for exact and MinHash deduplication of large-scale text datasets☆77Mar 24, 2026Updated 2 weeks ago
- [EMNLP 2020] Collective HumAn OpinionS on Natural Language Inference Data☆40Apr 7, 2022Updated 4 years ago
- Official Repository of NeurIPS2021 paper: PTR☆32Dec 17, 2021Updated 4 years ago
- Simple-to-use scoring function for arbitrarily tokenized texts.☆48Feb 19, 2025Updated last year
- Code for "A Simple Baseline for Audio-Visual Scene-Aware Dialog"☆27May 26, 2020Updated 5 years ago
- Dataset and Source code for EMNLP 2019 paper "What You See is What You Get: Visual Pronoun Coreference Resolution in Dialogues"☆26Sep 10, 2021Updated 4 years ago
- ☆30Oct 20, 2021Updated 4 years ago
- An Educational Framework Based on PyTorch for Deep Learning Education and Exploration☆11Dec 24, 2023Updated 2 years ago
- A RAG that can scale 🧑🏻‍💻☆11May 28, 2024Updated last year
- PyTorch code for Reasoning Visual Dialogs with Structural and Partial Observations☆42Jun 30, 2021Updated 4 years ago
- Unicode tokeniser. Ucto tokenizes text files: it separates words from punctuation, and splits sentences. It offers several other basic pr…☆70Mar 25, 2026Updated 2 weeks ago
- Karaoke Editor☆61Jan 9, 2024Updated 2 years ago
- PyTorch implementation of paper "Visual Concept-Metaconcept Learner", NeurIPS 2019☆47Dec 3, 2019Updated 6 years ago
- PyTorch implementation of the paper "Dialogue Act Classification with Context-Aware Self-Attention" for dialogue act classification with …☆45Aug 1, 2023Updated 2 years ago
- Repository to generate CLEVR-Dialog: A diagnostic dataset for Visual Dialog☆49Feb 18, 2020Updated 6 years ago
- ✨ Official PyTorch Implementation for EMNLP'19 Paper, "Dual Attention Networks for Visual Reference Resolution in Visual Dialog"☆45Mar 19, 2023Updated 3 years ago
- A Model Agnostic function to directly remove specified layers from the LLM☆10May 23, 2024Updated last year
- See https://github.com/cuda-mode/triton-index/ instead!☆11May 8, 2024Updated last year
- A list where most values will be None (or default)☆11Jul 19, 2023Updated 2 years ago
- 🤖📚 Telegram bot to convert and email PDFs, EPUBs or MOBIs to your Kindle☆11Sep 16, 2022Updated 3 years ago