Comprehensive LLM evaluation framework: GPQA Diamond to Chatbot Arena. Tests all major models equally, easily extensible.
☆17 · Aug 22, 2024 · Updated last year
Alternatives and similar repositories for BenchmarkAggregator
Users that are interested in BenchmarkAggregator are comparing it to the libraries listed below.
- GOPHI: an AMR-to-English Verbalizer · ☆11 · Feb 5, 2020 · Updated 6 years ago
- ☆10 · Jun 11, 2019 · Updated 6 years ago
- ☆35 · Jan 25, 2026 · Updated last month
- PANiC - PAraphrasing Noun-Compounds · ☆15 · Apr 6, 2018 · Updated 7 years ago
- Python library providing a simple, fully supervised sentence embedding technique for textual adversarial attacks. · ☆13 · Dec 13, 2023 · Updated 2 years ago
- Data and all · ☆14 · Sep 30, 2019 · Updated 6 years ago
- Data and related code for ACL2019 paper "Implicit Discourse Relation Identification for Open-domain Dialogues" · ☆12 · Jul 29, 2019 · Updated 6 years ago
- [ECAI 2023] Official implementation of "FATRER: Full-Attention Topic Regularizer for Accurate and Robust Conversational Emotion Recogniti… · ☆13 · Oct 9, 2023 · Updated 2 years ago
- Automated Semantic Analysis of Discourse Markers · ☆11 · May 30, 2022 · Updated 3 years ago
- An implementation of Defeasible Logic in Python · ☆15 · Sep 2, 2018 · Updated 7 years ago
- An open-source NLP library: fast text cleaning and preprocessing · ☆23 · Nov 9, 2021 · Updated 4 years ago
- Allows two LLMs to communicate and run code in the terminal · ☆28 · Dec 8, 2024 · Updated last year
- Visual and Embodied Concepts evaluation benchmark · ☆21 · Oct 10, 2023 · Updated 2 years ago
- GQR, a Fast Reasoner for Binary Qualitative Constraint Calculi · ☆19 · Nov 11, 2017 · Updated 8 years ago
- 🐍A curated list of awesome python environment. · ☆13 · Apr 21, 2020 · Updated 5 years ago
- ☆18 · Feb 29, 2024 · Updated 2 years ago
- Reference implementation for the climate segmentation benchmark, based on the Exascale Deep Learning for Climate Analytics work · ☆10 · May 6, 2020 · Updated 5 years ago
- Final Year Masters Project: modal logic solver tableaux · ☆25 · May 26, 2022 · Updated 3 years ago
- ☆19 · Dec 26, 2022 · Updated 3 years ago
- Code for ICLR 2019 paper 'CBOW Is Not All You Need: Combining CBOW with the Compositional Matrix Space Model' · ☆21 · May 21, 2019 · Updated 6 years ago
- Information and artifacts for "LoRA Learns Less and Forgets Less" (TMLR, 2024) · ☆20 · Sep 27, 2024 · Updated last year
- ☆25 · Aug 2, 2025 · Updated 7 months ago
- ☆13 · Apr 6, 2025 · Updated 11 months ago
- DeepSeek-V3, R1 671B on 8xH100 Throughput Benchmarks · ☆21 · Mar 13, 2025 · Updated last year
- This is my code from competition Google Cloud & YouTube-8M Video Understanding Challenge. My solution based on video level features only. · ☆16 · Jun 5, 2017 · Updated 8 years ago
- A web interactive tool for building proofs in the sequent calculus of Linear Logic, with its backend written in OCaml · ☆24 · Apr 7, 2025 · Updated 11 months ago
- ☆26 · Apr 15, 2023 · Updated 2 years ago
- Estimating neural network runtime characteristics · ☆12 · Mar 25, 2023 · Updated 2 years ago
- Simple Streamlit application used for demonstrating Anthropic Claude 3 family of model's multimodal prompting on Amazon Bedrock · ☆16 · Dec 5, 2024 · Updated last year
- Dataset from Tip of the Tongue Known-Item Retrieval (2021) paper. · ☆12 · Nov 4, 2021 · Updated 4 years ago
- Telegram bridge for Claude Code and Codex CLI · ☆70 · Feb 26, 2026 · Updated 3 weeks ago
- A set of tools to simplify development for JavaScript based SmartTVs · ☆14 · Nov 15, 2025 · Updated 4 months ago
- Python Module for Logical Validation (forked from Rob Truxler library) · ☆26 · Jul 28, 2020 · Updated 5 years ago
- ☆18 · Feb 23, 2025 · Updated last year
- ☆12 · May 30, 2025 · Updated 9 months ago
- Official Implementation of "CheckEmbed: Effective Verification of LLM Solutions to Open-Ended Tasks" · ☆23 · Jun 17, 2025 · Updated 9 months ago
- ☆23 · Dec 5, 2025 · Updated 3 months ago
- an auto-sleeping and -waking framework around llama.cpp · ☆12 · Feb 8, 2025 · Updated last year
- ☆19 · Sep 24, 2024 · Updated last year