Data for the MTEB leaderboard
☆46 · Feb 23, 2026 · Updated last week
Alternatives and similar repositories for results
Users who are interested in results also compare it with the repositories listed below.
- Code for the MTEB leaderboard · ☆30 · Feb 4, 2025 · Updated last year
- Code for paper: "Executing Arithmetic: Fine-Tuning Large Language Models as Turing Machines" · ☆11 · Oct 11, 2024 · Updated last year
- ☆12 · Mar 5, 2025 · Updated 11 months ago
- ☆19 · Sep 16, 2025 · Updated 5 months ago
- [ACL 2025] AIR-Bench: Automated Heterogeneous Information Retrieval Benchmark · ☆166 · Oct 14, 2025 · Updated 4 months ago
- QALD-9-Plus Dataset for Knowledge Graph Question Answering · ☆29 · Jun 5, 2024 · Updated last year
- The code used to evaluate embedding models on the Massive Legal Embedding Benchmark (MLEB). · ☆31 · Feb 24, 2026 · Updated last week
- Explore VLM-Eval, a framework for evaluating Video Large Language Models, enhancing your video analysis with cutting-edge AI technology. · ☆37 · Jan 20, 2024 · Updated 2 years ago
- CODO is an ontology for the semantic representation and annotation of COVID-19 data in a machine-readable form for tracking history of th… · ☆10 · Apr 19, 2022 · Updated 3 years ago
- Visual tool for SPARQL queries on graphol graphs · ☆10 · Oct 3, 2018 · Updated 7 years ago
- ☆12 · Updated this week
- Maintenance Information Extraction (MaintIE) · ☆16 · Jun 29, 2024 · Updated last year
- Think-on-Graph 3.0: Efficient and Adaptive LLM Reasoning on Heterogeneous Graphs via Multi-Agent Dual-Evolving Context Retrieval · ☆75 · Dec 5, 2025 · Updated 2 months ago
- DOMAINEVAL is an auto-constructed benchmark for multi-domain code generation that consists of 2k+ subjects (i.e., description, reference … · ☆14 · Dec 12, 2024 · Updated last year
- A Swedish Natural Language Understanding Benchmark · ☆11 · Dec 12, 2025 · Updated 2 months ago
- Quickly run SchemaSpy on a database and serve the results · ☆10 · Mar 24, 2021 · Updated 4 years ago
- RDF Community Discussions. Ask anything here! · ☆13 · Apr 11, 2024 · Updated last year
- [CVPR2024] Learning from Synthetic Human Group Activities · ☆14 · Feb 24, 2025 · Updated last year
- A framework for few-shot evaluation of autoregressive language models. · ☆12 · Jul 14, 2025 · Updated 7 months ago
- Code for the AAAI Workshop WMAC paper "Simulating Rumor Spreading in Social Networks using LLM Agents" · ☆11 · Feb 20, 2025 · Updated last year
- ☆12 · Jan 11, 2026 · Updated last month
- a simple lakeFS webhook for pre-commit and pre-merge validation of data objects · ☆12 · Nov 9, 2023 · Updated 2 years ago
- Elasticsearch plugin for Sentiment Analysis using Stanford CoreNLP · ☆11 · Oct 17, 2018 · Updated 7 years ago
- Math24o: High School Olympiad Mathematics Chinese Benchmark, an evaluation set built from high-school Olympiad mathematics competition problems · ☆11 · Mar 27, 2025 · Updated 11 months ago
- Ace-Step Dataset Generator · ☆23 · Sep 27, 2025 · Updated 5 months ago
- Spanish text summarization demo using CoreNLP · ☆10 · Sep 13, 2014 · Updated 11 years ago
- A Reactive Sparql Client written in Scala and Akka · ☆13 · Sep 18, 2023 · Updated 2 years ago
- Analysis of gutenberg dataset · ☆44 · Dec 22, 2018 · Updated 7 years ago
- Lehigh University Benchmark (LUBM). · ☆10 · Apr 22, 2020 · Updated 5 years ago
- ☆11 · Nov 5, 2024 · Updated last year
- LLM red teaming datasets from the paper 'Student-Teacher Prompting for Red Teaming to Improve Guardrails' for the ART of Safety Workshop … · ☆22 · Oct 12, 2023 · Updated 2 years ago
- ISS Tracker for the Cardputer Adv · ☆36 · Jan 19, 2026 · Updated last month
- Creates a Lucene index out of files from a local folder · ☆13 · Aug 8, 2014 · Updated 11 years ago
- Shaping Language Models with Cognitive Insights · ☆15 · Feb 29, 2024 · Updated 2 years ago
- JVM bytecode assembler as REST api · ☆11 · Jul 27, 2025 · Updated 7 months ago
- CKAN extension for data.world · ☆12 · Dec 5, 2023 · Updated 2 years ago
- Simple FieldCache based query introspection Solr Search Component - solves the 'red sofa' problem · ☆11 · Jan 27, 2025 · Updated last year
- ☀️ Measuring the accuracy of BBC weather forecasts in Honolulu, USA · ☆12 · Jul 10, 2021 · Updated 4 years ago
- DICE: Detecting In-distribution Data Contamination with LLM's Internal State · ☆11 · Sep 21, 2024 · Updated last year