a benchmark suite for testing logical reasoning abilities of prompt-based models
☆32 Nov 20, 2023 Updated 2 years ago
Alternatives and similar repositories for LogiEval
Users who are interested in LogiEval are comparing it to the repositories listed below
- Dataset for AAAI paper "Natural Language Inference in Context - Investigating Contextual Reasoning over Long Texts" ☆11 Nov 18, 2022 Updated 3 years ago
- Logiqa2.0 dataset - logical reasoning in MRC and NLI tasks ☆102 Aug 11, 2023 Updated 2 years ago
- NAACL 2024: SeaEval for Multilingual Foundation Models: From Cross-Lingual Alignment to Cultural Reasoning ☆26 Mar 3, 2025 Updated last year
- The official repository of the Omni-MATH benchmark. ☆93 Dec 22, 2024 Updated last year
- A framework for few-shot evaluation of autoregressive language models. ☆12 Jul 14, 2025 Updated 7 months ago
- [CVPR2024] Learning from Synthetic Human Group Activities ☆14 Feb 24, 2025 Updated last year
- Chatbot that answers frequently asked questions in French, English, and Tunisian using the Rasa NLU framework and RWKV-4-Raven ☆13 May 19, 2023 Updated 2 years ago
- A repo for RLHF training and BoN over LLMs, with support for reward model ensembles. ☆46 Jan 16, 2025 Updated last year
- DOMAINEVAL is an auto-constructed benchmark for multi-domain code generation that consists of 2k+ subjects (i.e., description, reference … ☆14 Dec 12, 2024 Updated last year
- RWKV is a RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best … ☆10 Nov 3, 2023 Updated 2 years ago
- A Python script to delete all comment and submission data from a given Reddit account. ☆11 Jan 5, 2021 Updated 5 years ago
- ☆10 Oct 11, 2022 Updated 3 years ago
- ☆43 Oct 7, 2024 Updated last year
- Chinese financial LLM evaluation benchmark: six categories, twenty-five tasks, tiered grading; domestic models earned an A grade ☆10 May 6, 2024 Updated last year
- Align, a general text alignment function ☆15 Dec 7, 2023 Updated 2 years ago
- ELIXIR: Learning from User Feedback on Explanations to Improve Recommender Models ☆10 Feb 15, 2021 Updated 5 years ago
- ☆12 Mar 5, 2025 Updated 11 months ago
- LLM red teaming datasets from the paper 'Student-Teacher Prompting for Red Teaming to Improve Guardrails' for the ART of Safety Workshop … ☆22 Oct 12, 2023 Updated 2 years ago
- [AAAI26] Trade-offs in Large Reasoning Models: An Empirical Analysis of Deliberative and Adaptive Reasoning over Foundational Capabilitie… ☆10 Feb 7, 2026 Updated 3 weeks ago
- Simple LLM-enabled document Q&A app built using Langchain and Streamlit ☆10 Dec 4, 2024 Updated last year
- Conversational Speaker Diarization using OpenAI AI Language Models(gpt-4) and OpenAI Whisper. ☆14 Aug 13, 2023 Updated 2 years ago
- The official Python library for Openlayer, the Continuous Model Improvement Platform for AI. 📈 ☆16 Updated this week
- We enable LLM with personalization capability ☆11 Nov 16, 2023 Updated 2 years ago
- ☆10 Oct 3, 2023 Updated 2 years ago
- Massive-STEPS: Massive Semantic Trajectories for Understanding POI Check-ins -- Dataset and Benchmarks ☆17 Feb 2, 2026 Updated last month
- This app uses OpenAI's LLM model to answer questions about your PDF file. Upload your PDF file and ask questions about it. The app will r… ☆13 May 13, 2025 Updated 9 months ago
- Code for paper: "Executing Arithmetic: Fine-Tuning Large Language Models as Turing Machines" ☆11 Oct 11, 2024 Updated last year
- ☆11 Oct 15, 2022 Updated 3 years ago
- Shaping Language Models with Cognitive Insights ☆15 Feb 29, 2024 Updated 2 years ago
- Survey of available speech datasets for Polish ASR development ☆17 Jan 1, 2025 Updated last year
- benchmarks for evaluating MT models ☆11 Jun 26, 2024 Updated last year
- LightRAG with Neo4j Example Project ☆17 May 19, 2025 Updated 9 months ago
- ☆15 Dec 2, 2025 Updated 3 months ago
- Website for release of TellMeWhy dataset for why question answering ☆14 Nov 11, 2022 Updated 3 years ago
- ARI (Abstract Reasoning Induction) is an innovative framework designed to enhance the temporal reasoning capabilities of Large Language M… ☆13 Dec 29, 2024 Updated last year
- Code for our project CROWN (Conversational Passage Ranking by Reasoning over Word Networks) ☆10 Jan 11, 2024 Updated 2 years ago
- ☆11 Nov 5, 2024 Updated last year
- Code and Data for GlitchBench ☆13 Feb 27, 2024 Updated 2 years ago
- ☆10 Oct 18, 2023 Updated 2 years ago