megagonlabs / holobench
🫧 Code for Holistic Reasoning with Long-Context LMs: A Benchmark for Database Operations on Massive Textual Data (Maekawa*, Iso* et al.; ICLR 2025)
☆12 · Updated 6 months ago
Alternatives and similar repositories for holobench
Users who are interested in holobench are comparing it to the repositories listed below.
- Common tools for data processing ☆18 · Updated 2 weeks ago
- SysBench: Can Large Language Models Follow System Messages? ☆34 · Updated last year
- Enhancing Legal Case Retrieval via Scaling High-quality Synthetic Query-Candidate Pairs (EMNLP 2024) ☆14 · Updated 9 months ago
- NAACL 2024: SeaEval for Multilingual Foundation Models: From Cross-Lingual Alignment to Cultural Reasoning ☆25 · Updated 6 months ago
- Implementation for ACL 2024 paper "Meta-Task Prompting Elicits Embeddings from Large Language Models" ☆12 · Updated last year
- Prompting Large Language Models to Generate Dense and Sparse Representations for Zero-Shot Document Retrieval ☆51 · Updated 2 months ago
- Official Implementation of "Probing Language Models for Pre-training Data Detection" ☆19 · Updated 9 months ago
- GSM-Plus: Data, Code, and Evaluation for Enhancing Robust Mathematical Reasoning in Math Word Problems. ☆62 · Updated last year
- Test-time compute in information retrieval ☆42 · Updated last month
- ☆22 · Updated 8 months ago
- [ICLR 2025] BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval ☆163 · Updated 3 months ago
- First explanation metric (diagnostic report) for text generation evaluation ☆62 · Updated 6 months ago
- List of papers on Self-Correction of LLMs. ☆74 · Updated 8 months ago
- FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions ☆45 · Updated last year
- Suri: Multi-constraint instruction following for long-form text generation (EMNLP’24) ☆25 · Updated 9 months ago
- [EMNLP 2025] LightThinker: Thinking Step-by-Step Compression ☆83 · Updated 4 months ago
- [ACL 2025 Main] Official Repository for "Evaluating Language Models as Synthetic Data Generators" ☆38 · Updated 8 months ago
- Code for the arXiv preprint "MoT: Pre-thinking and Recalling Enable ChatGPT to Self-Improve with Memory-of-Thoughts" ☆22 · Updated last year
- This repository contains the dataset and code for "WiCE: Real-World Entailment for Claims in Wikipedia" in EMNLP 2023. ☆42 · Updated last year
- Code and data for paper "Context-faithful Prompting for Large Language Models". ☆41 · Updated 2 years ago
- [COLING 2025] NesTools: A Dataset for Evaluating Nested Tool Learning Abilities of Large Language Models ☆17 · Updated 7 months ago
- [ICLR'25] Data and code for our paper "Why Does the Effective Context Length of LLMs Fall Short?" ☆77 · Updated 9 months ago
- Exploring the Limitations of Large Language Models on Multi-Hop Queries ☆27 · Updated 6 months ago
- "FiD-ICL: A Fusion-in-Decoder Approach for Efficient In-Context Learning" (ACL 2023) ☆15 · Updated 2 years ago
- ☆43 · Updated last year
- ☆22 · Updated last year
- Benchmarking Benchmark Leakage in Large Language Models ☆55 · Updated last year
- [ACL'24 Oral] Analysing The Impact of Sequence Composition on Language Model Pre-Training ☆22 · Updated last year
- Repo accompanying our paper "Do Llamas Work in English? On the Latent Language of Multilingual Transformers". ☆78 · Updated last year
- BrowseComp-Plus: A More Fair and Transparent Evaluation Benchmark of Deep-Research Agent ☆68 · Updated last week