shaharl6000/MoreDocsSameLen

This repository contains code and datasets for our paper on the effects of document multiplicity while the context size is fixed in Retrieval-Augmented Generation (RAG) systems.

☆18

Alternatives and similar repositories for MoreDocsSameLen

Users that are interested in MoreDocsSameLen are comparing it to the libraries listed below.

edahanoam / Awesome-Summarization-Datasets
Updating collection of summarization datasets in 100+ languages, based on our paper "The State and Fate of Summarization Datasets: A Surv…
☆31 · Apr 29, 2025 · Updated last year
Yarayx / livelongbench
The first spoken long-text dataset derived from live streams, designed to reflect the redundancy-rich and conversational nature of real-w…
☆12 · Jun 28, 2025 · Updated last year
MasterVito / DAC-RL
Official Repo for DAC-RL: Training LLMs for Divide-and-Conquer Reasoning Elevates Test-Time Scalability
☆16 · Feb 26, 2026 · Updated 5 months ago
lt-asset / Waffle
For ACL25 paper "WAFFLE: Multi-Modal Model for Automated Front-End Development" - by Shanchao Liang and Nan Jiang and Shangshu Qian and L…
☆12 · May 28, 2025 · Updated last year
dimalik / prediction_error
Neural embeddings with negative sampling in Keras
☆11 · Jun 11, 2017 · Updated 9 years ago
eliorsulem / simplification-acl2018
Human Evaluation Benchmark for Text Simplification
☆10 · Sep 6, 2018 · Updated 7 years ago
RUCKBReasoning / CodeRM
Official code implementation for the ACL 2025 paper: 'Dynamic Scaling of Unit Tests for Code Reward Modeling'
☆27 · May 16, 2025 · Updated last year
a-raina / Event-Detection-using-NLP
Be notified of recent events in the news by setting up alerts. Program uses NLP techniques such as keyword matching, k-clustering and sem…
☆11 · Jun 27, 2016 · Updated 10 years ago
timjogorman / Multisentence-AMR-guidelines
Guidelines for our secondary layer of annotation adding multi-sentence AMR links
☆12 · Sep 6, 2017 · Updated 8 years ago
casetext / r-and-r
Code for the "Long Context Needs Some R&R" paper.
☆12 · Mar 11, 2024 · Updated 2 years ago
DCSaunders / gender-debias
Adaptation datasets and scripts for the paper "Reducing gender bias in Neural Machine Translation as a domain adaptation problem" (ACL 20…
☆13 · Mar 18, 2021 · Updated 5 years ago
facebookresearch / BigOBench
BigOBench assesses the capacity of Large Language Models (LLMs) to comprehend time-space computational complexity of input or generated c…
☆43 · Apr 15, 2025 · Updated last year
crushr / EANN_Implemetation
EANN (PyTorch)
☆10 · Mar 12, 2022 · Updated 4 years ago
xv44586 / cluster
Unsupervised Chinese text clustering
☆14 · Mar 3, 2022 · Updated 4 years ago
mathCrazyy / text_classify
Classification on the Cnews dataset, using torchtext for text preprocessing
☆11 · Sep 16, 2022 · Updated 3 years ago
asappresearch / interactive-classification
☆15 · Feb 24, 2021 · Updated 5 years ago
julianmichael / qasrl
Tools for working with QA-SRL data and annotating it with crowdsourcing.
☆13 · Sep 22, 2023 · Updated 2 years ago
SLAB-NLP / Multi-Prompt-LLM-Evaluation
State of What Art? A Call for Multi-Prompt LLM Evaluation
☆16 · Apr 10, 2026 · Updated 3 months ago
Qichuzyy / POA
Official implementation of ECCV24 paper: POA
☆24 · Aug 8, 2024 · Updated last year
sythello / ChartDialog
A dataset for training an interactive plotting agent
☆14 · Dec 8, 2022 · Updated 3 years ago
UKPLab / emnlp2017-cmapsum-corpus
Accompanying code for our EMNLP 2017 publication "Bringing Structure into Summaries: Crowdsourcing a Benchmark Corpus of Concept Maps"
☆13 · Dec 5, 2017 · Updated 8 years ago
osome-iu / ChatGPT_domain_rating
Code and data for paper "Large language models can rate news outlet credibility"
☆13 · Aug 10, 2024 · Updated last year
fgaim / HornMorpho
Morphological analysis and generation of Amharic, Oromo, and Tigrinya
☆13 · Feb 18, 2017 · Updated 9 years ago
wuyike2000 / CoTKR
☆32 · Jan 13, 2025 · Updated last year
uwnlp / qasrl_annotation
Generating Annotation Spreadsheet for QA-SRL Scheme
☆12 · Feb 14, 2017 · Updated 9 years ago
selfcs / stop-and-sensitive-words
Stop-word and sensitive-word lexicons
☆17 · Oct 15, 2020 · Updated 5 years ago
WenjiaZh / BTIC
☆11 · Mar 13, 2023 · Updated 3 years ago
unimorph / analyzers
Runnable morphological analysis tools from the UniMorph project
☆16 · Nov 19, 2018 · Updated 7 years ago
plroit / qasrl-gs
A repository for high-quality QASRL data collected from crowd-workers.
☆11 · Aug 10, 2023 · Updated 2 years ago
Lotemp / SarcasmSIGN
Interpreting Sarcasm with Sentiment Based Monolingual Machine Translation
☆11 · May 7, 2017 · Updated 9 years ago
ssu-humane / fake-news-thumbnail
A dataset and CLIP baseline for unrepresentative news thumbnail detection (ACL 2022 workshop)
☆12 · May 26, 2022 · Updated 4 years ago
MaYufei-NPU / InfoGain-RAG
Implementation of EMNLP Oral Paper: InfoGain-RAG: Boosting Retrieval-Augmented Generation through Document Information Gain-based Reranki…
☆18 · Sep 17, 2025 · Updated 10 months ago
timbrgr / complex-scheduling-optimization-case-studies
Optimization Case Studies: Generic Time Scheduling Problem (GTSP), Resource-Constrained Project Scheduling Problem (RCPSP) with Pulse Var…
☆11 · Nov 7, 2018 · Updated 7 years ago
SLAB-NLP / Akk
Filling the Gaps in Ancient Akkadian Texts: A Masked Language Modelling Approach, Lazar et al., EMNLP 2021
☆14 · Nov 10, 2022 · Updated 3 years ago
genglinliu / MOSAIC
☆37 · Apr 22, 2025 · Updated last year
UESTC-GQJ / TieFake
Source code for the IJCNN 2023 paper TieFake: Title-Text Similarity and Emotion-Aware Fake News Detection (TieFake).
☆16 · Dec 21, 2023 · Updated 2 years ago
microsoft / TestExplora
Official code for the paper: TestExplora: Benchmarking LLMs for Proactive Bug Discovery via Repository-Level Test Generation
☆27 · Mar 26, 2026 · Updated 4 months ago
Itaymanes / K-QA
Dataset and Evaluation Code for the K-QA Benchmark.
☆18 · May 26, 2024 · Updated 2 years ago
alchemistyzz / PeRL
[NeurIPS'25] The official code of "PeRL: Permutation-Enhanced Reinforcement Learning for Interleaved Vision-Language Reasoning"
☆30 · Mar 30, 2026 · Updated 3 months ago