This repository presents the original implementation of LumberChunker: Long-Form Narrative Document Segmentation by André V. Duarte, João Marques, Miguel Graça, Miguel Freire, Lei Li and Arlindo L. Oliveira (accepted at EMNLP 2024 Findings)
☆93Feb 9, 2026Updated last month
Alternatives and similar repositories for LumberChunker
Users that are interested in LumberChunker are comparing it to the libraries listed below.
- Recursive Abstractive Processing for Tree-Organized Retrieval☆10May 30, 2024Updated last year
- Exploration of semantic chunking and chunk classification☆19Sep 16, 2024Updated last year
- Official PyTorch implementation of OTiS: An open model for general time series analysis.☆17Feb 16, 2026Updated last month
- Dense X Retrieval: What Retrieval Granularity Should We Use?☆168Jan 8, 2024Updated 2 years ago
- ☆10Nov 29, 2024Updated last year
- ☆21Jul 18, 2024Updated last year
- The code for paper: Hierarchical Document Refinement for Long-context Retrieval-augmented Generation [ACL2025 Oral]☆42Aug 25, 2025Updated 7 months ago
- Official Repository for "Hypencoder: Hypernetworks for Information Retrieval"☆35Sep 20, 2025Updated 6 months ago
- Code for explaining and evaluating late chunking (chunked pooling)☆495Dec 23, 2024Updated last year
- The code for AAAI 2025 “Large Language Models Are Read/Write Policy-Makers for Simultaneous Generation”☆15Jan 3, 2025Updated last year
- ☆19Jun 14, 2024Updated last year
- [ICLR 2026] BARREL: Boundary-Aware Reasoning for Factual and Reliable LRMs☆18May 21, 2025Updated 10 months ago
- ☆20Mar 22, 2024Updated 2 years ago
- Course paper for "Medical and Health Data Analysis and Mining (2021)": a BERT-based news classification experiment on the 20NewsGroups dataset☆10Jun 22, 2021Updated 4 years ago
- This is an implementation of the paper: Searching for Best Practices in Retrieval-Augmented Generation (EMNLP2024)☆344Dec 21, 2024Updated last year
- ☆10Jul 6, 2023Updated 2 years ago
- The OCR component of ragflow; an unofficial project☆54Aug 26, 2024Updated last year
- ☆10Jan 28, 2024Updated 2 years ago
- ☆11Feb 27, 2026Updated last month
- KITE (Knowledge-Intensive Task Evaluation) is an end-to-end benchmark for RAG pipelines☆23Aug 14, 2024Updated last year
- ☆12Jun 11, 2018Updated 7 years ago
- Plug in and Play implementation of "Certified Reasoning with Language Models" that elevates model reasoning by 40%☆16Jun 20, 2023Updated 2 years ago
- Code and Data for Our NeurIPS 2024 paper "AMOR: A Recipe for Building Adaptable Modular Knowledge Agents Through Process Feedback"☆34Nov 5, 2024Updated last year
- ☆28May 27, 2024Updated last year
- Open replication of DeepSeek R1 for text-to-graph extraction.☆102Jan 31, 2025Updated last year
- Hercules: Attributable and Scalable Opinion Summarization (ACL 2023)☆20Nov 8, 2023Updated 2 years ago
- Nadir: Cutting-edge PyTorch optimizers for simplicity & composability! 🔥🚀💻☆14Jun 15, 2024Updated last year
- ☆18Aug 21, 2025Updated 7 months ago
- ☆12Mar 6, 2023Updated 3 years ago
- The way to code, the way to learn PyTorch☆12Aug 18, 2019Updated 6 years ago
- An example of using multimodal LLMs to process a video feed from a camera and get image descriptions☆15Mar 11, 2024Updated 2 years ago
- Clustering and Ranking: Diversity-preserved Instruction Selection through Expert-aligned Quality Estimation☆90Nov 13, 2024Updated last year
- ☆82Nov 10, 2025Updated 4 months ago
- Code repo for CLERC: A Legal Precedent Dataset for Case Retrieval and Retrieval-Augmented Analysis Generation (NAACL 2025)☆25Jan 28, 2025Updated last year
- Towards LLM Empowered Recommendation via Tool Learning☆23Aug 8, 2025Updated 7 months ago
- Multi-Agent Verification: Scaling Test-Time Compute with Multiple Verifiers☆28Mar 1, 2025Updated last year
- machine reading comprehension with deep learning☆19Feb 6, 2018Updated 8 years ago
- ☆15Apr 10, 2024Updated last year
- ☆12Dec 13, 2023Updated 2 years ago