This repository presents the original implementation of LumberChunker: Long-Form Narrative Document Segmentation by André V. Duarte, João Marques, Miguel Graça, Miguel Freire, Lei Li and Arlindo L. Oliveira (accepted at EMNLP 2024 Findings).
☆102Feb 9, 2026Updated 2 months ago
Alternatives and similar repositories for LumberChunker
Users that are interested in LumberChunker are comparing it to the libraries listed below.
- Exploration of semantic chunking and chunk classification☆19Sep 16, 2024Updated last year
- Code for "HiChunk: Evaluating and Enhancing Retrieval-Augmented Generation with Hierarchical Chunking"☆94Nov 18, 2025Updated 4 months ago
- Official PyTorch implementation of OTiS: An open model for general time series analysis.☆18Apr 1, 2026Updated 2 weeks ago
- Dense X Retrieval: What Retrieval Granularity Should We Use?☆169Jan 8, 2024Updated 2 years ago
- ☆10Nov 29, 2024Updated last year
- ☆21Jul 18, 2024Updated last year
- Official Repository for "Hypencoder: Hypernetworks for Information Retrieval"☆35Sep 20, 2025Updated 6 months ago
- ☆19Jun 14, 2024Updated last year
- To try TextIn document parsing, visit https://cc.co/16YSIy☆22Jul 9, 2024Updated last year
- [ICLR 2026] BARREL: Boundary-Aware Reasoning for Factual and Reliable LRMs☆18May 21, 2025Updated 10 months ago
- ☆12Jun 11, 2018Updated 7 years ago
- KITE (Knowledge-Intensive Task Evaluation) is an end-to-end benchmark for RAG pipelines☆23Aug 14, 2024Updated last year
- Plug in and Play implementation of "Certified Reasoning with Language Models" that elevates model reasoning by 40%☆16Jun 20, 2023Updated 2 years ago
- Code and Data for Our NeurIPS 2024 paper "AMOR: A Recipe for Building Adaptable Modular Knowledge Agents Through Process Feedback"☆34Nov 5, 2024Updated last year
- Pre-training BART in Flax on The Pile dataset☆22Jul 24, 2021Updated 4 years ago
- ☆13Oct 24, 2023Updated 2 years ago
- A toolbox for EEG signals processing. Welcome to join and build!☆13Nov 9, 2022Updated 3 years ago
- Hercules: Attributable and Scalable Opinion Summarization (ACL 2023)☆20Nov 8, 2023Updated 2 years ago
- ☆18Aug 21, 2025Updated 7 months ago
- An example of using multimodal LLMs to process a video feed from a camera and get image descriptions☆15Mar 11, 2024Updated 2 years ago
- Clustering and Ranking: Diversity-preserved Instruction Selection through Expert-aligned Quality Estimation☆91Nov 13, 2024Updated last year
- ☆13Jul 15, 2021Updated 4 years ago
- ☆83Nov 10, 2025Updated 5 months ago
- Multi-Agent Verification: Scaling Test-Time Compute with Multiple Verifiers☆29Mar 1, 2025Updated last year
- ☆12Dec 13, 2023Updated 2 years ago
- minimalistic AI library that resembles HF's transformers☆13Dec 31, 2024Updated last year
- Facebook AI Research Sequence-to-Sequence Toolkit written in Python.☆12Oct 10, 2020Updated 5 years ago
- Official PyTorch implementation of The Linear Attention Resurrection in Vision Transformer☆16Sep 7, 2024Updated last year
- ☆55Jul 10, 2025Updated 9 months ago
- CDQA: Chinese Dynamic Question Answering Benchmark☆18Dec 13, 2024Updated last year
- mnn asr demo.☆26Mar 24, 2025Updated last year
- ☆12Mar 25, 2024Updated 2 years ago
- ⚡FlashRAG: A Python Toolkit for Efficient RAG Research (WWW2025 Resource)☆3,445Mar 1, 2026Updated last month
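- A collection of Chinese-language learning and practice materials for ChatGPT: fine-tuning large models such as LLaMA and ChatGLM☆14Apr 17, 2023Updated 2 years ago
- This project is for experiments with embedding models, including embedding model evaluation, fine-tuning, and quantization.☆73Jul 16, 2024Updated last year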
- This is the official repository for Auto-RAG.☆234Jul 18, 2025Updated 8 months ago
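- Node.js sensitive-word/banned-word detection, replacement, and filtering; very high efficiency and a very small memory footprint (80,000 banned words need only 30 MB)☆15Apr 15, 2025Updated last year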
- The most extensive open massively multilingual corpus of datasets for training sentiment models. The corpus consists of 79 manually selec…☆16Nov 14, 2023Updated 2 years ago
- ☆34Oct 9, 2025Updated 6 months ago