joaodsmarques / LumberChunker
This repository presents the original implementation of LumberChunker: Long-Form Narrative Document Segmentation by André V. Duarte, João Marques, Miguel Graça, Miguel Freire, Lei Li and Arlindo L. Oliveira (accepted at EMNLP 2024 Findings)
☆76Updated 11 months ago
Alternatives and similar repositories for LumberChunker
Users who are interested in LumberChunker are comparing it to the libraries listed below.
- Meta-Chunking: Learning Efficient Text Segmentation via Logical Perception☆246Updated 3 months ago
- Open replication of DeepSeek R1 for text-to-graph extraction.☆98Updated 7 months ago
- Evaluation tools for Retrieval-augmented Generation (RAG) methods.☆165Updated 9 months ago
- [WWW 2025] A Dockerized Schema-Guided LLM Agent-based Knowledge Extraction System.☆111Updated last month
- ☆145Updated 4 months ago
- Official repository for RAGViz: Diagnose and Visualize Retrieval-Augmented Generation [EMNLP 2024]☆85Updated 7 months ago
- [EMNLP 2024: Demo Oral] RAGLAB: A Modular and Research-Oriented Unified Framework for Retrieval-Augmented Generation☆305Updated 10 months ago
- Codes for our paper "RQ-RAG: Learning to Refine Queries for Retrieval Augmented Generation"☆184Updated last year
- ☆39Updated 5 months ago
- ☆192Updated 5 months ago
- [ACL24] Official repo for "Synthesizing Text-to-SQL Data from Weak and Strong LLMs"☆67Updated last year
- StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization☆142Updated 8 months ago
- The official implementation of "LevelRAG: Enhancing Retrieval-Augmented Generation with Multi-hop Logic Planning over Rewriting Augmented…☆43Updated 5 months ago
- Code implementation repository for the HiQA paper☆102Updated 6 months ago
- ☆58Updated 10 months ago
- TianGong-AI-Unstructure☆69Updated 2 months ago
- Makes RAG pipelines work better on table data☆102Updated 10 months ago
- All-in-One: Text Embedding, Retrieval, Reranking and RAG in Transformers☆66Updated last month
- [EMNLP 2024] LongRAG: A Dual-perspective Retrieval-Augmented Generation Paradigm for Long-Context Question Answering☆112Updated 7 months ago
- ☆36Updated last year
- [ACL 2025] AIR-Bench: Automated Heterogeneous Information Retrieval Benchmark☆155Updated last month
- ☆52Updated 7 months ago
- This is a repository of RALM surveys containing a summary of state-of-the-art RAG and other technologies☆201Updated last year
- Code for KaLM-Embedding models☆91Updated 2 months ago
- This is the official repository for Auto-RAG.☆221Updated last month
- The official repository for the paper: Evaluation of Retrieval-Augmented Generation: A Survey.☆174Updated 4 months ago
- Official code for the publication "Large Language Models as Zero-shot Dialogue State Tracker through Function Calling" https//arxiv.org/a…☆65Updated last year
- [ACL 2024] This is the code repo for our ACL’24 paper "Cleaner Pretraining Corpus Curation with Neural Web Scraping".☆228Updated last year
- Official repository for paper "TableBench: A Comprehensive and Complex Benchmark for Table Question Answering"☆69Updated 4 months ago
- Code for explaining and evaluating late chunking (chunked pooling)☆447Updated 8 months ago