webis-de / cikm20-web-page-segmentation-revisited-evaluation-framework-and-dataset
Code for "Web Page Segmentation Revisited: Evaluation Framework and Dataset", accepted as a resource paper at CIKM 2020
Related projects
Alternatives and complementary repositories for cikm20-web-page-segmentation-revisited-evaluation-framework-and-dataset
- Content Extraction via Text Density (SIGIR11)
- Implementation of Vision Based Page Segmentation algorithm in Java
- Simplified DOM Trees for Transferable Attribute Extraction from the Web
- Implementation of Microsoft Vips algorithm in Python
- code and data used to build a training dataset for dragnet models
- MultiCite code and data. Models are available on Hugging Face.
- Summary Explorer is a tool to visually explore the state-of-the-art in text summarization.
- Timeline Summarization based on Event Graph Compression via Time-Aware Optimal Transport
- Training/test data for Dragnet
- SIGIR-2022 Webformer: Pre-training with Web Pages for Information Retrieval
- This is a repository of the study performed under the Adversarial Paraphrasing Task (APT).
- Web content extraction using machine learning
- A Context-aware Visual Attention-based training pipeline for Object Detection from a Webpage screenshot!
- The Harvard USPTO Patent Dataset
- Code accompanying the submission "Structural Text Segmentation of Legal Documents" by Aumiller et al.
- Large-scale query-focused multi-document Summarization dataset
- It includes two datasets that are used in the downstream tasks for evaluating UIBert: App Similar Element Retrieval data and Visual Item …
- A Neural Model for Joint Topic Segmentation and Classification
- Legal document similarity - Code, data, and models for the ICAIL 2021 paper "Evaluating Document Representations for Content-based Legal …
- Code for EMNLP'20 paper "When Hearst Is not Enough: Improving Hypernymy Detection from Corpus with Distributional Models"
- An NLP processing pipeline for characters in fanfiction. Developed by students at Carnegie Mellon University from 2019 to 2021.
- Web page segmentation and noise removal
- Tools for web page segmentation evaluation
- Measure the readability of a given text using surface characteristics
- An easy to use framework for large-scale fact-checking and question answering
- An open-source NLP library: fast text cleaning and preprocessing
- Bilingual sentence aligner