webis-de / cikm20-web-page-segmentation-revisited-evaluation-framework-and-dataset
Code for "Web Page Segmentation Revisited: Evaluation Framework and Dataset", accepted as a resources paper at CIKM 2020
☆14 Updated last year
Related projects
Alternatives and complementary repositories for cikm20-web-page-segmentation-revisited-evaluation-framework-and-dataset
- ☆26 Updated 3 months ago
- Simplified DOM Trees for Transferable Attribute Extraction from the Web ☆37 Updated last month
- Implementation of Microsoft's VIPS algorithm in Python ☆19 Updated 5 years ago
- Web content extraction using machine learning ☆32 Updated 3 years ago
- Web page segmentation and noise removal ☆55 Updated 9 months ago
- A Context-aware Visual Attention-based training pipeline for Object Detection from a Webpage screenshot! ☆91 Updated last year
- SIGIR-2022 Webformer: Pre-training with Web Pages for Information Retrieval ☆47 Updated 2 years ago
- Tools for web page segmentation. In development ☆17 Updated 6 years ago
- Training/test data for Dragnet ☆41 Updated 9 years ago
- Implementation of the Vision-Based Page Segmentation algorithm in Java ☆101 Updated 5 years ago
- Code and data used to build a training dataset for Dragnet models ☆10 Updated 3 years ago
- Deep Dependency Representation ☆16 Updated 6 years ago
- Simple rule-based named entity recognition ☆43 Updated 2 years ago
- Content Extraction via Text Density (SIGIR11) ☆24 Updated 9 years ago
- Unofficial PyTorch implementation of the DOM-LM paper. ☆32 Updated last year
- Summary Explorer is a tool to visually explore the state-of-the-art in text summarization. ☆43 Updated 5 months ago
- A Python implementation of DEPTA ☆83 Updated 7 years ago
- It includes two datasets that are used in the downstream tasks for evaluating UIBert: App Similar Element Retrieval data and Visual Item … ☆41 Updated 3 years ago
- Code for EMNLP'20 paper "When Hearst Is not Enough: Improving Hypernymy Detection from Corpus with Distributional Models" ☆11 Updated 4 years ago
- Code for building ConceptNet from raw data. ☆19 Updated 9 months ago
- MultiCite code and data. Models are available on Huggingface. ☆29 Updated 2 years ago
- CHOLAN: A Modular Approach for Neural Entity Linking on Wikipedia and Wikidata ☆33 Updated 2 years ago
- Source code for the paper "Web2Text: Deep Structured Boilerplate Removal", full paper @ ECIR'18 ☆167 Updated 3 years ago
- Incorporating VIsual LAyout Structures for Scientific Text Classification ☆173 Updated last year
- A simple algorithm for clustering web pages, suitable for crawlers ☆34 Updated 7 years ago
- Hearst Patterns to extract Hypernyms from text ☆11 Updated 5 years ago
- The official implementation of "Distilling Relation Embeddings from Pre-trained Language Models, EMNLP 2021 main conference", a high-qual… ☆46 Updated last year
- Neural-IR-Explorer: A Content-Focused Tool to Explore Neural Re-Ranking Results ☆33 Updated 4 years ago
- Tools for web page segmentation evaluation ☆13 Updated 5 years ago
- Figma Files Scraper for Research & Studies ☆21 Updated last year