webis-de / ecir21-an-empirical-comparison-of-web-page-segmentation-algorithms
☆26 · Updated last month
Related projects:
- Code for "Web Page Segmentation Revisited: Evaluation Framework and Dataset", accepted as a resources paper at CIKM 2020 ☆13 · Updated last year
- Simplified DOM Trees for Transferable Attribute Extraction from the Web ☆36 · Updated last year
- Summary Explorer is a tool to visually explore the state-of-the-art in text summarization. ☆43 · Updated 4 months ago
- This repository contains the code for the paper 'PARM: Paragraph Aggregation Retrieval Model for Dense Document-to-Document Retrieval' pu… ☆39 · Updated 2 years ago
- Schema-Driven Information Extraction from Heterogeneous Tables ☆20 · Updated 5 months ago
- [NAACL 2022] TIE: Topological Information Enhanced Structural Reading Comprehension on Web Pages ☆19 · Updated 2 years ago
- Simple rule-based named entity recognition ☆42 · Updated 2 years ago
- A Neural Model for Joint Topic Segmentation and Classification ☆34 · Updated 4 years ago
- Vespa application making an index of the CORD-19 dataset. ☆39 · Updated 2 weeks ago
- SUPERT: Unsupervised multi-document summarization evaluation & generation ☆91 · Updated last year
- Kex is a Python library for unsupervised keyword extraction from a document, providing an easy interface and benchmarks on 15 public data… ☆53 · Updated 2 years ago
- Seahorse is a dataset for multilingual, multi-faceted summarization evaluation. It consists of 96K summaries with human ratings along 6 q… ☆84 · Updated 6 months ago
- 🦮 Code and pretrained models for Findings of ACL 2022 paper "LaPraDoR: Unsupervised Pretrained Dense Retriever for Zero-Shot Text Retrie… ☆49 · Updated 2 years ago
- Implementation of Microsoft's VIPS algorithm in Python ☆19 · Updated 4 years ago
- EMNLP 2021: Adapting Language Models for Zero-shot Learning by Meta-tuning on Dataset and Prompt Collections ☆49 · Updated 3 years ago
- Code for Relevance-guided Supervision for OpenQA with ColBERT (TACL'21) ☆40 · Updated 3 years ago
- Corresponding code repo for the paper at COLING 2020 - ARGMIN 2020: "DebateSum: A large-scale argument mining and summarization dataset" ☆51 · Updated 2 years ago
- PyTorch implementation and pre-trained models for ASP - Autoregressive Structured Prediction with Language Models, EMNLP 22. https://arxi… ☆98 · Updated 7 months ago
- Web content extraction using machine learning ☆32 · Updated 3 years ago
- SPRINT Toolkit helps you evaluate diverse neural sparse models easily using a single click on any IR dataset. ☆40 · Updated last year
- init ☆12 · Updated 3 years ago
- SIGIR-2022 Webformer: Pre-training with Web Pages for Information Retrieval ☆47 · Updated 2 years ago
- Pretraining with Natural and Synthetic Data for Few-shot Table-based Question Answering ☆29 · Updated last year
- The official repository for the paper "Efficient Long-Text Understanding Using Short-Text Models" (Ivgi et al., 2022) ☆64 · Updated last year