webis-de / cikm20-web-page-segmentation-revisited-evaluation-framework-and-dataset
Code for "Web Page Segmentation Revisited: Evaluation Framework and Dataset", accepted as a resources paper at CIKM 2020
☆14 Updated 2 years ago
Alternatives and similar repositories for cikm20-web-page-segmentation-revisited-evaluation-framework-and-dataset:
- ☆26 Updated 6 months ago
- Implementation of Microsoft Vips algorithm in Python ☆19 Updated 5 years ago
- SIGIR-2022 Webformer: Pre-training with Web Pages for Information Retrieval ☆47 Updated 2 years ago
- Simplified DOM Trees for Transferable Attribute Extraction from the Web ☆38 Updated 4 months ago
- A Context-aware Visual Attention-based training pipeline for Object Detection from a Webpage screenshot! ☆92 Updated 2 years ago
- Web content extraction using machine learning ☆32 Updated 3 years ago
- It includes two datasets that are used in the downstream tasks for evaluating UIBert: App Similar Element Retrieval data and Visual Item … ☆41 Updated 3 years ago
- Consists of ~500k human annotations on the RICO dataset identifying various icons based on their shapes and semantics, and associations b… ☆27 Updated 7 months ago
- Detectron2 Webserver (Faster-RCNN) implementation for Ubuntu 20.04. Real time object detection served over the internet. ☆31 Updated 2 years ago
- This repository contains the opensource version of the datasets were used for different parts of training and testing of models that grou… ☆32 Updated 4 years ago
- Dataset and scripts for HRDoc ☆35 Updated last year
- Semantic Code Search ☆34 Updated last year
- Web page segmentation and noise removal ☆55 Updated last year
- The WordScape repository contains code for the WordScape pipeline to create datasets to train document understanding models. ☆33 Updated last year
- This repository contains the ToolSelect dataset which was used to fine-tune Llama-2 70B for tool selection. ☆20 Updated 11 months ago
- Deep Dependency Representation ☆16 Updated 6 years ago
- Article extraction benchmark: dataset and evaluation scripts ☆301 Updated 9 months ago
- ☆63 Updated last month
- A research prototype tool to repair Selenium E2E test cases through computer vision ☆8 Updated 6 years ago
- The dataset includes UI object type labels (e.g., BUTTON, IMAGE, CHECKBOX) that describes the semantic type of an UI object on Android ap… ☆48 Updated 3 years ago
- Training/test data for Dragnet ☆41 Updated 10 years ago
- ☆36 Updated 6 years ago
- Linking Entities in CommonCrawl Dataset onto Wikipedia Concepts ☆59 Updated 12 years ago
- ☆110 Updated last year
- Summary Explorer is a tool to visually explore the state-of-the-art in text summarization. ☆44 Updated 9 months ago
- Object Detection for Graphical User Interface: Old Fashioned or Deep Learning or a Combination? ☆128 Updated last year
- code and data used to build a training dataset for dragnet models ☆10 Updated 4 years ago
- Evaluating tool-augmented LLMs in conversation settings ☆77 Updated 8 months ago
- TeX compilation service that makes use of arXiv.org's AutoTeX library. ☆28 Updated 8 months ago
- SWIM-IR is a Synthetic Wikipedia-based Multilingual Information Retrieval training set with 28 million query-passage pairs spanning 33 la… ☆46 Updated last year