xrr233 / Webformer
SIGIR-2022 Webformer: Pre-training with Web Pages for Information Retrieval
☆47 Updated 2 years ago
Alternatives and similar repositories for Webformer:
Users who are interested in Webformer are comparing it to the libraries listed below.
- Simplified DOM Trees for Transferable Attribute Extraction from the Web ☆38 Updated 6 months ago
- Code repo for ACL22 paper "DeepStruct: Pretraining of Language Models for Structure Prediction" ☆84 Updated 2 years ago
- Implementation of paper: HLATR: Enhance Multi-stage Text Retrieval with Hybrid List Aware Transformer Reranking ☆68 Updated 2 years ago
- An Open-Source Package for Information Retrieval ☆161 Updated 3 weeks ago
- ReadingBank: A Benchmark Dataset for Reading Order Detection ☆104 Updated 7 months ago
- Code and dataset for the EMNLP paper titled Instruct and Extract: Instruction Tuning for On-Demand Information Extraction ☆51 Updated last year
- TAT-QA (Tabular And Textual dataset for Question Answering) contains 16,552 questions associated with 2,757 hybrid contexts from real-wor… ☆104 Updated 3 months ago
- Winner system (DAMO-NLP) of SemEval 2022 MultiCoNER shared task over 10 out of 13 tracks. ☆181 Updated 2 years ago
- Tool for converting LLMs from uni-directional to bi-directional by removing causal mask for tasks like classification and sentence embedd… ☆57 Updated 3 months ago
- Dense X Retrieval: What Retrieval Granularity Should We Use? ☆152 Updated last year
- The code and data for "StructGPT: A general framework for Large Language Model to Reason on Structured Data" ☆104 Updated last year
- PyTorch implementation and pre-trained models for ASP - Autoregressive Structured Prediction with Language Models, EMNLP 22. https://arxi… ☆105 Updated last year
- Incorporating VIsual LAyout Structures for Scientific Text Classification ☆175 Updated 2 years ago
- TUTA and ForTaP for Structure-Aware and Numerical-Reasoning-Aware Table Pre-Training ☆108 Updated 4 months ago
- An easy-to-use Python toolkit for flexibly adapting various neural ranking models to a target domain. ☆59 Updated last year
- The dataset contains 3 million attribute-value annotations across 1257 unique categories on 2.2 million cleaned Amazon product profiles. … ☆138 Updated 2 years ago
- [NAACL 2022] TIE: Topological Information Enhanced Structural Reading Comprehension on Web Pages ☆19 Updated 2 years ago
- Rethinking Negative Instances for Generative Named Entity Recognition [ACL 2024 Findings] ☆49 Updated last year
- 🌳CED: Catalog Extraction from Documents ☆15 Updated last year
- CLIR version of ColBERT ☆67 Updated 2 weeks ago
- [EMNLP 2021] The baseline code for WebSRC dataset. ☆50 Updated 2 years ago
- Code, datasets, and checkpoints for the paper "Improving Passage Retrieval with Zero-Shot Question Generation (EMNLP 2022)" ☆100 Updated 2 years ago
- Unofficial PyTorch implementation of the Dom-LM paper. ☆33 Updated 2 years ago
- [NAACL 2022] Robust (Controlled) Table-to-Text Generation with Structure-Aware Equivariance Learning. ☆57 Updated 11 months ago
- A Human-LLM Collaborative Dataset for Generative Information-seeking with Attribution ☆30 Updated last year
- This is the code for our KILT leaderboard submissions (KGI + Re2G models). ☆153 Updated last year
- Dataset and code for EMNLP2020 paper "HybridQA: A Dataset of Multi-Hop Question Answering over Tabular and Textual Data" ☆226 Updated last year