google-research-datasets / MAVE
The dataset contains 3 million attribute-value annotations across 1257 unique categories on 2.2 million cleaned Amazon product profiles. It is a large, multi-sourced, diverse dataset for research on product attribute extraction.
☆138Updated 2 years ago
Alternatives and similar repositories for MAVE:
Users who are interested in MAVE are comparing it to the libraries listed below:
- ACL19-Scaling Up Open Tagging from Tens to Thousands☆17Updated 5 years ago
- A library to conduct ranking experiments with transformers.☆161Updated last year
- Trans-Encoder: Unsupervised sentence-pair modelling through self- and mutual-distillations☆132Updated 9 months ago
- This repository contains the code to reproduce the experiments of the poster "Supervised Contrastive Learning for Product Matching"☆38Updated 3 years ago
- Implementation of paper: HLATR: Enhance Multi-stage Text Retrieval with Hybrid List Aware Transformer Reranking☆68Updated 2 years ago
- Named Entity Recognition with Small Strongly Labeled and Large Weakly Labeled Data☆100Updated last year
- MS MARCO (Microsoft Machine Reading Comprehension) is a large scale dataset focused on machine reading comprehension, question answering, …☆123Updated 3 years ago
- A PyTorch implementation of "Scaling Up Open Tagging from Tens to Thousands: Comprehension Empowered Attribute Value Extraction from Prod…☆60Updated 5 years ago
- source code for paper: WhiteningBERT: An Easy Unsupervised Sentence Embedding Approach.☆56Updated 3 years ago
- SIGIR 2021: Efficiently Teaching an Effective Dense Retriever with Balanced Topic Aware Sampling☆59Updated 3 years ago
- Improving Efficient Neural Ranking Models with Cross-Architecture Knowledge Distillation☆109Updated 3 years ago
- [ACL-IJCNLP 2021] Improving Named Entity Recognition by External Context Retrieving and Cooperative Learning☆92Updated 2 years ago
- EMNLP 2021 - Pre-training architectures for dense retrieval☆246Updated 3 years ago
- RepBERT is a competitive first-stage retrieval technique. It represents documents and queries with fixed-length contextualized embeddings…☆66Updated 3 years ago
- Code repo for EMNLP21 paper "Zero-Shot Information Extraction as a Unified Text-to-Triple Translation"☆108Updated 10 months ago
- This project provides an unsupervised framework for mining and tagging quality phrases on text corpora with pretrained language models (K…☆172Updated 2 years ago
- NAACL2021 - COIL Contextualized Lexical Retriever☆152Updated 3 years ago
- SNCSE: Contrastive Learning for Unsupervised Sentence Embedding with Soft Negative Samples☆75Updated 2 years ago
- PyTorch implementation of the TwinBert paper☆40Updated 3 years ago
- This is the code for our KILT leaderboard submissions (KGI + Re2G models).☆153Updated last year
- Official code for achieving human parity on CommonsenseQA with External Attention☆109Updated last year
- SpanNER: Named Entity Re-/Recognition as Span Prediction☆127Updated 2 years ago
- The autoregressive information extraction system GenIE (Generative Information Extraction) implemented in PyTorch.☆100Updated last year
- Code repo for ACL22 paper "DeepStruct: Pretraining of Language Models for Structure Prediction"☆84Updated 2 years ago
- [EMNLP'21] Plan-then-Generate: Controlled Data-to-Text Generation via Planning☆76Updated 2 years ago
- SIGIR'21: Optimizing DR with hard negatives and achieving SOTA first-stage retrieval performance on TREC DL Track.☆130Updated 3 years ago
- Multi^2OIE: Multilingual Open Information Extraction Based on Multi-Head Attention with BERT (Findings of ACL: EMNLP 2020)☆56Updated 2 years ago
- An easy-to-use tool for phrase encoding and topic mining (unsupervised aspect extraction); Code base for ACL 2022 paper, UCTopic: Unsuper…☆43Updated last year