google-research-datasets / MAVE
The dataset contains 3 million attribute-value annotations across 1257 unique categories on 2.2 million cleaned Amazon product profiles. It is a large, multi-sourced, diverse dataset for research on product attribute extraction.
☆138 Updated last year
Related projects
Alternatives and complementary repositories for MAVE
- ☆86 Updated 4 years ago
- Trans-Encoder: Unsupervised sentence-pair modelling through self- and mutual-distillations ☆133 Updated 5 months ago
- ACL19-Scaling Up Open Tagging from Tens to Thousands ☆16 Updated 5 years ago
- WSDM'22 Best Paper: Learning Discrete Representations via Constrained Clustering for Effective and Efficient Dense Retrieval ☆118 Updated 3 months ago
- MS MARCO (Microsoft Machine Reading Comprehension) is a large scale dataset focused on machine reading comprehension, question answering, … ☆123 Updated 2 years ago
- NAACL2021 - COIL Contextualized Lexical Retriever ☆149 Updated 3 years ago
- A library to conduct ranking experiments with transformers. ☆161 Updated last year
- SNCSE: Contrastive Learning for Unsupervised Sentence Embedding with Soft Negative Samples ☆74 Updated 2 years ago
- This repository contains the code to reproduce the experiments of the poster "Supervised Contrastive Learning for Product Matching" ☆36 Updated 2 years ago
- Unofficial implementation of the paper "OpenTag: Open Attribute Value Extraction from Product Profiles" ☆33 Updated 6 years ago
- EMNLP 2021 - Pre-training architectures for dense retrieval ☆243 Updated 2 years ago
- [KDD 2020] Hierarchical Topic Mining via Joint Spherical Tree and Text Embedding ☆57 Updated 3 years ago
- A Pytorch implementation of "Scaling Up Open Tagging from Tens to Thousands: Comprehension Empowered Attribute Value Extraction from Prod… ☆60 Updated 4 years ago
- Code and data to facilitate BERT/ELECTRA for document ranking. For details, refer to the paper - PARADE: Passage Representation Aggregation for… ☆97 Updated last year
- Named Entity Recognition with Small Strongly Labeled and Large Weakly Labeled Data ☆100 Updated last year
- RepBERT is a competitive first-stage retrieval technique. It represents documents and queries with fixed-length contextualized embeddings… ☆66 Updated 3 years ago
- Build Text Rerankers with Deep Language Models ☆252 Updated 9 months ago
- SIGIR'21: Optimizing DR with hard negatives and achieving SOTA first-stage retrieval performance on TREC DL Track. ☆126 Updated 2 years ago
- Snippext: Semi-supervised Opinion Mining with Augmented Data ☆59 Updated last year
- This project provides an unsupervised framework for mining and tagging quality phrases on text corpora with pretrained language models (K… ☆168 Updated last year
- Multi-stage passage ranking: monoBERT + duoBERT ☆112 Updated 4 years ago
- ☆57 Updated last year
- Code repo for EMNLP21 paper "Zero-Shot Information Extraction as a Unified Text-to-Triple Translation" ☆107 Updated 6 months ago
- A system for Named Entity Disambiguation based on Random Walks and Learning to Rank. ☆18 Updated 2 years ago
- Code for paper OA-Mine: Open-World Attribute Mining for E-Commerce Products with Weak Supervision ☆24 Updated 2 years ago
- The source code used for paper "Empower Entity Set Expansion via Language Model Probing", published in ACL 2020. ☆33 Updated 4 years ago
- ☆140 Updated 5 years ago
- KG-BART: Knowledge Graph-Augmented BART for Generative Commonsense Reasoning ☆161 Updated 2 years ago
- An end-to-end neural ad-hoc ranking pipeline. ☆150 Updated 7 months ago