google-research-datasets / MAVE
The dataset contains 3 million attribute-value annotations across 1257 unique categories on 2.2 million cleaned Amazon product profiles. It is a large, multi-source, diverse dataset for research on product attribute extraction.
☆139 · Updated 2 years ago
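To make "attribute-value annotation on a product profile" concrete, here is a minimal Python sketch of one annotated profile plus a check that each annotated value matches the character span it points to. All class and field names (`ProductProfile`, `Attribute`, `Evidence`, `paragraph_id`, `begin`/`end`) are hypothetical illustrations, not MAVE's published schema; consult the repository for the actual file format.

```python
from __future__ import annotations

from dataclasses import dataclass, field


# Hypothetical record layout: these names are illustrative, not MAVE's schema.
@dataclass
class Evidence:
    value: str          # attribute value as it appears in the text
    paragraph_id: int   # index into ProductProfile.paragraphs
    begin: int          # start character offset (inclusive)
    end: int            # end character offset (exclusive)


@dataclass
class Attribute:
    key: str                                    # attribute name, e.g. "Material"
    evidences: list[Evidence] = field(default_factory=list)


@dataclass
class ProductProfile:
    product_id: str
    category: str
    paragraphs: list[str]                       # e.g. title, description, bullets
    attributes: list[Attribute] = field(default_factory=list)


def check_spans(profile: ProductProfile) -> list[str]:
    """Return one message per evidence whose offsets do not reproduce its value."""
    errors: list[str] = []
    for attr in profile.attributes:
        for ev in attr.evidences:
            # Reject evidences that point at a paragraph the profile lacks.
            if not 0 <= ev.paragraph_id < len(profile.paragraphs):
                errors.append(f"{attr.key}: paragraph {ev.paragraph_id} does not exist")
                continue
            span = profile.paragraphs[ev.paragraph_id][ev.begin:ev.end]
            if span != ev.value:
                errors.append(f"{attr.key}: expected {ev.value!r}, found {span!r}")
    return errors
```

For example, with the made-up title `"acme stainless steel water bottle 750 ml"`, the evidence `Evidence("stainless steel", 0, 5, 20)` passes the check, while shifting its offsets to `(6, 21)` yields one error message. Storing offsets alongside the value lets a span-extraction model be trained on exact positions while the value string stays human-readable.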
Alternatives and similar repositories for MAVE:
Users interested in MAVE are comparing it to the repositories listed below.
- Trans-Encoder: Unsupervised sentence-pair modelling through self- and mutual-distillations ☆132 · Updated 8 months ago
- MS MARCO (Microsoft Machine Reading Comprehension) is a large-scale dataset focused on machine reading comprehension, question answering, … ☆124 · Updated 3 years ago
- ACL19: Scaling Up Open Tagging from Tens to Thousands ☆17 · Updated 5 years ago
- A library to conduct ranking experiments with transformers. ☆161 · Updated last year
- RepBERT is a competitive first-stage retrieval technique. It represents documents and queries with fixed-length contextualized embeddings… ☆66 · Updated 3 years ago
- Implementation of the paper HLATR: Enhance Multi-stage Text Retrieval with Hybrid List Aware Transformer Reranking ☆67 · Updated 2 years ago
- CIKM'21: JPQ substantially improves the efficiency of dense retrieval with a 30x compression ratio, 10x CPU speedup, and 2x GPU speedup. ☆52 · Updated 3 years ago
- Improving Efficient Neural Ranking Models with Cross-Architecture Knowledge Distillation ☆109 · Updated 3 years ago
- SNCSE: Contrastive Learning for Unsupervised Sentence Embedding with Soft Negative Samples ☆75 · Updated 2 years ago
- NAACL 2021 - COIL: Contextualized Lexical Retriever ☆152 · Updated 3 years ago
- SIGIR 2021: Efficiently Teaching an Effective Dense Retriever with Balanced Topic Aware Sampling ☆59 · Updated 3 years ago
- EMNLP 2021 - Pre-training architectures for dense retrieval ☆244 · Updated 2 years ago
- WSDM'22 Best Paper: Learning Discrete Representations via Constrained Clustering for Effective and Efficient Dense Retrieval ☆120 · Updated 6 months ago
- SIGIR'21: Optimizing DR with hard negatives and achieving SOTA first-stage retrieval performance on the TREC DL Track. ☆131 · Updated 3 years ago
- Code and data to facilitate BERT/ELECTRA for document ranking. For details, refer to the paper PARADE: Passage Representation Aggregation for… ☆97 · Updated last year
- This project provides an unsupervised framework for mining and tagging quality phrases on text corpora with pretrained language models (K… ☆171 · Updated 2 years ago
- Code for CEDR: Contextualized Embeddings for Document Ranking, accepted at SIGIR 2019. ☆154 · Updated 4 years ago
- A multilingual version of the MS MARCO passage ranking dataset ☆143 · Updated last year
- Training & evaluation library for text-based neural re-ranking and dense retrieval models built with PyTorch ☆262 · Updated 2 years ago
- [NeurIPS 2021] COCO-LM: Correcting and Contrasting Text Sequences for Language Model Pretraining ☆118 · Updated last year
- Dataset for the NAACL 2021 paper "DART: Open-Domain Structured Data Record to Text Generation" ☆151 · Updated 2 years ago
- Named Entity Recognition with Small Strongly Labeled and Large Weakly Labeled Data ☆100 · Updated last year
- Snippext: Semi-supervised Opinion Mining with Augmented Data ☆58 · Updated last year
- Multi-stage passage ranking: monoBERT + duoBERT ☆112 · Updated 4 years ago
- docTTTTTquery document expansion model ☆361 · Updated last year
- Unified Learned Sparse Retrieval Framework ☆63 · Updated 9 months ago