google-research-datasets / MAVE
The dataset contains 3 million attribute-value annotations across 1257 unique categories on 2.2 million cleaned Amazon product profiles. It is a large, multi-sourced, diverse dataset for product attribute extraction study.
☆143Updated 2 years ago
Alternatives and similar repositories for MAVE
Users who are interested in MAVE are comparing it to the repositories listed below.
- Trans-Encoder: Unsupervised sentence-pair modelling through self- and mutual-distillations☆132Updated 3 weeks ago
- ACL19-Scaling Up Open Tagging from Tens to Thousands☆16Updated 5 years ago
- RepBERT is a competitive first-stage retrieval technique. It represents documents and queries with fixed-length contextualized embeddings…☆66Updated 3 years ago
- Code and data to facilitate BERT/ELECTRA for document ranking. For details, refer to the paper - PARADE: Passage Representation Aggregation for…☆97Updated 2 years ago
- ☆87Updated 4 years ago
- WSDM'22 Best Paper: Learning Discrete Representations via Constrained Clustering for Effective and Efficient Dense Retrieval☆120Updated 10 months ago
- A Pytorch implementation of "Scaling Up Open Tagging from Tens to Thousands: Comprehension Empowered Attribute Value Extraction from Prod…☆59Updated 5 years ago
- Code repo for EMNLP21 paper "Zero-Shot Information Extraction as a Unified Text-to-Triple Translation"☆108Updated last year
- MS MARCO (Microsoft Machine Reading Comprehension) is a large-scale dataset focused on machine reading comprehension, question answering, …☆126Updated 3 years ago
- source code for paper: WhiteningBERT: An Easy Unsupervised Sentence Embedding Approach.☆56Updated 4 years ago
- SIGIR 2021: Efficiently Teaching an Effective Dense Retriever with Balanced Topic Aware Sampling☆59Updated 3 years ago
- code of CycleGT☆87Updated 2 years ago
- Improving Efficient Neural Ranking Models with Cross-Architecture Knowledge Distillation☆111Updated 3 years ago
- SIGIR'21: Optimizing DR with hard negatives and achieving SOTA first-stage retrieval performance on TREC DL Track.☆130Updated 3 years ago
- CIKM'21: JPQ substantially improves the efficiency of Dense Retrieval with 30x compression ratio, 10x CPU speedup and 2x GPU speedup.☆52Updated 3 years ago
- Named Entity Recognition with Small Strongly Labeled and Large Weakly Labeled Data☆100Updated last year
- SNCSE: Contrastive Learning for Unsupervised Sentence Embedding with Soft Negative Samples☆75Updated 2 years ago
- A library to conduct ranking experiments with transformers.☆160Updated last year
- Winner system (DAMO-NLP) of SemEval 2022 MultiCoNER shared task over 10 out of 13 tracks.☆184Updated 2 years ago
- [NeurIPS 2021] COCO-LM: Correcting and Contrasting Text Sequences for Language Model Pretraining☆117Updated last year
- Code repo for ACL22 paper "DeepStruct: Pretraining of Language Models for Structure Prediction"☆84Updated 2 years ago
- Pytorch implementation of Highly Parallel Autoregressive Entity Linking with Discriminative Correction☆67Updated 3 years ago
- [KDD 2020] Hierarchical Topic Mining via Joint Spherical Tree and Text Embedding☆57Updated 4 years ago
- Implementation of paper: HLATR: Enhance Multi-stage Text Retrieval with Hybrid List Aware Transformer Reranking☆69Updated 2 years ago
- Submission archive for the MS MARCO document ranking leaderboard☆30Updated last year
- ☆24Updated last year
- EMNLP 2021 - Pre-training architectures for dense retrieval☆252Updated 3 years ago
- ☆68Updated last month
- An end-to-end neural ad-hoc ranking pipeline.☆151Updated 2 months ago
- A benchmark dataset for personalized product search built on Amazon review data☆49Updated 5 years ago