google-research-datasets / MAVE
The dataset contains 3 million attribute-value annotations across 1257 unique categories on 2.2 million cleaned Amazon product profiles. It is a large, multi-sourced, diverse dataset for product attribute extraction study.
☆148Updated 2 years ago
Alternatives and similar repositories for MAVE
Users who are interested in MAVE are comparing it to the repositories listed below.
- Trans-Encoder: Unsupervised sentence-pair modelling through self- and mutual-distillations☆134Updated last month
- Improving Efficient Neural Ranking Models with Cross-Architecture Knowledge Distillation☆113Updated 4 years ago
- RepBERT is a competitive first-stage retrieval technique. It represents documents and queries with fixed-length contextualized embeddings…☆66Updated 3 years ago
- Training & evaluation library for text-based neural re-ranking and dense retrieval models built with PyTorch☆264Updated 2 years ago
- MS MARCO (Microsoft Machine Reading Comprehension) is a large-scale dataset focused on machine reading comprehension, question answering, …☆128Updated 3 years ago
- NAACL2021 - COIL Contextualized Lexical Retriever☆154Updated 4 years ago
- SIGIR 2021: Efficiently Teaching an Effective Dense Retriever with Balanced Topic Aware Sampling☆59Updated 4 years ago
- Implementation of paper: HLATR: Enhance Multi-stage Text Retrieval with Hybrid List Aware Transformer Reranking☆72Updated 2 years ago
- This project provides an unsupervised framework for mining and tagging quality phrases on text corpora with pretrained language models (K…☆173Updated 2 years ago
- EMNLP 2021 - Pre-training architectures for dense retrieval☆253Updated 3 years ago
- A library to conduct ranking experiments with transformers.☆160Updated 2 years ago
- Build Text Rerankers with Deep Language Models☆263Updated last year
- SIGIR'21: Optimizing DR with hard negatives and achieving SOTA first-stage retrieval performance on TREC DL Track.☆130Updated 3 years ago
- docTTTTTquery document expansion model☆369Updated 2 years ago
- An easy-to-use tool for phrase encoding and topic mining (unsupervised aspect extraction); Code base for ACL 2022 paper, UCTopic: Unsuper…☆47Updated 2 years ago
- An end-to-end neural ad-hoc ranking pipeline.☆151Updated 2 months ago
- WSDM'22 Best Paper: Learning Discrete Representations via Constrained Clustering for Effective and Efficient Dense Retrieval☆120Updated last year
- Code and data to facilitate BERT/ELECTRA for document ranking. For details, refer to the paper - PARADE: Passage Representation Aggregation for…☆97Updated 2 years ago
- A curated list of awesome papers for Semantic Retrieval (TOIS Accepted: Semantic Models for the First-stage Retrieval: A Comprehensive Re…☆328Updated 2 years ago
- [KDD 2020] Hierarchical Topic Mining via Joint Spherical Tree and Text Embedding☆58Updated 4 years ago
- source code for paper: WhiteningBERT: An Easy Unsupervised Sentence Embedding Approach.☆57Updated 4 years ago
- The autoregressive information extraction system GenIE (Generative Information Extraction) implemented in PyTorch.☆104Updated 2 years ago
- A multilingual version of MS MARCO passage ranking dataset☆145Updated last year
- ☆88Updated 5 years ago
- Binary Passage Retriever (BPR) - an efficient passage retriever for open-domain question answering☆172Updated 4 years ago
- A novel embedding training algorithm that leverages ANN search and achieves SOTA retrieval on TREC DL 2019 and OpenQA benchmarks☆377Updated 2 years ago
- 🦮 Code and pretrained models for Findings of ACL 2022 paper "LaPraDoR: Unsupervised Pretrained Dense Retriever for Zero-Shot Text Retrie…☆49Updated 3 years ago
- CIKM'21: JPQ substantially improves the efficiency of Dense Retrieval with 30x compression ratio, 10x CPU speedup and 2x GPU speedup.☆53Updated 3 years ago
- ☆162Updated 5 years ago
- Code repo for EMNLP21 paper "Zero-Shot Information Extraction as a Unified Text-to-Triple Translation"☆108Updated last year