google-research-datasets/MAVE

The dataset contains 3 million attribute-value annotations across 1,257 unique categories, drawn from 2.2 million cleaned Amazon product profiles. It is a large, multi-sourced, diverse dataset for studying product attribute extraction.

☆158

Alternatives and similar repositories for MAVE

Users who are interested in MAVE are comparing it to the repositories listed below.

cubenlp / ACL19_Scaling_Up_Open_Tagging
ACL19-Scaling Up Open Tagging from Tens to Thousands
☆17 · Aug 23, 2019 · Updated 6 years ago
jd-aig / JAVE
☆88 · Sep 15, 2020 · Updated 5 years ago
lumiqai / UOI-1806.01264
Unofficial implementation of the paper "OpenTag: Open Attribute Value Extraction from Product Profiles"
☆33 · Aug 22, 2018 · Updated 7 years ago
wbsg-uni-mannheim / ExtractGPT
Attribute Value Extraction using Large Language Models
☆29 · May 24, 2024 · Updated 2 years ago
xinyangz / OAMine
Code for paper OA-Mine: Open-World Attribute Mining for E-Commerce Products with Weak Supervision
☆30 · May 9, 2022 · Updated 4 years ago
HKUST-KnowComp / FolkScope
[ACL 2023] Codes and Datasets for Paper: FolkScope: Intention Knowledge Graph Construction for Discovering E-commerce Commonsense
☆42 · Mar 3, 2025 · Updated last year
IBPA / LOVE
Learning Ontologies Via Embeddings
☆12 · Jul 6, 2023 · Updated 3 years ago
j-r77 / cfddiscovery
☆11 · Oct 31, 2019 · Updated 6 years ago
ujiuji1259 / shinra-attribute-extraction
☆11 · Sep 7, 2021 · Updated 4 years ago
cdlockard / expanded_swde
☆14 · Apr 18, 2020 · Updated 6 years ago
xu-song / k-plug
K-PLUG: Knowledge-injected Pre-trained Language Model for Natural Language Understanding and Generation in E-Commerce (Findings of EMNLP …
☆31 · Jan 6, 2023 · Updated 3 years ago
Alibaba-NLP / AIN
Code for our EMNLP 2020 Paper "AIN: Fast and Accurate Sequence Labeling with Approximate Inference Network"
☆19 · Nov 14, 2022 · Updated 3 years ago
Akeepers / LEAR
The implementation of our EMNLP 2021 paper "Enhanced Language Representation with Label Knowledge for Span Extraction".
☆115 · May 22, 2023 · Updated 3 years ago
SumitVermakgp / NLP-Attribute-Extraction-Flipkart
Large online shopping companies need to automatically populate their product descriptions supplied by the sellers. Many a times the text …
☆11 · Jul 4, 2018 · Updated 8 years ago
wavewangyue / mae
Multimodal attribute extraction
☆46 · Aug 6, 2020 · Updated 5 years ago
chongzhangFDU / Token-Path-Prediction-Datasets
This is the official repository of the revised datasets FUNSD-r and CORD-r, introduced in EMNLP 2023 paper Reading Order Matters: Informa…
☆17 · Mar 20, 2024 · Updated 2 years ago
sjcfr / ege-RoBERTa
☆13 · Sep 5, 2021 · Updated 4 years ago
Alibaba-NLP / MultilangStructureKD
[ACL 2020] Structure-Level Knowledge Distillation For Multilingual Sequence Labeling
☆73 · Nov 23, 2022 · Updated 3 years ago
snover / terp
TER-plus Machine Translation metric.
☆31 · May 23, 2022 · Updated 4 years ago
ddenron / deco_dataset
This repository holds the annotated spreadsheet files, comprising the DECO dataset.
☆13 · Mar 21, 2019 · Updated 7 years ago
zuohuif / COOKIE
A Dataset for Conversational Recommendation over KnowledgeGraph in E-commerce
☆51 · Sep 26, 2021 · Updated 4 years ago
megagonlabs / rotom
Code for the paper "Rotom: A Meta-Learned Data Augmentation Framework for Entity Matching, Data Cleaning, Text Classification, and Beyond…
☆24 · May 31, 2022 · Updated 4 years ago
Wang-Shuo / SIGIR2020Challenge
The 1st place solution for SIGIR 2020 E-Commerce Workshop Multimodal Product Classification Challenge
☆21 · Aug 3, 2020 · Updated 5 years ago
SalesforceAIResearch / xRouter
xRouter: Training Cost-Aware LLMs Orchestration System via Reinforcement Learning
☆31 · Jun 2, 2026 · Updated last month
JD-AI-Research-Silicon-Valley / HDEGraph
Code for ACL 2019 paper "Multi-hop Reading Comprehension across Multiple Documents by Reasoning over Heterogeneous Graphs"
☆18 · Feb 9, 2020 · Updated 6 years ago
drndr / multilabel-text-clf
☆18 · Mar 3, 2023 · Updated 3 years ago
netpaladinx / DPMPN
☆21 · Mar 25, 2023 · Updated 3 years ago
karmaresearch / takco
🌮 Table-based KB Completer
☆16 · Mar 13, 2024 · Updated 2 years ago
jd-aig / multimodal-product-summarization-challenge
☆23 · May 25, 2022 · Updated 4 years ago
dominikandreas / CSP
High-level Semantic Feature Detection: A New Perspective for Pedestrian Detection, CVPR, 2019
☆13 · Aug 20, 2019 · Updated 6 years ago
pgcool / textTOvec
ICLR 2019 paper: "textTOvec: DEEP CONTEXTUALIZED NEURAL AUTOREGRESSIVE TOPIC MODELS OF LANGUAGE WITH DISTRIBUTED COMPOSITIONAL PRIOR"
☆25 · Dec 30, 2018 · Updated 7 years ago
anthonywchen / AmbER-Sets
The official repository for "Evaluating Entity Disambiguation and the Role of Popularity in Retrieval-Based NLP" published in ACL-IJNLP 2…
☆20 · Apr 22, 2022 · Updated 4 years ago
ZihanWangKi / CrossWeigh
CrossWeigh: Training Named Entity Tagger from Imperfect Annotations
☆177 · Jul 25, 2024 · Updated 2 years ago
nguyenvo09 / CombatingFakeNews
This is the repository of code and dataset for paper "The Rise of Guardians: Fact-checking URL Recommendation to Combat Fake News", SIGIR…
☆18 · Feb 19, 2022 · Updated 4 years ago
google-research-datasets / WebRED
WebRED is a large and diverse manually annotated dataset for extracting relationships from a variety of text found on the World Wide Web.
☆22 · Mar 11, 2021 · Updated 5 years ago
NPoe / ebert
☆37 · Sep 22, 2021 · Updated 4 years ago
jzbjyb / oie_rank
Iterative Rank-Aware Open IE
☆30 · Jun 24, 2019 · Updated 7 years ago
penn-nlp / mmid
Words and their images in 98 languages
☆14 · Mar 1, 2019 · Updated 7 years ago
korenyoni / opus-api
OPUS (opus.nlpl.eu) Python3 API
☆18 · Nov 23, 2024 · Updated last year