google-research-datasets/paws

This dataset contains 108,463 human-labeled and 656k noisily labeled pairs that highlight the importance of modeling structure, context, and word order information for the problem of paraphrase identification.

☆570

Alternatives and similar repositories for paws

Users who are interested in paws are comparing it to the libraries listed below.

miyyer / scpn
syntactically controlled paraphrase networks
☆168 · Dec 30, 2018 · Updated 7 years ago
Tiiiger / bert_score
BERT score for text generation
☆1,909 · Jul 30, 2024 · Updated last year
facebookresearch / MLQA
New dataset
☆311 · Aug 31, 2021 · Updated 4 years ago
facebookresearch / XLM
PyTorch original implementation of Cross-lingual Language Model Pretraining.
☆2,923 · Feb 14, 2023 · Updated 3 years ago
google-research-datasets / tydiqa
TyDi QA contains 200k human-annotated question-answer pairs in 11 Typologically Diverse languages, written without seeing the answer and …
☆319 · May 28, 2020 · Updated 6 years ago
neulab / compare-mt
A tool for holistic analysis of language generation systems
☆471 · Sep 22, 2025 · Updated 10 months ago
facebookresearch / LAMA
LAnguage Model Analysis
☆1,391 · Jul 7, 2024 · Updated 2 years ago
simonepri / fever-transformers
📄 Evidence Retrieval and Claim Verification for the FEVER shared task using Transformer Networks
☆12 · Feb 21, 2020 · Updated 6 years ago
nelson-liu / contextual-repr-analysis
A toolkit for evaluating the linguistic knowledge and transferability of contextual representations. Code for "Linguistic Knowledge and T…
☆212 · Oct 20, 2021 · Updated 4 years ago
jwieting / para-nmt-50m
Pre-trained models and code and data to train and use models from "Pushing the Limits of Paraphrastic Sentence Embeddings with Millions o…
☆105 · Dec 5, 2023 · Updated 2 years ago
google-research / xtreme
XTREME is a benchmark for the evaluation of the cross-lingual generalization ability of pre-trained multilingual models that covers 40 ty…
☆651 · Jan 4, 2023 · Updated 3 years ago
mrqa / MRQA-Shared-Task-2019
Resources for the MRQA 2019 Shared Task
☆294 · Aug 5, 2021 · Updated 4 years ago
harvardnlp / pytorch-struct
Fast, general, and tested differentiable structured prediction in PyTorch
☆1,133 · Apr 20, 2022 · Updated 4 years ago
google-research-datasets / natural-questions
Natural Questions (NQ) contains real user questions issued to Google search, and answers found from Wikipedia by annotators. NQ is design…
☆1,136 · Jul 30, 2021 · Updated 4 years ago
uber-research / PPLM
Plug and Play Language Model implementation. Allows steering the topic and attributes of GPT-2 models.
☆1,153 · Feb 20, 2024 · Updated 2 years ago
google-research / bleurt
BLEURT is a metric for Natural Language Generation based on transfer learning.
☆794 · Aug 4, 2023 · Updated 2 years ago
wasiahmad / paraphrase_identification
Examine two sentences and determine whether they have the same meaning.
☆225 · Feb 5, 2019 · Updated 7 years ago
facebookresearch / anli
Adversarial Natural Language Inference Benchmark
☆402 · May 12, 2022 · Updated 4 years ago
namisan / mt-dnn
Multi-Task Deep Neural Networks for Natural Language Understanding
☆2,259 · Mar 7, 2024 · Updated 2 years ago
seominjoon / denspi
Real-Time Open-Domain Question Answering with Dense-Sparse Phrase Index (DenSPI)
☆200 · Jul 6, 2023 · Updated 3 years ago
google-research / language
Shared repository for open-sourced projects from the Google AI Language team.
☆1,788 · Jun 10, 2026 · Updated last month
google-research-datasets / wiki-split
One million English sentences, each split into two sentences that together preserve the original meaning, extracted from Wikipedia edits.
☆125 · Jun 3, 2019 · Updated 7 years ago
google-research / text-to-text-transfer-transformer
Code for the paper "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer"
☆6,537 · Jul 8, 2026 · Updated 2 weeks ago
facebookresearch / LASER
Language-Agnostic SEntence Representations
☆3,661 · May 2, 2024 · Updated 2 years ago
neulab / InterpretEval
Interpretable Evaluation for (Almost) All NLP Tasks
☆194 · Sep 22, 2025 · Updated 10 months ago
zihangdai / xlnet
XLNet: Generalized Autoregressive Pretraining for Language Understanding
☆6,182 · May 28, 2023 · Updated 3 years ago
FranxYao / dgm_latent_bow
Implementation of NeurIPS 19 paper: Paraphrase Generation with Latent Bag of Words
☆122 · Oct 9, 2021 · Updated 4 years ago
google-research / lasertagger
☆603 · Mar 12, 2026 · Updated 4 months ago
microsoft / fastseq
An efficient implementation of the popular sequence models for text generation, summarization, and translation tasks. https://arxiv.org/p…
☆433 · Aug 17, 2022 · Updated 3 years ago
seominjoon / piqa
Phrase-Indexed Question Answering (PIQA)
☆93 · Apr 27, 2019 · Updated 7 years ago
google-research-datasets / wiki-atomic-edits
A dataset of atomic Wikipedia edits containing insertions and deletions of a contiguous chunk of text in a sentence. This dataset contai…
☆105 · May 6, 2019 · Updated 7 years ago
Vamsi995 / Paraphrase-Generator
A paraphrase generator built using the T5 model which produces paraphrased English sentences.
☆321 · Apr 18, 2026 · Updated 3 months ago
facebookresearch / InferSent
InferSent sentence embeddings
☆2,280 · Aug 30, 2021 · Updated 4 years ago
salesforce / ctrl
Conditional Transformer Language Model for Controllable Generation
☆1,880 · May 1, 2025 · Updated last year
GuyTevet / diversity-eval
Official GitHub repo for the paper "Evaluating the Evaluation of Diversity in Natural Language Generation"
☆21 · Feb 23, 2021 · Updated 5 years ago
facebookresearch / SentEval
A Python tool for evaluating the quality of sentence embeddings.
☆2,110 · Mar 19, 2024 · Updated 2 years ago
facebookresearch / XNLI
Evaluating Cross-lingual Sentence Representations
☆463 · Aug 30, 2021 · Updated 4 years ago
google-research / electra
ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
☆2,367 · Mar 23, 2024 · Updated 2 years ago
google-deepmind / xquad
☆211 · Nov 12, 2021 · Updated 4 years ago