This repository contains code to replicate the Toronto BookCorpus dataset, which is no longer publicly available.
☆49 · Apr 6, 2022 · Updated 3 years ago
Alternatives and similar repositories for Replicate-Toronto-BookCorpus
Users who are interested in Replicate-Toronto-BookCorpus are comparing it to the libraries listed below.
- Crawl BookCorpus ☆853 · Jul 14, 2023 · Updated 2 years ago
- An Online Latent Dirichlet Allocation with Infinite Vocabulary implementation in Python. ☆12 · Oct 4, 2018 · Updated 7 years ago
- The Tweets2013 Internet Archive collection ☆10 · Aug 7, 2020 · Updated 5 years ago
- Author Profiling for Abuse Detection (COLING 2018) ☆10 · Dec 8, 2022 · Updated 3 years ago
- ☆38 · Mar 10, 2016 · Updated 9 years ago
- Incremental Learning the Hierarchical Softmax Function for Neural Language Models ☆11 · Dec 6, 2016 · Updated 9 years ago
- A library for creating complex experimental pipelines ☆12 · Jul 25, 2022 · Updated 3 years ago
- Collection of Edge AI tutorials ☆12 · Feb 15, 2020 · Updated 6 years ago
- Show the time in Roman Numerals ☆11 · Jan 23, 2020 · Updated 6 years ago
- Concept2vec Metrics for Evaluating Quality of Embeddings for Ontological Concepts ☆15 · Oct 22, 2018 · Updated 7 years ago
- This repository contains the WordNet Language Model Probing (WNLaMPro) dataset introduced in "Rare Words: A Major Problem for Contextuali… ☆14 · Feb 2, 2020 · Updated 6 years ago
- One million English sentences, each split into two sentences that together preserve the original meaning, extracted from Wikipedia edits. ☆124 · Jun 3, 2019 · Updated 6 years ago
- Survey on machine learning. ☆14 · Nov 28, 2020 · Updated 5 years ago
- Python interface for mate tools ☆17 · Jan 23, 2018 · Updated 8 years ago
- GluonNLP tutorial for Pycon2019 ☆14 · Aug 16, 2019 · Updated 6 years ago
- Induce word representations using random indexing (RI) ☆29 · Jun 17, 2010 · Updated 15 years ago
- A tool for calculating semantic similarity between words from a text corpus based on lexico-syntactic patterns. ☆27 · Feb 13, 2016 · Updated 10 years ago
- A lean, mean, very quickly deployable ExternalQuestion template for Amazon Mechanical Turk. Simplified as a static page. ☆16 · Mar 24, 2017 · Updated 8 years ago
- This repo contains the code and data used in the paper "Wizard of Search Engine: Access to Information Through Conversations with Search … ☆21 · Apr 30, 2021 · Updated 4 years ago
- Code for "Goodtriever: Toxicity Mitigation with Retrieval-augmented Language Models" ☆25 · May 30, 2024 · Updated last year
- Event Detection With CLustering of Wavelet-based Signals (EDCoW) - Based on the paper 'Event Detection in Twitter' by Jianshu Weng, Bu-S… ☆16 · Jun 24, 2014 · Updated 11 years ago
- Toolkit to compile a comparable/parallel corpus from European Parliament proceedings ☆16 · Jan 26, 2020 · Updated 6 years ago
- Official Pytorch implementation of (Roles and Utilization of Attention Heads in Transformer-based Neural Language Models), ACL 2020 ☆16 · Mar 21, 2025 · Updated 11 months ago
- ML implementations for practical use ☆15 · Apr 30, 2020 · Updated 5 years ago
- Supporting example for "A Rust SentencePiece implementation" ☆20 · Jun 7, 2020 · Updated 5 years ago
- A tool for evaluation of semantic similarity measures. ☆22 · Feb 3, 2013 · Updated 13 years ago
- Large scale sentential paraphrases collection and annotation ☆46 · Dec 31, 2022 · Updated 3 years ago
- ☆20 · Jun 26, 2017 · Updated 8 years ago
- Sparse Beta-Divergence Tensor Factorization Library ☆48 · Jun 2, 2025 · Updated 9 months ago
- ☆48 · Jun 8, 2020 · Updated 5 years ago
- Sub-Character Representation Learning ☆25 · May 28, 2018 · Updated 7 years ago
- More than Just Words: Modeling Non-textual Characteristics of Podcasts ☆26 · Nov 6, 2019 · Updated 6 years ago
- LM Pretraining with PyTorch/TPU ☆137 · Oct 24, 2019 · Updated 6 years ago
- Fairness, Ethics, Explainability in AI and ML ☆22 · Apr 18, 2020 · Updated 5 years ago
- A lightweight but powerful library to build token indices for NLP tasks, compatible with major Deep Learning frameworks like PyTorch and … ☆51 · Dec 6, 2024 · Updated last year
- ☆54 · Dec 11, 2021 · Updated 4 years ago
- Train transformer-based models. ☆28 · Jan 23, 2026 · Updated last month
- This repository contains the Arabic sarcasm dataset (ArSarcasm) ☆26 · Feb 18, 2021 · Updated 5 years ago
- Repository for the paper "Optimal Subarchitecture Extraction for BERT" ☆470 · Jun 22, 2022 · Updated 3 years ago