jackbandy/bookcorpus-datasheet

Documentation effort for the BookCorpus dataset

☆34

Alternatives and similar repositories for bookcorpus-datasheet

Users that are interested in bookcorpus-datasheet are comparing it to the libraries listed below.

bicici / FDA
Feature Decay Algorithms
☆11 · Mar 5, 2014 · Updated 12 years ago
sunyt32 / torchscale
Transformers at any scale
☆42 · Jan 18, 2024 · Updated 2 years ago
Unbabel / BConTrasT
☆20 · Aug 17, 2021 · Updated 4 years ago
HHW-zhou / TSMMG
Code of "Instruction Multi-Constraint Molecular Generation Using a Teacher-Student Large Language Model"
☆13 · Jul 8, 2025 · Updated last year
sanyalsunny111 / Early_Weight_Avg
[COLM 2024] Early Weight Averaging meets High Learning Rates for LLM Pre-training
☆19 · Oct 12, 2024 · Updated last year
microsoft / RTP-LX
Repository for the paper "RTP-LX: Can LLMs Evaluate Toxicity in Multilingual Scenarios?"
☆29 · May 1, 2025 · Updated last year
nikikilbertus / fairensics
A python library to discover and mitigate biases in machine learning models and datasets
☆20 · Jul 6, 2023 · Updated 3 years ago
tangqianfeng / BlueToothProject
Car - Android app - IoT - Bluetooth
☆11 · Nov 29, 2017 · Updated 8 years ago
half-potato / DCNv2
Deformable Convolutional Networks v2 with Pytorch
☆10 · Jul 29, 2020 · Updated 6 years ago
longyuewangdcu / Cross-Sentence-NMT
Cross Sentence Neural Machine Translation
☆10 · Mar 26, 2018 · Updated 8 years ago
atcbosselut / scs-baselines
Baseline models for the paper: "Modeling Naive Psychology of Characters in Simple Commonsense Stories" by Hannah Rashkin, Antoine Bosselu…
☆16 · Feb 23, 2021 · Updated 5 years ago
shenao-zhang / reward-augmented-preference
The official implementation of Preference Data Reward-Augmentation.
☆18 · May 1, 2025 · Updated last year
violet-zct / fairseq-dro-mnmt
☆14 · Sep 10, 2021 · Updated 4 years ago
McGill-NLP / latent-translation
Code for the paper "Modelling Latent Translations for Cross-Lingual Transfer"
☆17 · Nov 22, 2021 · Updated 4 years ago
kaushal0494 / ZmBART
☆11 · Mar 19, 2023 · Updated 3 years ago
HITsz-TMG / VisionGraph
The benchmark and datasets of the ICML 2024 paper "VisionGraph: Leveraging Large Multimodal Models for Graph Theory Problems in Visual C…
☆17 · May 27, 2024 · Updated 2 years ago
yudiandoris / csi
End-to-End Chinese Speaker Identification
☆11 · Nov 17, 2022 · Updated 3 years ago
adijo / gpt3-alchemy
GPT-3 attempts to predict & balance chemical reactions
☆13 · Aug 2, 2020 · Updated 5 years ago
qhungngo / EVBCorpus
The English-Vietnamese Bilingual Corpus (EVBCorpus) is a collection of English and Vietnamese parallel translations and bitexts.
☆52 · Jul 12, 2019 · Updated 7 years ago
langtech-bsc / mt-evaluation
A framework for evaluating Machine Translation models.
☆13 · Apr 21, 2026 · Updated 3 months ago
pangjh3 / AnLLM
☆20 · Jun 17, 2024 · Updated 2 years ago
afiaka87 / dalle-pytorch-datasets
☆12 · Jun 14, 2021 · Updated 5 years ago
leolle / atec_nlp
Ant Financial natural language processing competition.
☆10 · Sep 3, 2018 · Updated 7 years ago
Whispard / Quick-Ref
Chrome Extension to get references from your local files
☆10 · Apr 15, 2023 · Updated 3 years ago
AI4Bharat / webcorpus
Generate large textual corpora for almost any language by crawling the web
☆13 · Feb 17, 2024 · Updated 2 years ago
ArthurConmy / MishformerLens
MishformerLens intends to be a drop-in replacement for TransformerLens that AST patches HuggingFace Transformers rather than implementing…
☆10 · Oct 7, 2024 · Updated last year
llyx97 / Rosita
[AAAI 2021] "ROSITA: Refined BERT cOmpreSsion with InTegrAted techniques", Yuanxin Liu, Zheng Lin, Fengcheng Yuan
☆14 · Oct 18, 2022 · Updated 3 years ago
yeshaokai / Calibrator-Domain-Adaptation
Release code for light-weight calibrator: a separable component for unsupervised domain adaptation
☆13 · Jul 17, 2021 · Updated 5 years ago
kpu / MEMT
System Combination
☆16 · Aug 28, 2015 · Updated 10 years ago
simple-stories / simple_stories_train
Trains small LMs. Designed for training on SimpleStories
☆14 · Sep 15, 2025 · Updated 10 months ago
johnmyleswhite / StatsFunctionsNotes
Jupyter notebooks showing how to implement statistical functions.
☆14 · Jun 14, 2020 · Updated 6 years ago
SunbowLiu / PTvsBT
On the Complementarity between Pre-Training and Back-Translation for Neural Machine Translation (Findings of EMNLP 2021)
☆13 · Nov 21, 2021 · Updated 4 years ago
Emotional-Text-to-Speech / pytorch-dc-tts
Text to Speech with PyTorch (English and Mongolian)
☆13 · May 3, 2020 · Updated 6 years ago
Noahs-ARK / rational-recurrences
Implementation for "Rational Recurrences", Peng et al., EMNLP 2018.
☆28 · Jun 21, 2022 · Updated 4 years ago
N1ghtF1re / Subprograms-table-generator
Will help you with writing a report!
☆10 · Mar 10, 2018 · Updated 8 years ago
allenxz / facial-expression-recognition
Pattern recognition final project: Keras-based facial expression recognition
☆11 · Jun 25, 2019 · Updated 7 years ago
masakhane-io / masakhane-reading-group
Agile reading group that works
☆13 · Feb 2, 2022 · Updated 4 years ago
facebookresearch / evaluation-of-nmt-bt
This repository contains additional reference translations for the WMT'14 En-De (newstest2014) and WMT'19 En-Ru (newstest2019) test sets …
☆15 · Aug 31, 2021 · Updated 4 years ago
MaxyLee / 3AM
Official code and data of "3AM: An Ambiguity-Aware Multi-Modal Machine Translation Dataset"
☆12 · Dec 8, 2024 · Updated last year