nelson-liu / flatten_gigaword
Dump the text of the Gigaword dataset into a single file, for use with language modeling (and other!) toolkits
☆23Sep 23, 2017Updated 8 years ago
Alternatives and similar repositories for flatten_gigaword
Users that are interested in flatten_gigaword are comparing it to the libraries listed below
- Classification Benchmarks for Under-resourced Bengali Language based on Multichannel Convolutional-LSTM Network☆20Jul 26, 2021Updated 4 years ago
- Zero-shot Transfer Learning from English to Arabic☆30Jun 22, 2022Updated 3 years ago
- jiant-dev☆28Dec 17, 2020Updated 5 years ago
- ☆10Feb 2, 2026Updated 2 weeks ago
- Bengali NLP☆32Mar 6, 2019Updated 6 years ago
- ☆11Apr 24, 2024Updated last year
- The repository for the paper "Predicting in-hospital mortality by combining clinical notes with time-series data"☆12May 23, 2021Updated 4 years ago
- ☆12Feb 22, 2021Updated 4 years ago
- Guide for the slp group on how to use the Grnet cluster☆11Apr 16, 2020Updated 5 years ago
- 📄🕸️ Generalizing Cross-Document Event Coreference Resolution Across Multiple Corpora☆10May 25, 2022Updated 3 years ago
- A web app for sharing, editing, and commenting on kifus (game records for the board game Go)☆10Jan 22, 2019Updated 7 years ago
- BanglaWriting: A multi-purpose offline Bangla handwriting dataset☆12Nov 18, 2020Updated 5 years ago
- ☆13Jul 8, 2020Updated 5 years ago
- Applied Data Science training course (for updates and resources, read the ReadMe file below)☆15Sep 9, 2023Updated 2 years ago
- This repo contains the code and results for reproducing the results in the paper: A SIMPLE BUT TOUGH-TO-BEAT BASELINE FOR SENTENCE EMBEDD…☆12Jul 13, 2018Updated 7 years ago
- ☆12Oct 1, 2025Updated 4 months ago
- ☆10Oct 2, 2017Updated 8 years ago
- A library of speech gadgets.☆14Oct 15, 2022Updated 3 years ago
- Word-level language identification for Bangla-English code-mixed social media data, using a BiLSTM with subword embeddings.☆10Aug 13, 2023Updated 2 years ago
- Yangon Township GeoJSON Data☆11Jun 10, 2015Updated 10 years ago
- Using YouTube to prepare a speech recognition dataset for any language☆10Mar 30, 2021Updated 4 years ago
- Complete set of English dialect transformation rules and evaluation code☆16Jun 7, 2024Updated last year
- ☆11Jul 12, 2021Updated 4 years ago
- This repo contains the baseline model recipes and pre-trained model for the GramVanni Hindi ASR challenge☆15Mar 26, 2022Updated 3 years ago
- Example code for my SunshinePHP Guzzle Tutorial☆10Feb 5, 2015Updated 11 years ago
- ☆11May 16, 2016Updated 9 years ago
- Java Bindings for the C++ library DeepSpeech☆10Jun 4, 2020Updated 5 years ago
- Tools for robustness evaluation in interpretability methods☆11Jun 25, 2021Updated 4 years ago
- Read audio with FFmpeg into NumPy/PyTorch via ctypes (standard library module)☆11Aug 12, 2020Updated 5 years ago
- Rabbit in Python☆11Mar 20, 2018Updated 7 years ago
- Unicode Blocks of a Ruby String☆19Sep 9, 2025Updated 5 months ago
- LockManager with deadlock detection for implementing 2PL☆13Mar 13, 2019Updated 6 years ago
- Voice activity detection (VAD) library and Go bindings based on WebRTC's VAD engine☆11Mar 1, 2018Updated 7 years ago
- Open Source Crimean Tatar Text-to-Speech datasets☆14Feb 23, 2025Updated 11 months ago
- Benchmarking gene embeddings on single, paired, and gene set tasks☆18Nov 29, 2025Updated 2 months ago
- A model implementation of sessions for koa using postgres as the backend☆10Oct 16, 2017Updated 8 years ago
- Customized Claude Code system prompts for use with tweakcc — ~48k bytes smaller, 30% faster, same accuracy☆33Nov 23, 2025Updated 2 months ago
- 👄🇧🇷 Forced phonetic alignment for Brazilian Portuguese☆12Jul 18, 2025Updated 6 months ago
- Code necessary to reproduce experiments in "FloraBERT: cross-species transfer learning with attention-based neural networks for gene expr…☆13Jul 6, 2022Updated 3 years ago