A collection of small corpuses of interesting data for the creation of bots and similar stuff.
☆5,081 · Jan 19, 2026 · Updated last month
Alternatives and similar repositories for corpora
Users who are interested in corpora compare it to the repositories listed below.
- Tracery: a story-grammar generation library for javascript ☆2,182 · Nov 3, 2024 · Updated last year
- A simple Python interface for Darius Kazemi's Corpora Project. ☆122 · Feb 7, 2020 · Updated 6 years ago
- Python port of Kate Compton's Tracery text expansion library. ☆259 · Mar 8, 2024 · Updated last year
- National Novel Generation Month, 2015 edition. ☆341 · Sep 30, 2023 · Updated 2 years ago
- National Novel Generation Month, 2016 edition. ☆162 · Sep 30, 2023 · Updated 2 years ago
- A small module meant for use in text generators that lets you filter strings for bad words. ☆226 · Jun 26, 2023 · Updated 2 years ago
- Notebooks and other materials for Reading and Writing Electronic Text ☆206 · Jun 13, 2023 · Updated 2 years ago
- RiTa: the generative language toolkit ☆353 · Dec 14, 2022 · Updated 3 years ago
- I have this big list of links to text stuff that I like, so I thought I'd make it into a repository. ☆72 · Feb 28, 2018 · Updated 8 years ago
- modest natural-language processing ☆12,040 · Feb 23, 2026 · Updated last week
- National Novel Generation Month, 2017 edition. ☆185 · Sep 30, 2023 · Updated 2 years ago
- I wanted all of plaintext Project Gutenberg in an easy-to-use format, so I made this ☆227 · Apr 27, 2023 · Updated 2 years ago
- National Novel Generation Month, 2018 edition. ☆112 · Sep 30, 2023 · Updated 2 years ago
- RiTa: the generative language toolkit (in JS) ☆268 · Dec 2, 2022 · Updated 3 years ago
- Creative Coding: Generative Art, Data visualization, Interaction Design, Resources. ☆14,511 · Jun 16, 2025 · Updated 8 months ago
- The Big List of Naughty Strings is a list of strings which have a high probability of causing issues when used as user-input data. ☆47,608 · Apr 18, 2024 · Updated last year
- ☆81 · Dec 29, 2021 · Updated 4 years ago
- National Novel Generation Month, 2014 edition. ☆257 · Sep 30, 2023 · Updated 2 years ago
- A bare-bones simulation-driven narrative framework ☆86 · Dec 2, 2018 · Updated 7 years ago
- Bitmap & tilemap generation from a single example with the help of ideas from quantum mechanics ☆24,705 · Nov 25, 2025 · Updated 3 months ago
- A dataset containing story plots from Wikipedia (books, movies, etc.) and the code for the extractor. ☆320 · Sep 26, 2017 · Updated 8 years ago
- Tracery: a story-grammar generation library for javascript ☆129 · Nov 18, 2024 · Updated last year
- Experiments conducted for NaNoGenMo 2014 ☆26 · Mar 4, 2024 · Updated last year
- A topic-centric list of HQ open datasets. ☆73,042 · Feb 17, 2026 · Updated last week
- An open source multi-tool for exploring and publishing data ☆10,779 · Updated this week
- general natural language facilities for node ☆10,871 · Feb 22, 2026 · Updated last week
- Material for a class at SFPC ☆74 · Apr 3, 2016 · Updated 9 years ago
- National Novel Generation Month, 2019 edition. ☆97 · Sep 30, 2023 · Updated 2 years ago
- A simple interface for the CMU pronouncing dictionary ☆319 · Aug 13, 2024 · Updated last year
- Easily train your own text-generating neural network of any size and complexity on any text dataset with a few lines of code. ☆4,932 · Jul 17, 2022 · Updated 3 years ago
- Open source, experimental, and tiny tools roundup ☆1,764 · Aug 13, 2024 · Updated last year
- A Web Audio framework for making interactive music in the browser. ☆14,735 · Feb 23, 2026 · Updated last week
- a cheat-sheet for mathematical notation in code form ☆15,464 · Mar 8, 2022 · Updated 3 years ago
- A linter for prose. ☆4,512 · Updated this week
- An informal syllabus for 'The Fundamentals of Computing' workshop at ITP, 2016 ☆16 · Aug 16, 2022 · Updated 3 years ago
- ☆3,287 · Oct 14, 2018 · Updated 7 years ago
- An overview and exploration of the concept of missing datasets. ☆496 · Jan 25, 2018 · Updated 8 years ago
- National Novel Generation Month, 2021 edition. ☆44 · Sep 30, 2023 · Updated 2 years ago
- Lectures used in my pedagogy ☆306 · Feb 14, 2026 · Updated 2 weeks ago