Pipeline to generate the Standardized Project Gutenberg Corpus
☆211Jan 5, 2024Updated 2 years ago
Alternatives and similar repositories for gutenberg
Users who are interested in gutenberg are comparing it to the repositories listed below
- Building a Large Language Model (From Scratch) to understand and create your own GPT-like large language models (LLMs) from the ground up…☆13May 8, 2025Updated 10 months ago
- Statistical methods for estimating scaling laws in urban data☆11Dec 9, 2024Updated last year
- AstorAI is a user-friendly medical chatbot powered by Retrieval-Augmented Generation (RAG) and the advanced Llama 3 model. It offers real…☆22Nov 9, 2024Updated last year
- A Python package for cleaning Gutenberg books and datasets☆34May 2, 2025Updated 10 months ago
- A simple interface to the Project Gutenberg corpus.☆331Jan 12, 2023Updated 3 years ago
- A comprehensive survey of business use cases of AI that help businesses thrive in the digital economy☆13Oct 7, 2020Updated 5 years ago
- ☆14Apr 1, 2025Updated 11 months ago
- I wanted all of plaintext Project Gutenberg in an easy-to-use format, so I made this☆228Apr 27, 2023Updated 2 years ago
- Syllabus for EDCT GE 2550☆16Oct 3, 2019Updated 6 years ago
- This repository was created as part of Sebastian Raschka's workshop, Building LLMs Ground Up.☆16Sep 14, 2024Updated last year
- From Science to Science Fiction☆15Sep 25, 2015Updated 10 years ago
- Code and data for the TACL paper "It’s not Rocket Science: Interpreting Figurative Language in Narratives"☆15Sep 4, 2023Updated 2 years ago
- ☆31Mar 14, 2017Updated 8 years ago
- Create a source of truth for ML model results and browse it on Papers with Code☆34Jun 9, 2021Updated 4 years ago
- Repository for code and metadata to support work described in "Authorless Topic Models: Biasing Models Away from Known Structure"☆29May 13, 2020Updated 5 years ago
- ☆17Aug 28, 2025Updated 6 months ago
- ☆19Jun 5, 2023Updated 2 years ago
- Course information for COP-3402 Systems Software Spring 2019 at UCF☆37Apr 23, 2019Updated 6 years ago
- Metadata from Project Gutenberg☆41Jan 5, 2026Updated 2 months ago
- Datasets and functions for the Handbook of Educational Measurement and Psychometrics using R.☆24Apr 2, 2021Updated 4 years ago
- An experimental desktop client for using Claude Desktop's MCP with Novelcrafter codices.☆10Dec 3, 2024Updated last year
- A corpus of poetry from Project Gutenberg☆213Aug 13, 2018Updated 7 years ago
- The GitBook documentation site for OpenAlex☆27Jan 18, 2026Updated last month
- An instruction tuned large language model with extra support for poetry and verse generation☆25Jun 5, 2023Updated 2 years ago
- 🔗 A curated list of awesome URL shorteners☆22Jan 22, 2024Updated 2 years ago
- Code and data supporting "NovelTM Data Sets for English-Language Fiction."☆26Dec 22, 2020Updated 5 years ago
- A benchmark corpus of 100 English novels, covering the 19th and the beginning of the 20th century☆24Aug 10, 2022Updated 3 years ago
- ☆23Apr 25, 2023Updated 2 years ago
- SORTED: A curated collection of interesting ideas, tools, and resources in neuroscience, data management, and data science, all in the sp…☆27Aug 10, 2025Updated 6 months ago
- Computational Methods in Psychology and Neuroscience☆33Dec 8, 2023Updated 2 years ago
- Code for our ACL'23 paper on how to identify metaphor mappings with the help of GPT-3☆11May 21, 2025Updated 9 months ago
- Documentation for Bookworm, particularly focusing on creation aspects☆10Aug 26, 2016Updated 9 years ago
- An Easy Annotation Tool for Natural Language Processing☆11May 17, 2024Updated last year
- Suite of generic Linked Data/SPARQL as well as LinkedDataHub-specific MCP tools☆38Feb 23, 2026Updated 2 weeks ago
- Archive of CQL's predecessor, by Patrick Schultz, David Spivak, and Ryan Wisnesky☆13Aug 22, 2019Updated 6 years ago
- Annotations and code for the EMNLP 2018 paper 'Weeding out Conventionalized Metaphors: A Corpus of Novel Metaphor Annotations'☆10Feb 20, 2023Updated 3 years ago
- A Python Twitter bot posting recently active questions from Stack Overflow. Tweaked to run on AWS Lambda.☆10Jan 14, 2020Updated 6 years ago
- Course Materials for Bayesian Psychometric Modeling☆15May 14, 2019Updated 6 years ago
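- This project organizes the open-source MOSS SFT data and converts it into the mnbvc multi-turn dialogue format. MOSS-003 covers three aspects (helpfulness, faithfulness, and harmlessness) with 3.53 million samples; MOSS-003 includes finer-grained helpfulness category labels, broader harmlessness data, and more dialogue turns, with 6.30 million samples.☆12Dec 3, 2023Updated 2 years ago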