daveshap / PlainTextWikipedia
Convert Wikipedia database dumps into plaintext files
☆319 Updated 4 years ago
Alternatives and similar repositories for PlainTextWikipedia
Users who are interested in PlainTextWikipedia are comparing it to the repositories listed below.
- Nearly a thousand bash and python scripts I've written over the years. ☆122 Updated 4 months ago
- Dolores is a Python library designed to improve the developer experience when working with pretrained language models. Dolores provides p… ☆34 Updated 4 years ago
- Python code for building a GPT-3 based technical blog post optimizer. ☆84 Updated 2 years ago
- Download subreddit comments ☆94 Updated 3 years ago
- Play detective on Reddit: Discover political disinformation campaigns, secret influencers and more ☆218 Updated last year
- The subreddit archiver ☆177 Updated last year
- A Flask webapp & Python scripts for predicting reddit users' political leaning, using their comment history. ☆64 Updated last year
- A python utility for downloading Common Crawl data ☆240 Updated last year
- Unreliable News Index (for Columbia Journalism Review) ☆55 Updated 3 years ago
- A library to extract a publication date from a web page, along with a measure of the accuracy. ☆41 Updated 5 years ago
- Self-hosted GPT playground ☆113 Updated 9 months ago
- A tool to automatically turn any Wikipedia article into a video ☆56 Updated 2 years ago
- 📊 Semantic search for headlines and story text ☆360 Updated last year
- Index Common Crawl archives in tabular format ☆120 Updated 3 weeks ago
- A simple Python wrapper for the archive.is capturing service ☆202 Updated 3 months ago
- Semantic search through a vectorized Wikipedia (SentenceBERT) with the Weaviate vector search engine ☆242 Updated 2 years ago
- A universal package of scraper scripts for humans ☆310 Updated 3 years ago
- Python script to download public Tweets from a given Twitter account into a format suitable for AI text generation. ☆224 Updated 5 years ago
- Pre-test Hacker News Show HN post subtitles with machine learning algorithm ☆20 Updated 4 years ago
- Text generator prompting with Boolean operators ☆180 Updated 2 years ago
- A Reddit front-page reader in the style of The New York Times. ☆249 Updated 3 years ago
- experiment to generate novel-length fiction from a single story premise ☆28 Updated 3 years ago
- An on-going dataset consisting of hashtags, n-gram counts and other misc NLP things for covid-19 analysis, stemming from over 100 000 000… ☆57 Updated 3 years ago
- ☆44 Updated 4 years ago
- This AI Does Not Exist: generate realistic descriptions of made-up machine learning models. ☆147 Updated 3 years ago
- A Reddit bot that generates new context-aware comments using Markov chains trained from a set of given users or subreddits comments histo… ☆73 Updated 3 years ago
- Fine tune GPT-2 with your favourite authors ☆71 Updated last year
- Code for the paper "Language Models are Unsupervised Multitask Learners" ☆108 Updated 3 years ago
- A set of tools for automatically managing bitrot and format in large quantities of media ☆92 Updated 3 years ago
- Find legal citations in any block of text ☆153 Updated last week