daveshap / PlainTextWikipedia
Convert Wikipedia database dumps into plaintext files
☆322 Updated 4 years ago
Alternatives and similar repositories for PlainTextWikipedia
Users that are interested in PlainTextWikipedia are comparing it to the libraries listed below
- Nearly a thousand bash and python scripts I've written over the years. ☆123 Updated 7 months ago
- The subreddit archiver ☆179 Updated last year
- GPT Takes the Bar Exam ☆142 Updated 2 years ago
- A python utility for downloading Common Crawl data ☆243 Updated 2 years ago
- Play detective on Reddit: Discover political disinformation campaigns, secret influencers and more ☆221 Updated last year
- Conversational text Analysis using various NLP techniques ☆180 Updated 2 years ago
- A Flask webapp & Python scripts for predicting reddit users' political leaning, using their comment history. ☆64 Updated 2 years ago
- An on-going dataset consisting of hashtags, n-gram counts and other misc NLP things for covid-19 analysis, stemming from over 100 000 000… ☆57 Updated 3 years ago
- Chat interface to gpt-j. Runs in Google Colab. ☆58 Updated 2 years ago
- A GPT-J API to use with python3 to generate text, blogs, code, and more ☆203 Updated 2 years ago
- Streaming WARC/ARC library for fast web archive IO ☆428 Updated 8 months ago
- Example scripts for the pushshift dump files ☆388 Updated 3 weeks ago
- Semantic search through a vectorized Wikipedia (SentenceBERT) with the Weaviate vector search engine ☆242 Updated 2 years ago
- A simple Python wrapper for the archive.is capturing service ☆203 Updated 6 months ago
- ☆81 Updated 6 years ago
- A tool to automatically turn any Wikipedia article into a video ☆57 Updated 3 years ago
- Dump of generated texts from GPT-2 trained on Hacker News titles ☆118 Updated 6 years ago
- Cleaning tool for web scraped text ☆38 Updated 2 years ago
- Tag news stories based on models trained on the NYT corpus. ☆42 Updated 2 years ago
- A library to extract a publication date from a web page, along with a measure of the accuracy. ☆41 Updated 6 years ago
- A multithread Pushshift.io API Wrapper for reddit.com comment and submission searches. ☆221 Updated 2 years ago
- 📊 Semantic search for headlines and story text ☆360 Updated last year
- A Python scraper for Goodreads books and reviews. ☆296 Updated 6 months ago
- This AI Does Not Exist: generate realistic descriptions of made-up machine learning models. ☆147 Updated 3 years ago
- GPT-3 Explorer ☆208 Updated 5 years ago
- Offline Internet Archive project ☆289 Updated last year
- archive reddit data as offline friendly web pages ☆176 Updated 5 years ago
- Reddit takeout: export your account data as JSON: comments, submissions, upvotes etc. 🦖 ☆173 Updated last month
- Labelling platform for text using weak supervision. ☆264 Updated 3 years ago
- ArchiveBot, an IRC bot for archiving websites ☆395 Updated 3 weeks ago