daveshap / PlainTextWikipedia
Convert Wikipedia database dumps into plaintext files
☆322 Updated 4 years ago
Alternatives and similar repositories for PlainTextWikipedia
Users who are interested in PlainTextWikipedia are comparing it to the libraries listed below.
- Download subreddit comments ☆95 Updated 3 years ago
- Play detective on Reddit: Discover political disinformation campaigns, secret influencers and more ☆220 Updated last year
- Nearly a thousand bash and python scripts I've written over the years. ☆123 Updated 5 months ago
- Conversational text Analysis using various NLP techniques ☆180 Updated 2 years ago
- 📊 Semantic search for headlines and story text ☆360 Updated last year
- Semantic search through a vectorized Wikipedia (SentenceBERT) with the Weaviate vector search engine ☆242 Updated 2 years ago
- Example scripts for the pushshift dump files ☆376 Updated last week
- Python code for building a GPT-3 based technical blog post optimizer. ☆85 Updated 2 years ago
- A tool to automatically turn any Wikipedia article into a video ☆56 Updated 2 years ago
- A Python scraper for Goodreads books and reviews. ☆295 Updated 4 months ago
- 🧠 AI memory assistant – remember everything you read ☆301 Updated 2 years ago
- Contains scripts and data to render map of reddit ☆116 Updated 2 months ago
- A multithread Pushshift.io API Wrapper for reddit.com comment and submission searches. ☆221 Updated 2 years ago
- Fully generated fake resumes using machine learning models trained off ~6000 JSON resumes. ☆218 Updated 4 years ago
- The world's largest social media toxicity dataset. ☆181 Updated 3 years ago
- GPT-3 Explorer ☆208 Updated 4 years ago
- A python utility for downloading Common Crawl data ☆242 Updated 2 years ago
- Python script to download public Tweets from a given Twitter account into a format suitable for AI text generation. ☆224 Updated 5 years ago
- ☆81 Updated 6 years ago
- Offline Internet Archive project ☆289 Updated last year
- Tag news stories based on models trained on the NYT corpus. ☆42 Updated 2 years ago
- Streaming WARC/ARC library for fast web archive IO ☆422 Updated 7 months ago
- Unreliable News Index (for Columbia Journalism Review) ☆56 Updated 3 years ago
- Espial is an engine for automated organization and discovery of personal knowledge ☆175 Updated 3 years ago
- The Python script for downloading new mp3 from RSS given channels ☆128 Updated 4 months ago
- Releases for the reddit-graph project ☆18 Updated last year
- Chat interface to gpt-j. Runs in Google Colab. ☆58 Updated last year
- GPT Takes the Bar Exam ☆142 Updated 2 years ago
- ☆44 Updated 4 years ago
- A Flask webapp & Python scripts for predicting reddit users' political leaning, using their comment history. ☆64 Updated last year