daveshap / PlainTextWikipedia
Convert Wikipedia database dumps into plaintext files
☆302 · Updated 3 years ago
Related projects:
- Code for the paper: "Large Language Models as Corporate Lobbyists" (2023). · ☆168 · Updated last year
- Retrieval augmented generation (RAG) and language model powered search applications · ☆266 · Updated 8 months ago
- Download subreddit comments · ☆90 · Updated 2 years ago
- Nearly a thousand bash and python scripts I've written over the years. · ☆118 · Updated 2 months ago
- Semantic search through a vectorized Wikipedia (SentenceBERT) with the Weaviate vector search engine · ☆241 · Updated last year
- GPT Takes the Bar Exam · ☆140 · Updated last year
- Example scripts for the pushshift dump files · ☆275 · Updated this week
- Generate question/answer training pairs out of raw text. · ☆196 · Updated 9 months ago
- A python utility for downloading Common Crawl data · ☆220 · Updated last year
- A command-line interface to generate textual and conversational datasets with LLMs. · ☆291 · Updated last year
- Multi-angle c(q)uestion answering · ☆458 · Updated 2 years ago
- A lightweight command-line interface to OpenAI's GPT-3. Temperature, presence, and frequency up to 2. Streaming support · ☆55 · Updated last year
- Statistics of Common Crawl monthly archives mined from URL index files · ☆140 · Updated 2 weeks ago
- Code repo for "Most Language Models can be Poets too: An AI Writing Assistant and Constrained Text Generation Studio" at the (CAI2) wor… · ☆206 · Updated last year
- Python code for building a GPT-3 based technical blog post optimizer. · ☆83 · Updated 2 years ago
- ETL processes for medical and scientific papers · ☆342 · Updated 9 months ago
- Conversational text Analysis using various NLP techniques · ☆177 · Updated last year
- Streaming WARC/ARC library for fast web archive IO · ☆369 · Updated 3 weeks ago
- experiment to generate novel-length fiction from a single story premise · ☆29 · Updated 2 years ago
- ☆245 · Updated last year
- Offline Internet Archive project · ☆263 · Updated 7 months ago
- Semantic search for headlines and story text · ☆355 · Updated 11 months ago
- This shows the results from using a second, filter LLM that analyses prompts before sending them to GPT-Chat · ☆105 · Updated last year
- Implement recursion using English as the programming language and an LLM as the runtime. · ☆125 · Updated last year
- Downloader for submissions to reddit.com. Supports both subreddits and users. · ☆45 · Updated 4 years ago
- Concise answers to search queries using Google and GPT-3. Includes citations. · ☆74 · Updated last year
- A Python library for calculating a large variety of metrics from text · ☆309 · Updated this week
- Cleaning tool for web scraped text · ☆39 · Updated last year
- Large language model evaluation and workflow framework from Phase AI. · ☆448 · Updated 2 months ago
- Reddit script to archive user's saved Reddit posts and comments · ☆32 · Updated 4 years ago