Scripts for building a geo-located web corpus using Common Crawl data
☆11Jan 18, 2026Updated 2 months ago
Alternatives and similar repositories for common_crawl_corpus
Users that are interested in common_crawl_corpus are comparing it to the libraries listed below.
- Crawling engine that crawls a set of top-level domains looking for documents in a list of languages☆11Feb 6, 2024Updated 2 years ago
- a light-weight database application designed to standardize and simplify data entry of archaeological or historical artifacts.☆13Mar 1, 2026Updated 3 weeks ago
- This implements a technique for curve fitting by fractal interpolation found in a paper by Manousopoulos, Drakopoulos, and Theoharis, fou…☆17Feb 27, 2014Updated 12 years ago
- A Python library to calculate avoided costs☆17Aug 19, 2025Updated 7 months ago
- Here are all of the PowerPoint presentations that I have ever created and presented.☆12Dec 28, 2020Updated 5 years ago
- Setup of TensorFlow and PyTorch on Ubuntu 18.04 -- the easy way!☆31May 14, 2020Updated 5 years ago
- A Python package for learning, evaluating, annotating, and extracting vector representations of construction grammars☆43Oct 17, 2024Updated last year
- It is a basic Rule Base ChatBot purely made in Python(Flask) and run on server hosted on PythonAnywhere and works with the help of Twilio…☆18Aug 13, 2024Updated last year
- Use Python to Automate the PowerPoint Update☆15May 28, 2023Updated 2 years ago
- My presentation at RStudio::conf(2019), Austin, Tx - "The lazy and easily distracted report writer"☆17Feb 5, 2019Updated 7 years ago
- subdomain list based on Common Crawl data, sorted by popularity☆17Nov 19, 2019Updated 6 years ago
- Convert powerpoint (pptx) files into raw text org or LaTeX files☆15Aug 28, 2018Updated 7 years ago
- TensorFlow training at RStudio::conf(2019)☆25Jan 22, 2019Updated 7 years ago
- Multitaper R package available on CRAN☆10Jul 17, 2024Updated last year
- Extract images from PowerPoint files☆17Dec 1, 2011Updated 14 years ago
- Automated generation of powerpoint slides for fun and profit☆13Oct 18, 2017Updated 8 years ago
- Python script to streamline the process of posting external HITs to Amazon's Mechanical Turk crowdsourcing website.☆11Oct 20, 2020Updated 5 years ago
- SNA project☆26Nov 11, 2012Updated 13 years ago
- Digital Research Toolkit for Linguists course materials☆12Jul 23, 2025Updated 8 months ago
- Tools for compiling corpora from Common Crawl☆14Nov 24, 2024Updated last year
- ☆15Dec 23, 2024Updated last year
- Tutorials for Tidy Modeling with R☆15Jan 29, 2024Updated 2 years ago
- Automation of the creation of a progress bar in powerpoint, and an overview of the sections on each slide☆14Nov 14, 2017Updated 8 years ago
- Save yourself from 'Death by PowerPoint'☆15Feb 18, 2020Updated 6 years ago
- A mansplaining tool for bourne-like shells☆11Feb 2, 2020Updated 6 years ago
- Crawler based on a modified browser to detect online tracking.☆11Jul 19, 2023Updated 2 years ago
- Go through the list of accepted papers for ICLR in terminal and add them to your reading list.☆13Jan 30, 2021Updated 5 years ago
- Residual Quantization Autoencoder, used for interpreting LLMs☆14Jan 1, 2025Updated last year
- R package: a tile-based roguelike toy for R's console, featuring procedural dungeons and enemy pathfinding☆18Jan 3, 2023Updated 3 years ago
- fastACI toolbox: the MATLAB toolbox for investigating auditory perception using reverse correlation.☆15Dec 29, 2025Updated 2 months ago
- Utilities to gather software metrics from tools (SONAR, etc) and store them into ElasticSearch for later display using Kibana.☆11Dec 31, 2017Updated 8 years ago
- Exports plaintext speaker notes from Microsoft Powerpoint .pptx files☆20Feb 28, 2018Updated 8 years ago
- Miscellaneous Ggplot2 Extensions☆23Oct 3, 2018Updated 7 years ago
- KnowMAN: Weakly Supervised Multinomial Adversarial Networks☆12Nov 9, 2021Updated 4 years ago
- Python script to remove notes from PPTX Powerpoint files☆17Nov 18, 2022Updated 3 years ago
- Mason-Alberta Phonetic Segmenter☆15Feb 24, 2026Updated last month
- Code repository accompanying the CHI 2021 Paper titled "Adapting User Interfaces with Model-based Reinforcement Learning"☆16Oct 18, 2021Updated 4 years ago
- 🔍 Multilingual Evaluation of English-Centric LLMs via Cross-Lingual Alignment☆11Apr 6, 2025Updated 11 months ago
- A smart distributed crawler that infers navigation models of structured websites, used to cluster pages based on their structure and extr…☆10Aug 17, 2025Updated 7 months ago