ContentMine / quickscrape
A scraping command line tool for the modern web
☆260 Updated 8 years ago
Alternatives and similar repositories for quickscrape:
Users who are interested in quickscrape are comparing it to the libraries listed below.
- Journal scraper definitions for the ContentMine framework ☆66 Updated 6 years ago
- Get metadata, fulltexts or fulltext URLs of papers matching a search query ☆197 Updated 4 years ago
- Squidwarc is a high fidelity, user scriptable, archival crawler that uses Chrome or Chromium with or without a head ☆170 Updated 4 years ago
- "Old SFM" -- manage rules and streams from social data sources, starting with twitter. ☆86 Updated last year
- An online annotation platform for teaching and learning in the humanities. ☆107 Updated last month
- A queue-controlled browser automation tool for improving web crawl quality ☆60 Updated 2 weeks ago
- A full-stack publishing solution involving different technologies to power digital archives ☆157 Updated 4 years ago
- Browsertrix: Containerized High-Fidelity Browser-Based Automated Crawling + Behavior System ☆87 Updated 4 years ago
- Social Feed Manager user interface application. ☆155 Updated 9 months ago
- Solrstrap is a Query-Result interface for Solr written in JavaScript, HTML and CSS ☆86 Updated 7 years ago
- A framework for creating web-based knowledge maps ☆199 Updated this week
- Python scripts for interacting with the hypothes.is API ☆48 Updated 7 years ago
- Superfeedr powered pipes! ☆131 Updated 9 years ago
- Computer assisted video/audio transcription ☆97 Updated 4 years ago
- Data conversions and examples for generating reports from twarc collections using tools such as D3.js ☆55 Updated 4 years ago
- A novel way of viewing eLife articles. ☆375 Updated 2 years ago
- Enhanced Social Tagging for Academic Communities ☆95 Updated 5 months ago
- utility to fetch provenance information from Internet Archive's Wayback Machine ☆13 Updated 2 years ago
- 📚 A compilation of research relevant to Data Together's efforts tackling the general problem of data resilience & interactivity ☆95 Updated 6 years ago
- Actor Network Text Analyser ☆56 Updated 10 years ago
- Tools for tracking stories on news homepages ☆48 Updated 5 years ago
- Documentation and project-wide issues for the Website Monitoring project (a.k.a. "Scanner") ☆108 Updated last month
- Creates github index for similar repositories discovery ☆191 Updated 8 years ago
- One-Click User Instigated Preservation ☆126 Updated 6 years ago
- A simple catalog of Twitter ID Datasets ☆28 Updated 4 months ago
- Repository for the DMPTool Project ☆36 Updated 6 years ago
- Launch AWS Elastic MapReduce jobs that process Common Crawl data. ☆49 Updated 8 years ago
- ☆29 Updated 8 years ago
- A harvester for twitter content as part of Social Feed Manager. ☆17 Updated last year
- Data Store for Annotation Studio ☆46 Updated 2 years ago