ContentMine / quickscrape
A scraping command line tool for the modern web
☆260 Updated 8 years ago
Alternatives and similar repositories for quickscrape
Users who are interested in quickscrape are comparing it to the libraries listed below.
- Journal scraper definitions for the ContentMine framework ☆66 Updated 6 years ago
- Get metadata, fulltexts or fulltext URLs of papers matching a search query ☆198 Updated 4 years ago
- Headless scraperJSON scraping for Node.js ☆27 Updated 8 years ago
- Facilitating the global conversation on academic literature ☆266 Updated 7 years ago
- A full-stack publishing solution involving different technologies to power digital archives ☆158 Updated 4 years ago
- "Old SFM" -- manage rules and streams from social data sources, starting with twitter. ☆86 Updated last year
- The scraperJSON standard for defining web scrapers as JSON objects ☆33 Updated 10 years ago
- An online annotation platform for teaching and learning in the humanities. ☆108 Updated 3 months ago
- One-Click User Instigated Preservation ☆126 Updated 6 years ago
- A JavaScript library to visualize and navigate graphs ☆200 Updated 7 years ago
- A novel way of viewing eLife articles. ☆377 Updated 3 years ago
- varied english texts for modern NLP testing ☆75 Updated 2 years ago
- See https://github.com/tworavens/tworavens for current repository for this project and http://2ra.vn for project pages. ☆30 Updated 6 years ago
- View, visualize, clean and process data in the browser. ☆148 Updated 6 years ago
- Data Store for Annotation Studio ☆46 Updated 2 years ago
- command-line tool to extract taxonomies from Wikidata ☆125 Updated 5 years ago
- Python based Open Source ETL tools for file crawling, document processing (text extraction, OCR), content analysis (Entity Extraction & N… ☆268 Updated 2 years ago
- A queue-controlled browser automation tool for improving web crawl quality ☆61 Updated 2 months ago
- Schema.org in RDF ☆188 Updated 2 years ago
- Python scripts for interacting with the hypothes.is API ☆48 Updated 7 years ago
- A toolkit for mapping networks of political and economic influence through diverse types of entities and their relations. Accessible at h… ☆188 Updated 4 years ago
- track changes to the news, where news is anything with an RSS feed ☆178 Updated 4 years ago
- An extension to Google Refine that enables graphical mapping of Google Refine project data to an RDF skeleton and then exporting it in RD… ☆94 Updated last year
- BibServer is open-source software that makes it easy to publish, manage and find bibliographies. BibServer is RESTful and web-friendly. ☆126 Updated 6 years ago
- Documentation and project-wide issues for the Website Monitoring project (a.k.a. "Scanner") ☆108 Updated 2 months ago
- Distant Reader, a tool for using & understanding a corpus ☆20 Updated 2 years ago
- Convert XML/SVG/PDF into normalised, sectioned, scholarly HTML ☆37 Updated last year
- Squidwarc is a high fidelity, user scriptable, archival crawler that uses Chrome or Chromium with or without a head ☆170 Updated 4 years ago
- MOVED TO https://gitlab.com/crossref/pdfextract ☆509 Updated 7 years ago
- Reproducible Document Archive ☆81 Updated 6 years ago