Small set of utilities to simplify writing Scrapy spiders.
☆49 · Jul 24, 2015 · Updated 10 years ago
Alternatives and similar repositories for scrapy-boilerplate
Users that are interested in scrapy-boilerplate are comparing it to the libraries listed below
- A helper to create web scrapers using scrapy selector in a Model based structure ☆57 · Dec 26, 2022 · Updated 3 years ago
- Utility to re-structure research papers published in US Letter or A4 format PDF files to typically remove the 2 columns layout. ☆53 · Nov 8, 2010 · Updated 15 years ago
- Paginating the web ☆37 · Feb 11, 2014 · Updated 12 years ago
- Clarify your words with emojis ☆12 · Aug 25, 2016 · Updated 9 years ago
- Extensions for using Scrapy on Amazon AWS ☆32 · Dec 5, 2012 · Updated 13 years ago
- A decorator to write coroutine-like spider callbacks. ☆109 · Dec 26, 2022 · Updated 3 years ago
- Hawk HTTP Authorization for Django Rest Framework ☆19 · Jul 28, 2020 · Updated 5 years ago
- Scrapy downloader middleware that stores response HTMLs to disk. ☆18 · Jan 14, 2026 · Updated last month
- Crochet-based blocking API for Scrapy. ☆46 · Feb 24, 2017 · Updated 9 years ago
- ☆143 · Nov 24, 2015 · Updated 10 years ago
- An R package for assembling data frames from HTML tables (fka htmltable) ☆26 · Oct 27, 2018 · Updated 7 years ago
- Restrict crawl and scraping scope using matchers. ☆26 · Jun 8, 2016 · Updated 9 years ago
- Scrapy middleware to add extra fields to items, like timestamp, response fields, spider attributes etc. ☆57 · Mar 16, 2022 · Updated 3 years ago
- Tool to flatten stream of JSON-like objects, configured via schema ☆33 · Oct 19, 2019 · Updated 6 years ago
- ☆68 · Sep 7, 2018 · Updated 7 years ago
- Python for students in humanities, NRU HSE, 2018-2019 ☆18 · Mar 7, 2023 · Updated 3 years ago
- Find which links on a web page are pagination links ☆29 · Jan 12, 2017 · Updated 9 years ago
- Argument Parsing for Humans™ ☆207 · Jul 7, 2017 · Updated 8 years ago
- Collaborative collection of Tornado related Github gists ☆39 · Jul 2, 2012 · Updated 13 years ago
- This is a collection of mostly R code to use text mining to analyse conference abstracts, blogs and other sources in an attempt to look f… ☆42 · Sep 9, 2015 · Updated 10 years ago
- A Scrapy pipeline that sends items to an Elasticsearch server ☆98 · Jan 2, 2018 · Updated 8 years ago
- KOPS installation in AWS ☆11 · Aug 6, 2018 · Updated 7 years ago
- A Data Mesh demo repository ☆13 · Oct 10, 2024 · Updated last year
- How to add formulas to Google Spreadsheet using Google Apps Script - Sarmad Gardezi ☆17 · Apr 24, 2025 · Updated 10 months ago
- A Sublime Text plugin to move through and reform things ☆179 · Sep 28, 2023 · Updated 2 years ago
- MongoDB extensions for Scrapy ☆44 · Oct 2, 2014 · Updated 11 years ago
- Wordpress plugin for Magic the Gathering that enables card tooltips and formatted deck listings. ☆13 · Dec 24, 2025 · Updated 2 months ago
- A collection of github workflow patterns ☆10 · Feb 1, 2024 · Updated 2 years ago
- NER toolkit for HTML data ☆259 · May 3, 2024 · Updated last year
- A project that crawls backpack information and images from Amazon using Python Scrapy and stores the data in a SQLite database. ☆34 · Sep 25, 2015 · Updated 10 years ago
- Output scrapy statistics to graphite/carbon ☆54 · Mar 9, 2013 · Updated 12 years ago
- Create a Google Sheet from a CSV file preventing auto-formatting of date and number fields ☆10 · Jun 27, 2017 · Updated 8 years ago
- A generic crawler ☆79 · Feb 10, 2026 · Updated 3 weeks ago
- FridgeToPlate is a user-friendly app that utilizes data management and document database models to gather recipes based on the ingredient… ☆10 · Nov 26, 2023 · Updated 2 years ago
- Static photoessay generator using gulp.js ☆10 · Mar 20, 2019 · Updated 6 years ago
- Semantic memory system for Claude Code - provides persistent conversation memory through vector search of session summaries ☆30 · Jul 21, 2025 · Updated 7 months ago
- A generic interface wrapping multiple backends to provide a consistent pubsub API ☆13 · Oct 31, 2018 · Updated 7 years ago
- Materials and reproducible workflows for working with health care data ☆12 · Apr 11, 2018 · Updated 7 years ago
- ☆12 · Apr 24, 2017 · Updated 8 years ago