justin / podscraper
Python scripts to scrape the iTunes Podcast categories.
☆12, updated 3 years ago
Related projects
Alternatives and complementary repositories for podscraper
- A JSON list of podcast hosts and a pattern to use in audio URLs (☆43, updated 6 months ago)
- User-agent parser for common podcast clients (☆20, updated 3 weeks ago)
- A pipeline for crawling RSS feeds and their associated content. Demo at newsfeed.ijs.si. (☆21, updated 12 years ago)
- A very simple analytics abstraction layer. Write your events once, then send them wherever you want. (☆61, updated 3 months ago)
- Launch AWS Elastic MapReduce jobs that process Common Crawl data. (☆49, updated 7 years ago)
- Easy metric tracking and aggregation using Redis (☆28, updated 3 years ago)
- Extract postal addresses from the DOM (☆66, updated 12 years ago)
- Index URLs in Common Crawl (☆193, updated 7 years ago)
- Readability/Boilerpipe extraction in Python (☆55, updated 8 years ago)
- User agents used by apps and services to query RSS data (☆55, updated 7 months ago)
- Natural language parsing of recipe ingredients (☆54, updated 4 years ago)
- US Street Address Parser (☆158, updated last year)
- Open Source implementation of Summly (☆47, updated 7 years ago)
- Parsing resumes in PDF format from LinkedIn (☆66, updated 8 years ago)
- ☆18, updated 9 years ago
- Reworked https://www.readability.com/ parsing library (https://mercury.postlight.com/ is now the living alternative) (☆205, updated 6 months ago)
- I always thought that AWS S3 should just resize your images when you put them into a bucket. It doesn't. However with the new Lambda serv… (☆15, updated 9 years ago)
- A web service that computes a set of readability metrics for text. We currently support the following metrics: Automated Readability Inde… (☆71, updated 2 years ago)
- An open, platform-agnostic list of user-agent and referrer regexes for use in podcast analytics services (☆122, updated last year)
- Partial EC5.1 polyfill for ISO 8601 support in the JavaScript Date object. (☆33, updated 12 years ago)
- Easy extraction of keywords and engines from search engine results pages (SERPs). (☆90, updated 3 years ago)
- Fork of the boilerpipe project (☆48, updated 11 years ago)
- Automatically extracts and normalizes an online article or blog post publication date (☆118, updated last year)
- Port of mailgun/talon (signature detection in mails) from Python to JavaScript (☆19, updated 4 months ago)
- JavaScript code to split names into their respective components (first, last, etc.) (☆111, updated 7 years ago)
- Easy way to retrieve Google Page Rank, Alexa Rank, index counts, and backlink counts (☆274, updated 7 years ago)
- An attempt at creating a silver/gold standard dataset for backtesting yesterday's and today's content extractors (☆34, updated 9 years ago)
- Training/test data for Dragnet (☆41, updated 9 years ago)
- A Lambda function for charging cards with Stripe (☆160, updated 6 years ago)