nytlabs / pageinfo
Python module for extracting information from web pages
☆42 Updated 10 years ago
Alternatives and similar repositories for pageinfo:
Users who are interested in pageinfo are comparing it to the libraries listed below.
- Know more with less ☆50 Updated 10 years ago
- A reverse part-of-speech tagger. Give it a list of tags and it spews out matching language. ☆23 Updated 9 years ago
- A new version of the software used in the Cluetrain listicle ☆19 Updated 10 years ago
- ArchiveKit manages data and documents during ETL processes, either on a local file system or on S3. ☆15 Updated 9 years ago
- Neddick: Open Source Information Discovery Platform ☆36 Updated last year
- RiTaJS: A generative language toolkit for JavaScript ☆43 Updated 4 years ago
- A Python version (almost a port) of ProPublica's TableFu ☆233 Updated 11 years ago
- Code for Newslynx App ☆22 Updated 9 years ago
- Python library with common functionality for writing web scrapers ☆102 Updated 9 years ago
- A Python module to access Pinboard.in via its API. This is a fork/modification of mudge/python-delicious ☆168 Updated 10 years ago
- Helper methods for generating text that conforms to "The New York Times Manual of Style and Usage" ☆27 Updated 10 years ago
- Utilities for working with data. ☆20 Updated 9 years ago
- A command line utility for generating Google Analytics reports that are straightforward to compare across domains, projects or pages. ☆41 Updated 3 years ago
- webstore is a web-api enabled datastore backed onto sql databases especially sqlite. It supports the RESTful JSON APIs standard to nosql … ☆40 Updated 5 years ago
- A simple storage system based on Twitter identity implemented in Node.js. ☆103 Updated 2 years ago
- Open-source fork of code behind http://everyblock.com/ ☆96 Updated 12 years ago
- A Django-based open source CMS for newspapers ☆16 Updated 13 years ago
- A simple transformation/data processing pipeline for CrisisNET ☆15 Updated 10 years ago
- PANDA: A Newsroom Data Appliance ☆206 Updated 2 years ago
- NPR Visual's Carebot (deprecated, now in: https://github.com/thecarebot/carebot) ☆15 Updated 9 years ago
- a set of services that provide NLP facilities ☆25 Updated 4 years ago
- ☆35 Updated 13 years ago
- python-readability, but faster (mirror-ish) ☆84 Updated 13 years ago
- A new way to share ideas, do projects, and make our cities better ☆55 Updated 11 years ago
- A library for accessing a spreadsheet as a native Python object suitable for templating. ☆225 Updated 6 years ago
- HiiDef web spider framework, powers http://flavors.me ☆18 Updated 12 years ago
- You keep personal data in all sorts of places on the internets. Jellyroll brings them together onto your own site. ☆132 Updated 13 years ago
- Ultra simple API for geocoding a single string against various web services. ☆183 Updated 11 years ago
- Scripts and scripts you can use to quickly launch and build ec2 instances. ☆30 Updated 13 years ago
- A PostgreSQL pipeline for Reporter. ☆18 Updated last year