An attempt at creating a gold standard dataset for backtesting yesterday's and today's content extractors
☆35Mar 19, 2015Updated 11 years ago
Alternatives and similar repositories for crawl-to-the-future
Users that are interested in crawl-to-the-future are comparing it to the libraries listed below.
- An exercise in unsupervised machine learning: extract an article's text from HTML documents.☆431Jan 16, 2026Updated 4 months ago
- Analysis related to article on FOIA Online Database.☆11Feb 2, 2017Updated 9 years ago
- get facebook data☆10Sep 14, 2014Updated 11 years ago
- Fast structured perceptron sequential labeler☆15Dec 8, 2015Updated 10 years ago
- Migrating to https://github.com/origamitower/folktale☆20Sep 6, 2016Updated 9 years ago
- An implementation of the Mixcoin mixing protocol☆13Nov 12, 2014Updated 11 years ago
- Autocomplete - light-weight, next-word prediction Python utility☆451Jan 16, 2026Updated 4 months ago
- DEPRECATED: Use ghc-heap, ghc-heap-view in GHC 8.x instead.☆18Sep 17, 2016Updated 9 years ago
- Replication files for the March 2, 2015 Barron's story "The Little Guy Wins!," measuring market makers' trade execution quality.☆13Mar 12, 2015Updated 11 years ago
- Adds read support for Excel files (xls and xlsx) to agate.☆18May 19, 2026Updated last week
- LEMS interpreter implemented in Python☆12May 18, 2026Updated last week
- Extract data from websites using basic statistical magic☆506Oct 2, 2020Updated 5 years ago
- The missing dataset manager. Like Homebrew but for datasets. A CLI tool to search and discover datasets!☆41May 29, 2017Updated 9 years ago
- Python interpreter written in pure Erlang.☆60Jan 10, 2013Updated 13 years ago
- [UNMAINTAINED] A hypermedia REST HTTP API library for Clojure☆76Jul 12, 2015Updated 10 years ago
- Blog showing content from the Steem blockchain using the Steemit platform and API.☆10May 26, 2017Updated 9 years ago
- Functional GPU programming - DSEL & compiler☆22Sep 9, 2016Updated 9 years ago
- Ensure that a stream disconnects if it goes over `maxBytes` `perSeconds`☆13Apr 27, 2020Updated 6 years ago
- Parse live video and extract Chyron text☆20Aug 17, 2017Updated 8 years ago
- A Lit web-component for viewing a Whisper JSON transcript file☆14Feb 12, 2026Updated 3 months ago
- This project deals with hierarchical classification of web pages based on dmoz dataset.☆14Apr 10, 2014Updated 12 years ago
- A distributed in-memory fabric based on shared-memory blocks and datashape. Any language can operate on the data.☆13Feb 12, 2016Updated 10 years ago
- RWA recurrent neural networks☆18Apr 14, 2017Updated 9 years ago
- A how-to for mass collection of FEC data using the command line and regular expressions☆29Mar 18, 2016Updated 10 years ago
- Manage and load dataprotocols.org Data Packages☆27Sep 17, 2015Updated 10 years ago
- Links parts of input text to Wikipedia articles☆16Sep 9, 2012Updated 13 years ago
- Failover AWS Spot Instances☆11Dec 8, 2017Updated 8 years ago
- a relational algebra library for JavaScript☆60Apr 15, 2026Updated last month
- A skeleton Django project☆94Jan 21, 2022Updated 4 years ago
- ☆26Oct 3, 2020Updated 5 years ago
- System for mining Wikipedia Usage data to read our collective mind☆20Sep 28, 2014Updated 11 years ago
- Karma Framework for running performance tasks using Telemetry☆37May 29, 2019Updated 6 years ago
- Content-based Recommendation Generator☆13Jan 21, 2015Updated 11 years ago
- Linked SDMX☆17Oct 26, 2014Updated 11 years ago
- connect middleware that causes chaos☆26Feb 26, 2015Updated 11 years ago
- A dataset of popular pages (taken from <dir.yahoo.com>) with manually marked up semantic blocks.☆15Feb 9, 2014Updated 12 years ago
- Age classification from text using PAN16, blogs, Fisher Callhome, and Cancer Forum☆18Jul 1, 2022Updated 3 years ago
- mltk - Moz Language Tool Kit☆12Mar 6, 2015Updated 11 years ago
- Manage Python with Boxen and pyenv☆20Nov 9, 2017Updated 8 years ago