XTractor is an algorithmic extractor of text from web pages, written in Java. It builds on the "commonly used web design practices" approach (from readability.js, goose and snacktory) to create a set of heuristics for fast article text extraction. It adds several features such as paragraph preservation, better image detection heuristics, sibling sco…
☆44Feb 5, 2016Updated 10 years ago
Alternatives and similar repositories for xtractor
Users that are interested in xtractor are comparing it to the libraries listed below.
- Algorithmic summarizer for RSS/Atom Feeds, Web Urls and arbitrary text. Codebase for the application deployed at http://tldrzr.herokuapp.…☆53Sep 4, 2016Updated 9 years ago
- Java-based text extractors implemented as web APIs (currently only Boilerpipe & Goose)☆16Apr 1, 2012Updated 13 years ago
- proof of concept "deploy on git push" workflow using docker☆25Apr 15, 2015Updated 10 years ago
- A collection view subview for handling multiple continuous touches on cells.☆17Nov 8, 2019Updated 6 years ago
- Smart align block around cursor☆11Jun 23, 2019Updated 6 years ago
- Autoproxy automatically detects proxies and stores them in the respective environment variables (e.g. http_proxy).☆13Oct 2, 2016Updated 9 years ago
- A free multithreaded proxy checking program written in Java. Load a proxy list and check each proxy to verify it's alive to create a new …☆11Nov 5, 2015Updated 10 years ago
- Search Sina Weibo topics using PhantomJS☆12Dec 20, 2015Updated 10 years ago
- urllib2 wrapper to make life easier☆32Jun 14, 2012Updated 13 years ago
- Web page content extractor☆31Feb 26, 2013Updated 13 years ago
- A library of examples showing how to use the Common Crawl corpus (2008-2012, ARC format)☆65Aug 5, 2016Updated 9 years ago
- The Datagram Stream Transfer protocol☆23Jul 29, 2015Updated 10 years ago
- An easy and flexible mathematical programming environment for Python.☆12Jun 16, 2018Updated 7 years ago
- Emulador de MVS (Neo-Geo)☆15Jan 9, 2014Updated 12 years ago
- An Emacs extension that sorts CSS attributes automatically.☆14Nov 22, 2018Updated 7 years ago
- Copy, paste and move files like you do in Finder in Dired.☆14Nov 6, 2020Updated 5 years ago
- Kairos combines a focused crawler and an information extraction engine to convert a list of conference websites into an index filled wit…☆18Feb 20, 2011Updated 15 years ago
- Automatic CAPTCHA decoding☆11Apr 17, 2012Updated 13 years ago
- `fd` and `rg` together in Emacs with consult☆24Jan 17, 2026Updated 2 months ago
- Spring Boot Web with Hessian☆11Jul 2, 2014Updated 11 years ago
- A Nutch 2.2.1 plugin which allows users to shuffle off the responsibility for retrieving pages to a selenium hub/node spoke system. This …☆16Jun 9, 2016Updated 9 years ago
- A webfinger handler built with CloudFlare Workers and KV Store☆27Sep 18, 2023Updated 2 years ago
- A free, open source, AI powered alternative to Quizlet.☆16May 15, 2023Updated 2 years ago
- A λ-calculus shell (because I love writing shells).☆11Jan 2, 2020Updated 6 years ago
- Blog crawler for the blogforever project.☆23Jan 31, 2014Updated 12 years ago
- Super slick themes for Highcharts☆28Apr 27, 2014Updated 11 years ago
- a readability client for android☆25Jan 23, 2012Updated 14 years ago
- A lightweight julia wrapper for WORLD - a high-quality speech analysis, modification and synthesis system☆30Sep 19, 2020Updated 5 years ago
- Issue with NSCollectionView's default drag and drop implementation☆12May 3, 2018Updated 7 years ago
- generate short and reversible IDs as a replacement for numerical or hex IDs☆13Jul 6, 2020Updated 5 years ago
- Draw on an image and add a text overlay to it☆13Oct 4, 2017Updated 8 years ago
- Grpc evaluation functions for Emacs org-babel☆15Mar 19, 2022Updated 4 years ago
- Distributed Web crawler☆20May 25, 2021Updated 4 years ago
- A curated list of resources for and about Mozilla Firefox☆17Feb 19, 2017Updated 9 years ago