XTractor is an algorithmic text extractor for web pages, written in Java. It builds on the "commonly used web design practices" approach (from readability.js, goose and snacktory) to create a set of heuristics for fast article text extraction. It adds several features, such as paragraph preservation, better image detection heuristics, sibling sco…
☆45Feb 5, 2016Updated 10 years ago
Alternatives and similar repositories for xtractor
Users that are interested in xtractor are comparing it to the libraries listed below.
- In ancient Egypt the pelican was believed to possess the ability to prophesy safe passage in the underworld. Pelicans are ferocious eater…☆11Apr 7, 2023Updated 3 years ago
- Implementation of a Whois Server with a redis backend☆15Oct 31, 2010Updated 15 years ago
- Let's party like ethernet in 1999.☆19Sep 6, 2014Updated 11 years ago
- A collection view subview for handling multiple continuous touches on cells.☆17Nov 8, 2019Updated 6 years ago
- Smart align block around cursor☆11Jun 23, 2019Updated 6 years ago
- MySQL UDFs to work with the Google v8 javascript engine☆27Sep 26, 2014Updated 11 years ago
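- A distributed crawler project carried out by the Gao Ying Lab at South China University of Technology; it must not be distributed privately outside the lab's members.☆21Jul 13, 2014Updated 11 years ago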
- inverted-index for level with pagination, sift3/cosine distance, tf-idf ranking, and more☆27Feb 5, 2014Updated 12 years ago
- Autoproxy automatically detects proxies and stores them in the respective environment variables (e.g. http_proxy).☆13Oct 2, 2016Updated 9 years ago
- ☆12Jan 27, 2016Updated 10 years ago
- a framework for turning written sentences into structured data with simple parsers.☆18Dec 13, 2017Updated 8 years ago
- A basic frontend to gobwmon using chart.js☆11Feb 27, 2016Updated 10 years ago
- Code samples for the Speedment ORM☆13Jun 21, 2022Updated 3 years ago
- A free multithreaded proxy checking program written in Java. Load a proxy list and check each proxy to verify it's alive to create a new …☆11Nov 5, 2015Updated 10 years ago
- search topics of sina weibo by phantomjs☆12Dec 20, 2015Updated 10 years ago
- The Datagram Stream Transfer protocol☆23Jul 29, 2015Updated 10 years ago
- An easy and flexible mathematical programming environment for Python.☆12Jun 16, 2018Updated 7 years ago
- MVS (Neo-Geo) emulator☆16Jan 9, 2014Updated 12 years ago
- ☆15Aug 5, 2022Updated 3 years ago
- An Emacs extension that sorts CSS attributes automatically.☆14Nov 22, 2018Updated 7 years ago
- Copy, paste and move files in Dired like you do in Finder.☆14Nov 6, 2020Updated 5 years ago
- Analysis plugin for ElasticSearch providing capability for processing inline annotations in documents.☆35Jan 24, 2014Updated 12 years ago
- Kairos combines a focused crawler and an information extraction engine to convert a list of conference websites into an index filled wit…☆19Feb 20, 2011Updated 15 years ago
- A script to download railscasts videos to watch them later☆38Aug 26, 2015Updated 10 years ago
- A simple indexing program to quickly search through source code.☆23May 19, 2014Updated 12 years ago
- 📊 Analysis tool for funnel visualization with log from Elasticsearch☆13Jun 21, 2022Updated 3 years ago
- A Nutch 2.2.1 plugin which allows users to shuffle off the responsibility for retrieving pages to a selenium hub/node spoke system. This …☆16Jun 9, 2016Updated 9 years ago
- Windows Live API binding and connect support.☆18Dec 1, 2024Updated last year
- Cloud drive (net disk) search implemented on top of search engines☆12Nov 15, 2018Updated 7 years ago
- A webfinger handler built with CloudFlare Workers and KV Store☆27Sep 18, 2023Updated 2 years ago
- A customized version of the original hdfs-webdav from iponweb.net to support Hadoop 0.20.1☆26Aug 20, 2011Updated 14 years ago
- HtmlExtractor is a template-based component, implemented in Java, for precise extraction of structured information from web pages.☆157Aug 27, 2018Updated 7 years ago
- A lightweight Node.js theme engine.☆13Dec 11, 2023Updated 2 years ago
- Incremental implementation of a scheme compiler☆29Mar 16, 2013Updated 13 years ago
- Blog crawler for the blogforever project.☆23Jan 31, 2014Updated 12 years ago
- A distributed crawler in Java with a master/slave control mechanism☆14May 21, 2015Updated 11 years ago
- BeautyTips is a jQuery tooltips plugin which uses the canvas drawing element in the HTML5 spec to dynamically draw tooltips (sometimes ca…☆36Sep 25, 2012Updated 13 years ago
- Chrome extension that allows you to filter HN stories using a comma separated list☆24Jan 24, 2017Updated 9 years ago
- a readability client for android☆25Jan 23, 2012Updated 14 years ago