XTractor is an algorithmic web page text extractor written in Java. It builds on the "commonly used web design practices" approach (from readability.js, goose and snacktory) to create a set of heuristics for fast article text extraction. It adds several features such as paragraph preservation, better image detection heuristics, sibling sco…
☆44Feb 5, 2016Updated 10 years ago
Alternatives and similar repositories for xtractor
Users that are interested in xtractor are comparing it to the libraries listed below
- Implementing java based text extractors as web APIs (currently only Boilerpipe & Goose)☆16Apr 1, 2012Updated 13 years ago
- Using Swagger for REST API documentation☆29Sep 2, 2015Updated 10 years ago
- ☆18Jul 14, 2018Updated 7 years ago
- A collection view subview for handling multiple continuous touches on cells.☆17Nov 8, 2019Updated 6 years ago
- a framework for turning written sentences into structured data with simple parsers.☆18Dec 13, 2017Updated 8 years ago
- A lightweight julia wrapper for WORLD - a high-quality speech analysis, modification and synthesis system☆30Sep 19, 2020Updated 5 years ago
- Official repository for Characterization of tumor heterogeneity through segmentation-free representation learning on multiplexed imaging …☆14Sep 28, 2025Updated 4 months ago
- Navigating around a grid of cells like XPath for spreadsheets; supports Python 3.5+☆48Feb 1, 2023Updated 3 years ago
- Pumilio: A Web-Based Management System for Ecological Recordings☆13Oct 29, 2018Updated 7 years ago
- RealTime Motion Capture Toolbox for Matlab☆10Apr 11, 2016Updated 9 years ago
- SPA: Efficient User-Preference Alignment against Uncertainty in Medical Image Segmentation (ICCV 2025)☆14Sep 26, 2025Updated 5 months ago
- Vintage Typography with Web Fonts☆14Dec 22, 2015Updated 10 years ago
- Code, source data, examples, and audio excerpts for Flow: Expressive Rhythm in the Rapping Voice☆10Feb 13, 2020Updated 6 years ago
- SChunk-Encoder (Transformer or Conformer) for streaming E2E ASR☆11Oct 21, 2022Updated 3 years ago
- Implementation of "Look, Listen and Recognise: character-aware audio-visual subtitling"☆19Nov 3, 2025Updated 3 months ago
- Douban movie review visualization☆10May 19, 2016Updated 9 years ago
- 1st place solution to the DCASE 2020 - Task 5 - Urban Sound Tagging with Spatiotemporal Context☆16Dec 8, 2022Updated 3 years ago
- A script for quickly configuring a stock set of common Android device emulators☆17Nov 5, 2013Updated 12 years ago
- Listen to the weather using Sonic Pi and data from Mathematica☆11Dec 6, 2018Updated 7 years ago
- Resources for "Simple Speech Representation Learning from Perceptual Data".☆11Sep 18, 2023Updated 2 years ago
- Research_speech_speaker_verification_nist_sre2010☆12Mar 1, 2016Updated 9 years ago
- Designed to help lawyers and legal professionals find precedent fast and prepare for case negotiations by simulating trajectories☆10Oct 16, 2024Updated last year
- ☆10Jul 24, 2019Updated 6 years ago
- Tool for Evaluating Multilingual WS-353 and SimLex-999☆10Dec 15, 2016Updated 9 years ago
- A Tree-LSTM-based dependency tree sentiment labeler☆15May 9, 2019Updated 6 years ago
- ☆13Updated this week
- ATC-Anno is an annotation tool for Air Traffic Control data that offers automatic semantic and concept annotation.☆12Nov 17, 2023Updated 2 years ago
- NFS Server DroboApp build scripts☆10Jan 26, 2016Updated 10 years ago
- An application to display the text of the Hebrew Bible (Leningrad codex) along with an English translation (1917 JPS) and an audio record…☆13Jul 17, 2015Updated 10 years ago
- Issue with NSCollectionView's default drag and drop implementation☆12May 3, 2018Updated 7 years ago
- PyGun: Procedural Generation of Anechoic Gunshot Sounds☆14Oct 8, 2016Updated 9 years ago
- Github mirror of MediaWiki extension Wikispeech - our actual code is hosted with Gerrit (please see https://www.mediawiki.org/wiki/Develo…☆12Updated this week
- A Word Aligner for English☆11Feb 15, 2017Updated 9 years ago
- API status is a simple tool that checks if an API is online. http://apistatus.org☆15Sep 15, 2021Updated 4 years ago
- A simple text-based AI to execute commands using NLP☆12Apr 14, 2017Updated 8 years ago
- Course materials for a 3-day seminar "Machine Learning and NLP: Advances and Applications" at New College of Florida☆12Feb 10, 2022Updated 4 years ago
- ☆11Sep 26, 2017Updated 8 years ago
- Term List Matching Plugin for ElasticSearch☆26Jan 20, 2014Updated 12 years ago
- Capture and replay execution traces of client-side web applications☆28May 31, 2013Updated 12 years ago