gdamdam / sumo
Tool to extract the text from web article URLs and get word frequencies, entity recognition, automatic summaries and more
☆20 · Updated 6 years ago
Alternatives and similar repositories for sumo
Users that are interested in sumo are comparing it to the libraries listed below
- Tools for bulk indexing of WARC/ARC files on Hadoop, EMR or local file system. ☆46 · Updated 7 years ago
- Quickly turn command-line applications into RESTful webservices with a web-application front-end. You provide a specification of your com… ☆131 · Updated 5 months ago
- A free dataset of (almost) all publicly available podcasts. ☆134 · Updated 11 years ago
- Labeled segmentation for the document structure of printed books ☆15 · Updated 8 years ago
- GPT2Explorer brings the OpenAI GPT-2 language model playground to run locally on standard Windows computers. ☆28 · Updated 2 years ago
- Crawl sites for RSS, Atom, and JSON feeds. ☆78 · Updated last week
- ☆18 · Updated 7 months ago
- Web-based editor for subtitles and transcripts ☆140 · Updated last year
- Python package for converting XML and EPUB files to text files ☆33 · Updated 5 years ago
- Remove duplicate documents/videos/images via popular algorithms such as SimHash, SpotSig, Shingling, etc. ☆18 · Updated 2 years ago
- Faster, modernized fork of the language identification tool langid.py ☆56 · Updated 9 months ago
- A company/project name generator for Python. Uses NLTK and diverse techniques derived from existing corporate etymologies and naming agen… ☆50 · Updated 8 years ago
- A library that helps you to convert from one subtitle format to another ☆19 · Updated 6 years ago
- Automates the incremental production of word pronunciation recordings for Wiktionary through Wikimedia Commons ☆22 · Updated 7 years ago
- Crawl Wikipedia pages and upload TTS to YouTube. ☆10 · Updated 4 months ago
- WordNet Domains, WordNet Affect and SentiWords ☆47 · Updated 9 years ago
- Hyperaudio Lite - a Super-lightweight Interactive Transcript Player ☆151 · Updated 9 months ago
- webapp for unglue.it - A Free Ebook Foundation program ☆18 · Updated last month
- Matrix-based News Aggregation to Explore Media Bias ☆20 · Updated 7 years ago
- Character-level conversion between Hebrew text and Latin transliteration using deep learning - a demonstration of seq2seq training. ☆14 · Updated 2 years ago
- WarcMiddleware lets users seamlessly download a mirror copy of a website when running a web crawl with the Python web crawler Scrapy. ☆47 · Updated 7 years ago
- Recursively finds music files inside source folders and makes an M3U file without metadata ☆12 · Updated 2 years ago
- Simple and clean Python implementation of TextRank as per seminal paper by Rada Mihalcea and Paul Tarau. This implementation performs bot… ☆11 · Updated 4 years ago
- Quantified Self: A Personal Data Aggregator and Dashboard for Self-Trackers and Quantified Self Enthusiasts ☆17 · Updated 2 years ago
- A Python script for downloading new MP3s from given RSS channels ☆134 · Updated 5 months ago
- search, dedupe, and media ingestion for mediachain ☆33 · Updated 8 years ago
- Check out https://github.com/webrecorder/webrecorder for a newer version matching https://webrecorder.io ☆38 · Updated 9 years ago
- Wikidata's QRank as a SQLite DB. ☆28 · Updated last year
- Functional composable pipelines allowing clean separation of the business logic and its implementation ☆11 · Updated last year
- Presentations on Quantified Self and Self-Tracking with Python ☆30 · Updated 2 years ago