mohaps / tldrzr
Algorithmic summarizer for RSS/Atom feeds, web URLs, and arbitrary text. Codebase for the application deployed at http://tldrzr.herokuapp.com
☆53Updated 8 years ago
Related projects:
- Discussion Summarization is the process of condensing a text document which is a collection of discussion threads, using CBS (Cluster Bas…☆12Updated 10 years ago
- ☆13Updated 8 years ago
- XTractor is an algorithmic text extractor from web pages written in Java. It builds upon the "commonly used web design practices" approac…☆43Updated 8 years ago
- A tool for calculating semantic similarity between words from a text corpus based on lexico-syntactic patterns.☆28Updated 8 years ago
- Launch AWS Elastic MapReduce jobs that process Common Crawl data.☆49Updated 7 years ago
- An API to parse a CV, in particular the elements of its publication list☆35Updated 6 years ago
- ☆77Updated this week
- Node.js application to extract the knowledge represented in Google infoboxes (aka Google Knowledge Graph Panel)☆25Updated 7 years ago
- A crawler, indexer, and query interface all in Python with distributed processing via Pyro4.☆23Updated 12 years ago
- The first Open Source document analysis platform☆65Updated 3 years ago
- ☆15Updated 12 years ago
- This is a REST Server endpoint built using Flask and Python.☆23Updated last year
- Contains the implementation of algorithms that estimate the geographic location of media content based on their content and metadata. It …☆15Updated 7 years ago
- Node wrapper for Ark-TweetNLP.☆16Updated 8 years ago
- Human-Powered Data Analysis with Mechanical Turk☆300Updated 11 years ago
- Linking Entities in CommonCrawl Dataset onto Wikipedia Concepts☆59Updated 12 years ago
- ImageCat is an Apache OODT RADIX application that uses Apache Solr, Apache Tika and Apache OODT to ingest 10s of millions of files (image…☆94Updated 6 years ago
- API that extracts metadata from a URL.☆26Updated 9 years ago
- Keyword Extraction system using Brown Clustering - (This version is trained to extract keywords from job listings)☆18Updated 10 years ago
- A POC at replicating Facebook Graph Search with Cypher and Neo4j☆103Updated 11 years ago
- Bigram/trigram analysis of Wikipedia; mainly mutual information☆22Updated 12 years ago
- ☆23Updated this week
- General Architecture for Text Engineering☆45Updated 8 years ago
- Cuts movie dialog summary video.☆10Updated 8 years ago
- ☆25Updated this week
- Narwhal is a keyword and KEY NARRATIVE manager that creates language-aware classes. Because Narwhal does not use NLP it avoids complexity…☆12Updated 5 years ago
- ☆20Updated 6 years ago
- Full text extraction using the open-source Tesseract OCR software (https://code.google.com/p/tesseract-ocr/) and ImageMagick