thibauts / duckduckgo
Simple duckduckgo results scraping
☆67 · Updated 7 years ago
Alternatives and similar repositories for duckduckgo:
Users interested in duckduckgo are comparing it to the libraries listed below.
- A component that tries to avoid downloading duplicate content ☆27 · Updated 6 years ago
- Spell correct entire sentences using nltk freqdist and symspell ☆19 · Updated 7 years ago
- Show summary of a large number of URLs in a Jupyter Notebook ☆17 · Updated 3 years ago
- Streaming web crawler with WebSocket API ☆44 · Updated last year
- Find which links on a web page are pagination links ☆29 · Updated 8 years ago
- Reduction is a python script which automatically summarizes a text by extracting the sentences which are deemed to be most important. ☆55 · Updated 10 years ago
- extract difference between two html pages ☆32 · Updated 6 years ago
- A spell-checker extending Peter Norvig's with multi-typo correction, hamming distance weighting, and more. ☆98 · Updated 4 years ago
- Site Hound (previously THH) is a Domain Discovery Tool ☆23 · Updated 3 years ago
- ☆24 · Updated 6 years ago
- Complete Mechanical Turk API written in Python that uses the same names as the official documentation ☆46 · Updated 8 years ago
- A Python module to fetch and parse results from different search engines. ☆77 · Updated 6 years ago
- An index data structure for approximate string search. ☆23 · Updated 5 years ago
- Download scripts for distributing twitter data. ☆62 · Updated 2 years ago
- WebAnnotator is a tool for annotating Web pages. WebAnnotator is implemented as a Firefox extension (https://addons.mozilla.org/en-US/fi… ☆48 · Updated 3 years ago
- Word Graph utility built with NLTK and TextBlob ☆18 · Updated 11 years ago
- A distributed system for mining common crawl using SQS, AWS-EC2 and S3 ☆18 · Updated 10 years ago
- Build intelligent data-driven applications with minimal effort. Sentence Clustering, Topics Extraction, Text Similarity, Opinion Summariz… ☆40 · Updated 5 years ago
- Small set of utilities to simplify writing Scrapy spiders. ☆49 · Updated 9 years ago
- Word analysis, by domain, on the Common Crawl data set for the purpose of finding industry trends ☆56 · Updated last year
- Dataset and code for three Web crawling-related papers from SIGIR-2019, NeurIPS-2019, and ICML-2020. ☆39 · Updated 2 months ago
- This is a REST Server endpoint built using Flask and Python. ☆24 · Updated 2 years ago
- Detect and classify pagination links ☆15 · Updated 4 years ago
- Virtual patent marking crawler at iproduct.epfl.ch ☆14 · Updated 7 years ago
- Wikipedia API wrapper for humans and elk. (en.wikipedia.org/w/api.php, get it?) ☆36 · Updated 10 years ago
- MITIE: library and tools for information extraction ☆29 · Updated 10 years ago
- A compound word splitter for Python ☆48 · Updated 3 years ago
- Keyword Extraction system using Brown Clustering - (This version is trained to extract keywords from job listings) ☆17 · Updated 10 years ago
- Slides to learn a little natural language processing (NLP) with Python. Written in reST with S5/Docutils. ☆28 · Updated 12 years ago
- A library to extract a publication date from a web page, along with a measure of the accuracy. ☆41 · Updated 5 years ago