WebHunger is an extensible, full-scale crawler framework that supports distributed crawling. It aims to let users focus on web page parsing without having to worry about the crawling process.
☆18Apr 11, 2018Updated 8 years ago
Alternatives and similar repositories for webhunger
Users that are interested in webhunger are comparing it to the libraries listed below.
- spider of doubanbook☆10Jun 21, 2017Updated 8 years ago
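- Distributed crawler framework: simulates user requests with WebDriver, passes messages via Kafka, stores pages in HBase, and parses with multithreaded asynchronous tasks. Provides basic services such as a proxy IP service and a number verification service; the proxy page is accessed through H5 and web versions☆13Dec 18, 2015Updated 10 years ago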
- A (massive) DNS tools (reverse lookup for now)☆12Jul 6, 2022Updated 3 years ago
- Just a DEMO to demonstrate how to use JNA to type chars into alipay's password edit control automatically.☆12Dec 21, 2017Updated 8 years ago
- Sample AWS Batch project to read CSV files☆11Oct 22, 2017Updated 8 years ago
- A simple state machine implementation, illustrated with a simplified order state machine as an example.☆16Oct 13, 2020Updated 5 years ago
- Orchestration, Management and Monitoring of Data Processing☆11Apr 25, 2026Updated 2 weeks ago
- Automatic Text Summarization with Machine Learning☆15Jul 30, 2017Updated 8 years ago
- Anything we need to maintain the Linked Open Data (LOD) publication of CEUR-WS.org☆16Jun 10, 2020Updated 5 years ago
- Predict the Race of a Given Surname Using Census Data☆13Jul 5, 2023Updated 2 years ago
- ☆19Jan 11, 2023Updated 3 years ago
- NYC Data Science Academy capstone project - build event driven financial model using deep learning artificial neural network.☆15Mar 27, 2017Updated 9 years ago
- Scaffold out a boilerplate for creating a browser extension with up-to-date tools and autoreload☆16Jan 27, 2021Updated 5 years ago
- ner using crf++☆10Mar 24, 2015Updated 11 years ago
- Java SDK for the TextRazor Text Analytics API☆16Mar 2, 2026Updated 2 months ago
- Crawler framework that wraps tools such as HttpClient, Htmlunit, and Selenium☆26Nov 15, 2018Updated 7 years ago
- Be notified of recent events in the news by setting up alerts. Program uses NLP techniques such as keyword matching, k-clustering and sem…☆11Jun 27, 2016Updated 9 years ago
- The distributed statistical machine translation infrastructure consisting of load balancing, text pre/post-processing and translation ser…☆12Nov 29, 2018Updated 7 years ago
- ☆12Jun 7, 2019Updated 6 years ago
- Demo of implementing, detecting, and remediating JavaAgent memory shells (in-memory webshells)☆11Dec 7, 2022Updated 3 years ago
- Code Mate, a code snippet manager☆14Jun 7, 2017Updated 8 years ago
- A toolkit for generating paraphrase vector representations for words in context☆23May 19, 2015Updated 10 years ago
- A compendium of data projects and associated blog posts☆10Nov 4, 2019Updated 6 years ago
- black Ip lists, dorks-collection☆17May 1, 2026Updated last week
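- Data platform (DataPlateform). The original design idea: with big data everywhere today, we can't fall behind, so we set out to write such a platform. The project combines a crawler, search, Hadoop, DWR push, and Quartz scheduled tasks in one platform, aiming to crawl internet data and use big data to predict the next behavior of a person or thing. C…☆18Jul 31, 2017Updated 8 years ago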
- Grawlox is a profanity filter which offers methods for detecting and replacing swearwords.☆11Jul 16, 2017Updated 8 years ago
- Uses 10 years of DJIA index prices and NYTimes news article headlines to predict DJIA index prices☆18Feb 21, 2018Updated 8 years ago
- Apache NiFi NLP Processor☆18Nov 8, 2023Updated 2 years ago
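- In the second half of 2018, Google open-sourced Jib, a new Java tool that makes it easy to containerize Java applications. With Jib, there is no need to write a Dockerfile or install Docker; by integrating it as a Maven or Gradle plugin, Java applications can be containerized right away…☆21Apr 7, 2019Updated 7 years ago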
- ADEL is a robust and efficient entity linking framework that is adaptive to text genres and language, entity types for the classification…☆19Jan 8, 2020Updated 6 years ago
- IOC, AOP, REST...☆14Apr 4, 2017Updated 9 years ago
- Generating spiders dynamically to crawl and check those free proxy ip on the internet with scrapy.☆43Oct 6, 2018Updated 7 years ago
- Event Time Extraction with a Decision Tree of Neural Classifiers☆18Feb 28, 2019Updated 7 years ago
- Simple DAG-based job scheduler in Python☆13May 10, 2017Updated 8 years ago
- Takes a context-free grammar and converts it into a decision-making graph. Can produce interactive Guides which generate valid sentences …☆22Jul 7, 2017Updated 8 years ago
- Code for Keith et al., EMNLP-2017 "Identifying civilians killed by police with distantly supervised entity-event extraction."☆15Jul 5, 2022Updated 3 years ago
- Lightweight method based on shortest paths on word graphs and NLP to generate single-sentence summaries that are highly relevant and grammatic…☆19Jan 29, 2017Updated 9 years ago
- ☆21May 31, 2018Updated 7 years ago
- ☆11May 25, 2024Updated last year