kahliloppenheimer / Web-page-classification
Classifies webpages into categories defined in the DMOZ dataset
☆40Dec 14, 2015Updated 10 years ago
Alternatives and similar repositories for Web-page-classification
Users who are interested in Web-page-classification are comparing it to the libraries listed below
- A dataset of popular pages (taken from <dir.yahoo.com>) with manually marked up semantic blocks.☆15Feb 9, 2014Updated 12 years ago
- Fito is a Python library that helps you organize your data so you can access it in a simpler, more understandable way☆10Feb 26, 2018Updated 7 years ago
- Failover AWS Spot Instances☆11Dec 8, 2017Updated 8 years ago
- A distributed in-memory fabric based on shared-memory blocks and datashape. Any language can operate on the data.☆13Feb 12, 2016Updated 10 years ago
- Data science tools from Moz☆23Jan 11, 2017Updated 9 years ago
- Age classification from text using PAN16, blogs, Fisher Callhome, and Cancer Forum☆18Jul 1, 2022Updated 3 years ago
- Preparing DMOZ dataset for my n-Gram LM-based URL classification research☆31Aug 30, 2014Updated 11 years ago
- Tools for web page segmentation. In development☆17Nov 7, 2018Updated 7 years ago
- Dmoz RDF parser☆28Jun 22, 2016Updated 9 years ago
- Scalable pattern search optimization with dask☆22Apr 12, 2017Updated 8 years ago
- Kaggle competition results☆20Jan 4, 2019Updated 7 years ago
- Repository for the CLiPS HAte speech DEtection System [HADES].☆24Apr 5, 2018Updated 7 years ago
- a series of trie testing things☆21Apr 9, 2017Updated 8 years ago
- Scrapy Eagle is a tool that allows us to run any Scrapy based project in a distributed fashion and monitor how it is going on and how many…☆24Sep 4, 2020Updated 5 years ago
- The Clever Algorithms project is an effort to describe a large number of algorithmic techniques from the field of Artificial Intelligence…☆29Oct 28, 2018Updated 7 years ago
- The missing datasets manager. Like Homebrew but for datasets. A CLI tool to search and discover datasets!☆41May 29, 2017Updated 8 years ago
- 🌩️ The Deep Learning framework based on Lightning☆11Dec 11, 2025Updated 2 months ago
- BlockCAT token sale smart contracts.☆11Oct 19, 2017Updated 8 years ago
- Application for checking performance of elevator group system in building using simulation method.☆12Nov 9, 2017Updated 8 years ago
- An attempt at creating a gold standard dataset for backtesting yesterday & today's content-extractors☆35Mar 19, 2015Updated 10 years ago
- A Go library for specialized integer hash maps.☆11Sep 15, 2016Updated 9 years ago
- A Strapi 4 plugin to generate content from Github public repositories☆14May 24, 2023Updated 2 years ago
- Extract (DOM tree) repetitions from a webpage☆12Jan 13, 2014Updated 12 years ago
- A project to attempt to automatically login to a website given a single seed☆128Feb 10, 2026Updated last week
- Opensource lib to enable cache of Go functions, using Groupcache, Redis, Memcache etc.☆10May 23, 2016Updated 9 years ago
- Provides for deploying custom ETL containers on AIStore, with subsequent user-defined extraction-transformation-loading in parallel, on t…☆19Nov 26, 2025Updated 2 months ago
- Library to extract text from HTML files☆11Dec 20, 2015Updated 10 years ago
- Benchmark of common hash functions☆10Sep 15, 2019Updated 6 years ago
- large-memory key-value pair store for Python☆50May 26, 2013Updated 12 years ago
- ☆12Apr 17, 2019Updated 6 years ago
- Provides syntax highlighting for Apptainer/Singularity definition files.☆10Dec 24, 2025Updated last month
- Elevator is an open source, on-disk key-value store. Provides high-performance bulk read-write operations over very large datasets while …☆70May 14, 2014Updated 11 years ago
- A simple desktop Wiki engine built around Markdown and git☆17Nov 15, 2021Updated 4 years ago
- Implementation of W3C's R2RML and Direct Mapping specifications☆10Oct 12, 2020Updated 5 years ago
- Experiments to benchmark implementations of a concurrent counter.☆13Jun 10, 2015Updated 10 years ago
- A C++ implementation of the Two-Pass Pairing Heap data structure.☆11Oct 9, 2016Updated 9 years ago
- A native Go clean room implementation of the Porter Stemming algorithm.☆14Apr 7, 2020Updated 5 years ago
- 🤖 Wikibase queries and edits made easy☆11Feb 9, 2020Updated 6 years ago
- pymur is a Python interface to The Lemur Toolkit.☆19Sep 17, 2018Updated 7 years ago