rkrzr / dataset-popular
A dataset of popular pages (taken from <dir.yahoo.com>) with manually marked up semantic blocks.
☆15 · Feb 9, 2014 · Updated 12 years ago
Alternatives and similar repositories for dataset-popular
Users that are interested in dataset-popular are comparing it to the libraries listed below
- Tools for web page segmentation. In development ☆17 · Nov 7, 2018 · Updated 7 years ago
- Fito is a Python library that helps to organize your data so you can access it in a more understandable and easy way ☆10 · Feb 26, 2018 · Updated 7 years ago
- Web page segmentation and noise removal ☆55 · Feb 4, 2024 · Updated 2 years ago
- Data science tools from Moz ☆23 · Jan 11, 2017 · Updated 9 years ago
- A distributed in-memory fabric based on shared-memory blocks and datashape. Any language can operate on the data. ☆13 · Feb 12, 2016 · Updated 10 years ago
- Age classification from text using PAN16, blogs, Fisher Callhome, and Cancer Forum ☆18 · Jul 1, 2022 · Updated 3 years ago
- Classifies webpages into categories defined in DMOZ dataset ☆40 · Dec 14, 2015 · Updated 10 years ago
- Agent fixing SWE bench issues ☆19 · May 21, 2024 · Updated last year
- Scalable pattern search optimization with dask ☆22 · Apr 12, 2017 · Updated 8 years ago
- Kaggle competition results ☆20 · Jan 4, 2019 · Updated 7 years ago
- a series of trie testing things ☆21 · Apr 9, 2017 · Updated 8 years ago
- Scrapy Eagle is a tool that allows us to run any Scrapy based project in a distributed fashion and monitor how it is going on and how many… ☆24 · Sep 4, 2020 · Updated 5 years ago
- extract difference between two html pages ☆32 · Feb 10, 2026 · Updated last week
- The Clever Algorithms project is an effort to describe a large number of algorithmic techniques from the field of Artificial Intelligence… ☆29 · Oct 28, 2018 · Updated 7 years ago
- Participate in the 4th U.S. National Action Plan for Open Government ☆13 · Jun 8, 2018 · Updated 7 years ago
- The missing datasets manager. Like Homebrew but for datasets. CLI tool to search and discover datasets! ☆41 · May 29, 2017 · Updated 8 years ago
- openapi of all third-party ☆10 · Updated this week
- Apache Spark-based framework for analysis of A/B experiments ☆15 · Nov 3, 2024 · Updated last year
- ICEG: Thematic Working Groups ☆12 · Jun 11, 2025 · Updated 8 months ago
- The Linked GTFS vocabulary ☆39 · Mar 20, 2022 · Updated 3 years ago
- ☆12 · Sep 22, 2015 · Updated 10 years ago
- ☆10 · Jun 24, 2020 · Updated 5 years ago
- WebAnnotator is a tool for annotating Web pages. WebAnnotator is implemented as a Firefox extension (https://addons.mozilla.org/en-US/fi… ☆48 · Dec 17, 2021 · Updated 4 years ago
- Structured Data Extractor. An application to extract structured data from web pages. It uses Data Extraction Based on Partial Tree Alignm… ☆49 · Jun 9, 2012 · Updated 13 years ago
- Extract (DOM tree) repetitions from a webpage ☆12 · Jan 13, 2014 · Updated 12 years ago
- A Django App for HTML GUI applications, with easy Python/JS interoperation. It is a port of Eel. ☆22 · Jul 28, 2018 · Updated 7 years ago
- A starting Python-Flask web app template with accompanying guide ☆12 · Jan 18, 2025 · Updated last year
- BlockCAT token sale smart contracts. ☆11 · Oct 19, 2017 · Updated 8 years ago
- audiofile.cc ☆16 · Jun 27, 2011 · Updated 14 years ago
- Web UI for labelling dataset for supervised learning. ☆81 · Jun 7, 2021 · Updated 4 years ago
- Application for checking the performance of an elevator group system in a building using simulation. ☆12 · Nov 9, 2017 · Updated 8 years ago
- A sandbox for opensource demonstrations of GitHub ☆14 · Apr 13, 2016 · Updated 9 years ago
- Faster replacement for Python's urlparse module ☆45 · Sep 30, 2018 · Updated 7 years ago
- ☆10 · Feb 8, 2021 · Updated 5 years ago
- ☆12 · Jan 31, 2015 · Updated 11 years ago
- A database with automatic dynamic imputation of missing values. ☆11 · Nov 2, 2017 · Updated 8 years ago
- Provides for deploying custom ETL containers on AIStore, with subsequent user-defined extraction-transformation-loading in parallel, on t… ☆19 · Nov 26, 2025 · Updated 2 months ago
- Replication package of the ICSE2025 paper titled "Leveraging Large Language Models for Enhancing the Understandability of Generated Unit … ☆11 · Feb 19, 2025 · Updated 11 months ago
- A Twisted-based Kubernetes client. ☆12 · Dec 18, 2018 · Updated 7 years ago