Mimino666/python-xextract

Extract structured data from HTML and XML documents like a boss.

☆51

Alternatives and similar repositories for python-xextract

Users who are interested in python-xextract are comparing it to the libraries listed below.

AutoToolkit / titans
Selenium automation framework
☆19 · Apr 23, 2023 · Updated 3 years ago
vhyza / lemmagen-lexicons
Language lexicons for elasticsearch https://github.com/vhyza/elasticsearch-analysis-lemmagen plugin
☆15 · Dec 11, 2018 · Updated 7 years ago
Jaymon / captain
command line python scripts for humans
☆13 · Feb 20, 2026 · Updated 5 months ago
asanoja / segmentations
Tools for web page segmentation. In development
☆17 · Nov 7, 2018 · Updated 7 years ago
AaronJny / scrapy_redis_expiredupefilter
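scrapy-redis-expiredupefilter is a distributed Scrapy crawling framework modified from scrapy-redis. It supports setting a lifetime for request fingerprints; once a fingerprint's lifetime ends, it is cleared automatically without affecting other fingerprints.
☆10 · Aug 6, 2019 · Updated 6 years ago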
frywang / DataMining
Cleans corpora collected from DBpedia and encyclopedia sites to obtain suitable triples
☆15 · Jun 24, 2017 · Updated 9 years ago
timbertson / unfluff
[abandoned] statistical HTML content extraction in python
☆18 · Jan 12, 2011 · Updated 15 years ago
jessecoleman / gbtl-python-bindings
☆11 · Apr 24, 2018 · Updated 8 years ago
robjohncox / python-html-assert
Utility for asserting the structure and content of HTML in python.
☆24 · May 4, 2020 · Updated 6 years ago
rkrzr / dataset-popular
A dataset of popular pages (taken from <dir.yahoo.com>) with manually marked up semantic blocks.
☆15 · Feb 9, 2014 · Updated 12 years ago
matthewryanscott / virtualenv-pythonw-osx
Install a working 'pythonw' into a virtualenv on Mac OS X
☆51 · Jul 22, 2018 · Updated 7 years ago
Python3WebSpider / ElasticSearchTest
Elastic Search Code
☆23 · Aug 29, 2021 · Updated 4 years ago
brianium / yoose
A Clojure library for use case driven development
☆11 · Dec 25, 2017 · Updated 8 years ago
smalldirector / solr-multilingual-analyzer
A new solr multilingual index and search architecture, it can support index and search across multiple languages at the same time in the …
☆13 · Oct 18, 2019 · Updated 6 years ago
turicas / templater
Extract, parse and populate templates from strings
☆28 · Apr 4, 2019 · Updated 7 years ago
FloodRunner / FloodRunner
An open-source framework that allows you to easily monitor your web applications using end-to-end browser tests.
☆15 · Apr 17, 2021 · Updated 5 years ago
shangxiao / bargeparse
Introspect function signatures to construct a CLI
☆16 · Apr 16, 2021 · Updated 5 years ago
pydepta / pydepta
A python implementation of DEPTA
☆84 · Jan 14, 2017 · Updated 9 years ago
pidlug / recollfs
RecollFs - FUSE filesystem using Recoll index, showing filtered files in directories.
☆16 · Aug 9, 2014 · Updated 11 years ago
jsooter / RichFilemanager-Python3Flask
Python3 & Flask connector for Rich Filemanager
☆16 · Apr 30, 2018 · Updated 8 years ago
honzajavorek / fiobank
Fio Bank API in Python
☆42 · Jun 19, 2026 · Updated last month
kmbn / sunny-crm
A multi-user web-based CRM for freelancers with an emphasis on flow and momentum
☆20 · Feb 21, 2017 · Updated 9 years ago
ahal / jetty
Python dependency management via Poetry
☆16 · Nov 28, 2022 · Updated 3 years ago
ydf0509 / nb_http_client
pip install nb_http_client. nb_http_client is the highest-performance HTTP client in Python's history, many times faster than any other request package
☆36 · May 28, 2024 · Updated 2 years ago
lopuhin / scrapy-pyppeteer
Use pyppeteer from a Scrapy spider
☆59 · Feb 5, 2020 · Updated 6 years ago
WanderingLemon / saswatch
A quick, simple, random color generation tool, written in Rust!
☆13 · Oct 31, 2025 · Updated 8 months ago
onepercentclub / bluebottle
Bluebottle
☆11 · Updated this week
Melcus / google-like-search
Google like search with laravel, vue js and elasticsearch
☆13 · Mar 20, 2017 · Updated 9 years ago
eladnava / batch-reply-for-gmail
A chrome extension that makes it possible to reply to all selected conversations in Gmail™ at once.
☆12 · Dec 16, 2024 · Updated last year
encukou / czech-sort
Python tool for simple Czech alphabetization
☆14 · Jul 12, 2023 · Updated 3 years ago
rafaelcapucho / scrapy-eagle
Scrapy Eagle is a tool that allows us to run any Scrapy based project in a distributed fashion and monitor how it is going on and how many…
☆24 · Sep 4, 2020 · Updated 5 years ago
PSeitz / rust_measure_time
measures and prints wall time in rust for given scope
☆19 · Dec 9, 2024 · Updated last year
tballison / lucene-addons
Standalone versions of LUCENE_5205 and other patches: SpanQueryParser, Concordance and Co-occurrence stats
☆18 · Aug 2, 2021 · Updated 4 years ago
feincms / django-sitemaps
sitemap.xml generation using lxml with support for alternates.
☆13 · Updated this week
Bayer-Group / docker-min-jessie
Minimally sized Debian Jessie build
☆17 · Sep 30, 2015 · Updated 10 years ago
bastienlabelle / tornado-base-app
Tornado base application
☆16 · Jan 12, 2010 · Updated 16 years ago
Python3WebSpider / ScrapyPyppeteerDeprecated
Scrapy Pyppeteer Demo
☆24 · Jul 13, 2018 · Updated 8 years ago
starenka / pandas_djmodel
Generates Django model definition from Pandas DataFrame
☆17 · May 25, 2018 · Updated 8 years ago
CarloMicieli / hascalator
Reimplementing the Haskell prelude in Scala (for fun)
☆13 · Jul 6, 2019 · Updated 7 years ago