fancyspeed/sf-extractor

HTML content extractor: cx-extractor in Python and sf-extractor

☆18

Alternatives and similar repositories for sf-extractor

Users that are interested in sf-extractor are comparing it to the libraries listed below.

fancyspeed / semi-lda
Semi-supervised Latent Dirichlet Allocation (LDA)
☆12 · Dec 21, 2017 · Updated 8 years ago
NJUST-FishTeam / OnlineJudgeSite_M6
Distributed judging node written in Python
☆18 · Jun 26, 2017 · Updated 9 years ago
wenjunxiao / python-autoreload
An auto-reload module for python app.
☆11 · Nov 12, 2014 · Updated 11 years ago
rshk / MongoSQL
JSON-based DSLs are not for humans..
☆10 · Sep 4, 2014 · Updated 11 years ago
tayebiarasteh / retweet
How Will Your Tweet Be Received? Predicting the Sentiment Polarity of Tweet Replies
☆12 · Aug 29, 2021 · Updated 4 years ago
chrislinan / cx-extractor-python
Python implementation of a general web page content extraction algorithm based on line-block distribution functions, with added English support; handles both Chinese and English
☆482 · Jul 9, 2019 · Updated 7 years ago
rachit-shah / News-Classfication-using-DNN-models
Comparative Analysis of CNN, RNN and HAN for Text Classification with GloVe Data Model
☆11 · May 4, 2019 · Updated 7 years ago
Alfresco / alfresco-jodconverter
JODConverter automates document conversions using LibreOffice/OpenOffice.org
☆12 · Jul 9, 2025 · Updated last year
zhaozhengcoder / Spider
Python web crawler
☆13 · Feb 3, 2018 · Updated 8 years ago
hee0624 / process_image
Generate noise images for use in computer vision
☆14 · Feb 9, 2021 · Updated 5 years ago
reorx / readability
HTML main body extractor
☆17 · Jul 15, 2015 · Updated 11 years ago
bylee5 / calcite-elasticsearch
☆14 · Oct 5, 2022 · Updated 3 years ago
jasonsperske / FlashTextJava
An idiomatic port of FlashText.py to Java using streams
☆14 · Sep 27, 2024 · Updated last year
NJdevPro / Closure-Table
An implementation of the closure table pattern in Python + SQL
☆15 · Nov 13, 2022 · Updated 3 years ago
chenhg5 / go-wechat
Golang WeChat development toolkit
☆10 · Jul 10, 2018 · Updated 8 years ago
hivefans / timeline_map
Kibana plugin that shows trends on a map of China with a timeline
☆15 · May 26, 2017 · Updated 9 years ago
nyov / scrapyext
scrapy-extras: a collection of code samples and modules for the Scrapy framework.
☆14 · Dec 14, 2020 · Updated 5 years ago
Angela7126 / pyrouge_for_windows
Revised pyrouge files to support Windows XP, Windows 8.1, and Windows 10
☆12 · Nov 21, 2017 · Updated 8 years ago
jackeyGao / csvSQL
csvSQL lets you query CSV file data with SQL
☆11 · Aug 2, 2016 · Updated 9 years ago
code4conference / code4sc
Code for sentence compression
☆20 · Mar 3, 2018 · Updated 8 years ago
StrongBoy998 / CrawlArticle
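News article body extraction module based on text density, compatible with Python 2 and Python 3; pass in a news URL or the page source and it returns the title, publication time, and body text.
☆14 · Jun 10, 2018 · Updated 8 years ago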
skomarica / alfresco-share-create-link
"Create Link" is a custom Alfresco Share Document Library action, similar to "Copy to...", but instead of copying, it creates a link to t…
☆17 · May 10, 2016 · Updated 10 years ago
pkumaster / philosophia
☆12 · Feb 9, 2020 · Updated 6 years ago
chandrasekharan98 / Multisite-Python-Crawler
An almost generic web crawler built using Scrapy and Python 3.7 to recursively crawl entire websites.
☆17 · Mar 1, 2022 · Updated 4 years ago
rmraya / TMEngine
An open source Translation Memory Engine written in Java
☆16 · Dec 22, 2022 · Updated 3 years ago
2012060010010 / sent_compression
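Sentence compression model that removes the unimportant parts of a sentence so that syntactic parsing and similar tasks become more accurate.
☆17 · Jan 26, 2018 · Updated 8 years ago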
brutuscat / medusa
THIS IS AN OLD FORK. Check out the Medusa Crawler gem instead: "medusa-crawler"
☆16 · Aug 5, 2020 · Updated 5 years ago
fujimotos / TinyFastSS
An index data structure for approximate string search.
☆23 · May 6, 2019 · Updated 7 years ago
julien-duponchelle / scrapy-graphite
Output Scrapy statistics to Graphite/Carbon
☆54 · Mar 9, 2013 · Updated 13 years ago
kingwkb / readability
A Python readability
☆277 · Jun 22, 2017 · Updated 9 years ago
throne-developer / gofunc
A better go test tool
☆10 · Apr 15, 2020 · Updated 6 years ago
BruceDone / dagobah
Simple DAG-based job scheduler in Python
☆13 · May 10, 2017 · Updated 9 years ago
GLiu203 / -SS-
Beginner-level tutorial on setting up a proxy with BandwagonHost
☆13 · Oct 15, 2018 · Updated 7 years ago
amumu-dev / cx-extractor
Clone of https://code.google.com/p/cx-extractor
☆37 · Sep 26, 2013 · Updated 12 years ago
theBigDataDigest / Andrew-Ng-deeplearning-part-5-Course-notes-in-Chinese
Andrew Ng deeplearning course notes
☆17 · Feb 20, 2018 · Updated 8 years ago
ivbeg / qddate
Quick and dirty date parsing Python library to parse HTML dates really fast
☆22 · Jul 5, 2026 · Updated 2 weeks ago
mavalliani / Semantic-Similarity-of-Sentences
Methods used: Cosine Similarity with Glove, Smooth Inverse Frequency, Word Movers Difference, Sentence Embedding Models (Infersent and Go…
☆17 · Jan 22, 2021 · Updated 5 years ago
owentemple / TED-talks
A natural language processing project to reveal linguistic features that predict a persuasive TED Talk. I webscraped every TED Talk trans…
☆20 · Feb 10, 2026 · Updated 5 months ago
maxprograms-com / RemoteTM
Translation Memory Server
☆19 · Jun 26, 2026 · Updated 3 weeks ago