matpalm/common-crawl

matpalm / common-crawl

playing around with the common crawl dataset

☆70

Alternatives and similar repositories for common-crawl

Users who are interested in common-crawl are comparing it to the repositories listed below.

infochimps-labs / wonderdog
Bulk loading for Elasticsearch
☆186 · Dec 16, 2023 · Updated 2 years ago
blakesmith / skeeter
Non-blocking Goliath webservice to convert images to ASCII
☆35 · Sep 18, 2011 · Updated 14 years ago
endpnt / andoc
collaborative web tool to enrich content
☆11 · Nov 13, 2011 · Updated 14 years ago
andrewmcdonough / ruby-poetry
☆25 · Feb 23, 2012 · Updated 14 years ago
whym / wikihadoop
Stream-based InputFormat for processing the compressed XML dumps of Wikipedia with Hadoop
☆85 · Jun 8, 2013 · Updated 13 years ago
technoweenie / nubnub
Node.js PubSubHubbub client/server implementation
☆144 · Nov 5, 2013 · Updated 12 years ago
cloudera / bigtop
Bigtop is a project for the development of packaging and tests of the Apache Hadoop ecosystem. The primary goal of Bigtop is to build a …
☆51 · Jul 4, 2011 · Updated 15 years ago
javasoze / chirper
distributed Twitter search engine
☆78 · Jul 27, 2011 · Updated 14 years ago
utcompling / OpenNLP-Models
A project for code to create models from existing corpora and distribute models.
☆42 · Apr 11, 2012 · Updated 14 years ago
cestella / presentations
Public Presentations
☆24 · Apr 13, 2025 · Updated last year
gutefrage / aurora-redis
☆12 · Jun 20, 2016 · Updated 10 years ago
nearform / cloudwatchlogs-stream
Stream interface to CloudWatch Logs
☆11 · May 31, 2015 · Updated 11 years ago
jakevdp / pyLLE
Python wrapper of fast C++ LLE code
☆18 · May 18, 2011 · Updated 15 years ago
myralabs / pymyra
Python library for Myra
☆10 · Jan 21, 2019 · Updated 7 years ago
jedp / redis-completer
Real-time search with autocomplete via Redis
☆45 · Jun 23, 2018 · Updated 8 years ago
luxzia / fraud_nlp
☆11 · Jul 30, 2014 · Updated 11 years ago
tanakh / ICFP2011
ICFP Programming Contest 2011 repository
☆24 · Jul 1, 2011 · Updated 15 years ago
crcn / emailify
Make HTML pages email-safe
☆39 · Oct 8, 2015 · Updated 10 years ago
chriskite / rediscover
Redis GUI
☆32 · Mar 10, 2010 · Updated 16 years ago
shilad / PyVowpal
Python wrapper for the Vowpal Wabbit machine learning library.
☆52 · Jul 19, 2013 · Updated 12 years ago
MikeBishop / http-layering
Start of an Internet draft on the separation between HTTP's semantic layer, framing layer(s), and the underlying transport layer.
☆15 · Mar 22, 2016 · Updated 10 years ago
mesos / spark
Lightning-fast cluster computing in Java, Scala and Python.
☆1,419 · Apr 8, 2014 · Updated 12 years ago
nathanmarz / cascalog-contrib
☆45 · Feb 16, 2013 · Updated 13 years ago
edgi-govdata-archiving / eis-WARC-archiver
ARCHIVED: Docker app to crawl URLs and generate WARCs
☆10 · Apr 11, 2017 · Updated 9 years ago
rossta / seymour
Activity feed audiences, backed by Redis.
☆22 · May 22, 2014 · Updated 12 years ago
jollygoodcode / emoji-keywords
Emoji keywords to Unicode mapping in easily consumable format
☆11 · Jun 9, 2016 · Updated 10 years ago
ahoernecke / docker_scumblr
Docker Container for Scumblr (github.com/netflix/scumblr)
☆14 · Jul 13, 2016 · Updated 9 years ago
gittar / bkmeans
The breathing k-means algorithm (just one source file containing the algorithm as found on PyPI)
☆21 · Jul 10, 2024 · Updated 2 years ago
mortehu / substring-frequencies
C++ program for finding strings that are over-represented in one of two texts
☆17 · Dec 25, 2017 · Updated 8 years ago
jcoleman / mail_alternatives_with_attachments
Send multipart alternative emails with attachments from ActionMailer
☆20 · Apr 2, 2014 · Updated 12 years ago
alienrobotwizard / varaha
Machine learning and natural language processing with Apache Pig
☆53 · Dec 17, 2013 · Updated 12 years ago
acrosa / Scala-TwitterStreamer
Scala client for the Twitter streaming API
☆68 · Apr 18, 2011 · Updated 15 years ago
seratch / scalikesolr
Apache Solr Client for Scala/Java
☆52 · Jan 11, 2016 · Updated 10 years ago
vatsan / gp_xgboost_gridsearch
In-database parallel grid-search for XGBoost on Greenplum
☆15 · Mar 1, 2018 · Updated 8 years ago
andrzejkrzywda / madeleine
Prevayler in Ruby
☆15 · May 24, 2011 · Updated 15 years ago
cfcosta / rack-analytics
A Rack middleware that collects access statistics and saves them to a MongoDB database. Not ready for production use.
☆17 · Jan 20, 2023 · Updated 3 years ago
mikesmullin / Chef-Solo-Capistrano-Bootstrap
Utilize Capistrano to automatically bootstrap any remote server for Chef-Solo via SSH using a single command.
☆35 · Sep 28, 2010 · Updated 15 years ago
citizen428 / ClojureX
An easy way to set up a full Clojure development environment on OS X
☆86 · Aug 20, 2011 · Updated 14 years ago
shayanjm / pasteye
Pastebin Monitoring as a Service
☆73 · Feb 26, 2014 · Updated 12 years ago