playing around with the common crawl dataset
☆70 Aug 18, 2012 Updated 13 years ago
Alternatives and similar repositories for common-crawl
Users that are interested in common-crawl are comparing it to the libraries listed below.
- Bulk loading for elastic search ☆187 Dec 16, 2023 Updated 2 years ago
- A common set of compute primitives for PyCUDA and PyOpenCL ☆59 Feb 21, 2026 Updated 3 months ago
- Non-blocking Goliath webservice to convert images to ascii ☆35 Sep 18, 2011 Updated 14 years ago
- collaborative web tool to enrich content ☆12 Nov 13, 2011 Updated 14 years ago
- ☆25 Feb 23, 2012 Updated 14 years ago
- Stream-based InputFormat for processing the compressed XML dumps of Wikipedia with Hadoop ☆85 Jun 8, 2013 Updated 12 years ago
- scikit-learn: machine learning in Python ☆13 Mar 14, 2025 Updated last year
- Node.js PubSubHubbub client/server implementation ☆144 Nov 5, 2013 Updated 12 years ago
- distributed twitter search engine ☆78 Jul 27, 2011 Updated 14 years ago
- A project for code to create models from existing corpora and distribute models. ☆42 Apr 11, 2012 Updated 14 years ago
- python wrapper of fast C++ LLE code ☆18 May 18, 2011 Updated 15 years ago
- Parse various network packets using nom ☆15 Dec 26, 2021 Updated 4 years ago
- ☆11 Jul 30, 2014 Updated 11 years ago
- Applications of CloudHaskell to distributed computing, especially MapReduce ☆21 Mar 23, 2018 Updated 8 years ago
- ☆16 Apr 23, 2021 Updated 5 years ago
- Redis GUI ☆32 Mar 10, 2010 Updated 16 years ago
- ☆45 Feb 16, 2013 Updated 13 years ago
- A set of examples and utilities for using Pig with Cassandra. For the latest jar release, check the Downloads link. ☆84 Aug 21, 2014 Updated 11 years ago
- ARCHIVED--Docker app to crawl URLs and generate WARCs ☆10 Apr 11, 2017 Updated 9 years ago
- Activity feed audiences, backed by Redis. ☆22 May 22, 2014 Updated 12 years ago
- Emoji keywords to unicode mapping in easily consumable format ☆11 Jun 9, 2016 Updated 9 years ago
- The breathing k-means algorithm (just one source file containing the algorithm as found on pypi) ☆21 Jul 10, 2024 Updated last year
- Exhibit is a simple gem to generate and work with presenters in Rails 3. ☆15 Jul 18, 2012 Updated 13 years ago
- C++ program for finding strings that are over-represented in one of two texts ☆17 Dec 25, 2017 Updated 8 years ago
- Send multipart alternative emails with attachments from ActionMailer ☆20 Apr 2, 2014 Updated 12 years ago
- Machine learning and natural language processing with Apache Pig ☆53 Dec 17, 2013 Updated 12 years ago
- ☆14 Apr 6, 2014 Updated 12 years ago
- Turn NGiNX into an adept HTTP push server. ☆22 May 14, 2011 Updated 15 years ago
- A small test for recognizing persons with a word2vec model in German ☆13 Mar 15, 2015 Updated 11 years ago
- Prevayler in Ruby ☆15 May 24, 2011 Updated 15 years ago
- A rack middleware that collects access statistics and saves them on a MongoDB database. Not ready to production use. ☆17 Jan 20, 2023 Updated 3 years ago
- CyberSource Secure Acceptance SOP Javascript Implementation ☆13 Jan 3, 2014 Updated 12 years ago
- DKPro WSD: A Java framework for word sense disambiguation ☆21 Nov 16, 2022 Updated 3 years ago
- ☆34 May 13, 2016 Updated 10 years ago
- MPC Server for PySpark inspired by the LakeSail ☆18 Feb 26, 2026 Updated 3 months ago
- Tools for working with wikidata (structured data from wikipedia) ☆13 Apr 26, 2016 Updated 10 years ago
- (deprecated) Please use new nlp4l instead. ☆65 Sep 22, 2016 Updated 9 years ago
- Chrome Extension to add sidebar of Tweets to Youtube. ☆15 Jul 3, 2015 Updated 10 years ago
- Term payloads for Elasticsearch ☆11 Sep 8, 2016 Updated 9 years ago