rjurney/Collecting-Data

This is a HOWTO for collecting data in Ruby and Python applications and sending it to S3 via Kafka.

☆31

Alternatives and similar repositories for Collecting-Data

Users that are interested in Collecting-Data are comparing it to the libraries listed below.

rjurney / enron-avro
Code for creating and querying an Avro encoded repository of the UC Berkeley Enron email archive
☆19 · May 21, 2012 · Updated 14 years ago
alienrobotwizard / sounder
A grouping of Apache Pig examples.
☆65 · Oct 13, 2020 · Updated 5 years ago
alienrobotwizard / varaha
Machine learning and natural language processing with Apache Pig
☆53 · Dec 17, 2013 · Updated 12 years ago
etsy / cascading.jruby
A JRuby DSL for Cascading
☆41 · Sep 23, 2015 · Updated 10 years ago
rjurney / github-explorer
Recommender system for Github projects using the github archive data
☆17 · Jun 5, 2013 · Updated 13 years ago
kevinweil / pig.tmbundle
Simple syntax highlighting for writing Pig scripts (http://hadoop.apache.org/pig) in Textmate.
☆35 · May 2, 2013 · Updated 13 years ago
cfergus / StormSankey
A Sankey view of a running storm topology
☆15 · Oct 13, 2012 · Updated 13 years ago
ceteri / ceteri-mapred
MapReduce examples
☆20 · Nov 18, 2011 · Updated 14 years ago
Banno / salat-avro
Fast bi-directional Scala case class to Avro serialization
☆23 · Mar 18, 2016 · Updated 10 years ago
tristen / superman
A code editor theme that looks like colors resembling superman?
☆20 · Oct 22, 2025 · Updated 9 months ago
sjl / dram
Clojure templating that won't make you drink.
☆17 · Nov 16, 2012 · Updated 13 years ago
benhamner / BioResponse
Benchmarks for Boehringer-Ingelheim's Predicting a Biological Response Competition, hosted by Kaggle
☆17 · Mar 16, 2012 · Updated 14 years ago
psychemedia / Twitter-Backchannel-Analysis
Tools for analysing and visualising activity around Twitter backchannels
☆26 · Nov 10, 2012 · Updated 13 years ago
flaptor / indextank-py
Python Client for the IndexTank API (v1)
☆22 · Dec 22, 2011 · Updated 14 years ago
big-data-research / in-memory-data-pipeline
The code for the in memory data pipeline that was presented at Berlin Buzzwords 2015.
☆10 · Jun 1, 2015 · Updated 11 years ago
shilad / PyVowpal
Python wrapper for the Vowpal Wabbit machine learning library.
☆52 · Jul 19, 2013 · Updated 13 years ago
bmpvieira / heroku-dat-template
A simple Heroku app template for deploying Dat
☆16 · Jul 27, 2015 · Updated 11 years ago
phyous / twitter-geo-data
A script for gathering large amounts of twitter geo data
☆38 · May 7, 2013 · Updated 13 years ago
Geal / proust
single node kafka implementation
☆13 · Apr 27, 2018 · Updated 8 years ago
kevinweil / FileSetInputFormat
A Hadoop input format for sending lists of files as keys to a mapper. Set the list of files, and an input split will be created per file…
☆16 · Apr 7, 2010 · Updated 16 years ago
khakieconomics / MSW
Files for Modern Statistical Workflow workshop
☆10 · Jul 16, 2016 · Updated 10 years ago
jakevdp / pyDistances
Work in progress for eventual contribution to scikit-learn
☆21 · Mar 21, 2013 · Updated 13 years ago
hanneshapke / fruitloop_angular
Tutorial project for a PDXPython talk
☆12 · Aug 27, 2014 · Updated 11 years ago
zladovan / gradle-avrohugger-plugin
Gradle plugin for generating scala case classes from apache avro schemas, datafiles and protocols
☆12 · May 11, 2025 · Updated last year
thesteady / test-r
calling R from a Rails app
☆10 · Mar 17, 2016 · Updated 10 years ago
datawrangling / spatialanalytics
Where 2.0 Workshop Code: Spatial Analysis of Tweets using Hadoop, Pig, Python & Mechanical Turk. Slides here: http://www.slideshare.net/…
☆134 · Mar 31, 2010 · Updated 16 years ago
emre / swarm-host
Realtime redis dashboard for your redis setup.
☆30 · Aug 29, 2013 · Updated 12 years ago
davenverse / testcontainers-specs2
Add Support For Testing with TestContainers with Specs2
☆16 · Jul 8, 2024 · Updated 2 years ago
mattb / pig-redis
Redis bulk-loader for Apache Pig
☆40 · Apr 21, 2012 · Updated 14 years ago
chyikwei / bnp
Bayesian nonparametric models for python
☆18 · Sep 11, 2018 · Updated 7 years ago
andrewdoss / algorithms_illuminated
Python implementations and tests for the Algorithms Illuminated book series. Some test cases from the following repository: https://githu…
☆15 · Dec 30, 2020 · Updated 5 years ago
hadooparchitecturebook / SparkStreaming.Sessionization
NRT Sessionization with Spark Streaming landing on HDFS and putting live stats in HBase
☆16 · Oct 31, 2014 · Updated 11 years ago
sksamuel / sbt-avro4s
Sbt plugin for avro4s
☆20 · May 30, 2018 · Updated 8 years ago
mtarsel / Django-MOOC
MOOC Project for software design and development taught at Clarkson University
☆16 · Feb 11, 2022 · Updated 4 years ago
swiftype / swiftype-py
Swiftype Python Client
☆21 · Sep 5, 2019 · Updated 6 years ago
twitter-archive / grabby-hands
A JVM Kestrel client that aggregates queues from multiple servers. Implemented in Scala with Java bindings. In use at Twitter for all JVM…
☆56 · Mar 16, 2017 · Updated 9 years ago
lmarlow / resque-result
A resque plugin to fetch the result from a job's perform method
☆15 · Sep 11, 2010 · Updated 15 years ago
oedura / scavro
An SBT plugin for automatically calling Avro code generation and a thin scala wrapper for reading and writing Avro files
☆22 · Mar 8, 2018 · Updated 8 years ago
robsimmons / abbot
Generation of abstract binding trees
☆27 · Sep 26, 2025 · Updated 10 months ago