linkedin/Avro2TF

Avro2TF is designed to fill the gap of making users' training data ready to be consumed by deep learning training frameworks.

☆129

Alternatives and similar repositories for Avro2TF

Users who are interested in Avro2TF are comparing it to the libraries listed below.

linkedin / photon-ml
A scalable machine learning library on Apache Spark
☆797 · Aug 30, 2021 · Updated 4 years ago
tony-framework / TonY
TonY is a framework to natively run deep learning frameworks on Apache Hadoop.
☆708 · Oct 14, 2023 · Updated 2 years ago
linkedin / dynamometer
A tool for scale and performance testing of HDFS with a specific focus on the NameNode.
☆135 · Jan 11, 2024 · Updated 2 years ago
linkedin / spark
Apache Spark - A unified analytics engine for large-scale data processing
☆16 · Jul 24, 2023 · Updated 2 years ago
hortonworks-spark / spark-hive-streaming-sink
A sink to save Spark Structured Streaming DataFrame into Hive table
☆23 · May 7, 2018 · Updated 8 years ago
linkedin / spark-tfrecord
Read and write Tensorflow TFRecord data from Apache Spark.
☆300 · Apr 22, 2024 · Updated 2 years ago
paypal / gimel
Big Data Processing Framework - Unified Data API or SQL on Any Storage
☆252 · Jul 10, 2025 · Updated last year
jdye64 / docker-hwx
Combination of Dockerized Hortonworks projects and other Hadoop ecosystem components
☆10 · Oct 11, 2019 · Updated 6 years ago
amient / affinity
Library and a Framework for building fast, scalable, fault-tolerant Data APIs based on Akka, Avro, ZooKeeper and Kafka
☆25 · Oct 16, 2020 · Updated 5 years ago
streamthoughts / kafka-connect-transform-grok
Grok Expression Transform for Kafka Connect.
☆16 · Jun 26, 2026 · Updated 3 weeks ago
ray-project / distml
Distributed ML Optimizer
☆35 · Jul 28, 2021 · Updated 4 years ago
zhisbug / ray-scalable-ml-design
Some microbenchmarks and design docs before commencement
☆11 · Feb 1, 2021 · Updated 5 years ago
zio-archive / interop-java
☆17 · Feb 16, 2020 · Updated 6 years ago
paypal / NNAnalytics
NameNodeAnalytics is a self-help utility for scouting and maintaining the namespace of an HDFS instance.
☆121 · Nov 25, 2025 · Updated 7 months ago
deniederhut / safe-handling-instructions-for-missing-data
Code and data for SciPy 2018 talk on missing data
☆21 · Jun 29, 2018 · Updated 8 years ago
linkedin / transport
A framework for writing performant user-defined functions (UDFs) that are portable across a variety of engines including Apache Spark, Ap…
☆306 · Jun 29, 2026 · Updated 3 weeks ago
chiragjn / short-text-similarity
Short Text Similarity as described in https://dl.acm.org/citation.cfm?id=2806475
☆17 · Feb 7, 2019 · Updated 7 years ago
linkedin / isolation-forest
A distributed Spark/Scala implementation of the isolation forest and extended isolation forest algorithms for unsupervised outlier detect…
☆260 · Jun 12, 2026 · Updated last month
ray-project / ray_shuffling_data_loader
A Ray-based data loader with per-epoch shuffling and configurable pipelining, for shuffling and loading training data for distributed tra…
☆18 · Jan 5, 2023 · Updated 3 years ago
linkedin / brooklin
An extensible distributed system for reliable nearline data streaming at scale
☆965 · Updated this week
linkedin / linkedin-gradle-plugin-for-apache-hadoop
☆118 · May 11, 2023 · Updated 3 years ago
tensorflow / ecosystem
Integration of TensorFlow with other open-source frameworks
☆1,378 · Sep 25, 2024 · Updated last year
databricks-industry-solutions / json2spark-schema
Converting a json schema to a spark schema (struct) representation
☆14 · Mar 18, 2025 · Updated last year
adobe / koperator
Oh no! Yet another Kafka operator for Kubernetes
☆21 · Updated this week
51zero / eel-sdk
Big Data Toolkit for the JVM
☆147 · Nov 4, 2020 · Updated 5 years ago
microsoft / vscode-jupyter-hub
Jupyter Hub Support in VS Code
☆17 · Jul 13, 2026 · Updated last week
combust / mleap
MLeap: Deploy ML Pipelines to Production
☆1,539 · Jul 10, 2026 · Updated last week
ottogroup / schedoscope
Schedoscope is a scheduling framework for painfree agile development, testing, (re)loading, and monitoring of your datahub, lake, or what…
☆98 · Nov 14, 2019 · Updated 6 years ago
linkedin / dr-elephant
Dr. Elephant is a job and flow-level performance monitoring and tuning tool for Apache Hadoop and Apache Spark
☆1,370 · Aug 22, 2023 · Updated 2 years ago
apache / yunikorn-core
Apache YuniKorn Core
☆1,021 · Updated this week
charsyam / textbox
Get text from documents format
☆29 · Nov 22, 2017 · Updated 8 years ago
apache / incubator-retired-horn
Mirror of Apache Horn (Incubating) ** This project has been retired **
☆28 · Apr 28, 2017 · Updated 9 years ago
jeremyrsmith / baudrillard
Experiments with symbolic functions in the Scala type system
☆27 · Jun 17, 2019 · Updated 7 years ago
go-kafka / connect
CLI tool and Go client library for the Kafka Connect REST API
☆52 · Nov 30, 2020 · Updated 5 years ago
tensorflow / tensorboard-plugin-example
☆135 · Aug 9, 2019 · Updated 6 years ago
Cray / lustre
Cray Lustre is HPE's curated Lustre distro for HPE ClusterStor, Cray EX, and other HPE/Cray clients
☆18 · Updated this week
sriksun / Ivory
Data Management + Feed Processing Platform over Hadoop
☆27 · May 8, 2013 · Updated 13 years ago
desmondyeung / scala-hashing
Fast non-cryptographic hash functions for Scala
☆74 · Aug 26, 2019 · Updated 6 years ago
stanford-futuredata / Willump
Willump Is a Low-Latency Useful Machine learning Platform.
☆45 · Mar 24, 2023 · Updated 3 years ago