huggingface/chug

Minimal sharded dataset loaders, decoders, and utils for multi-modal document, image, and text datasets.

☆163

Alternatives and similar repositories for chug

Users that are interested in chug are comparing it to the libraries listed below.

andravin / spio
Experimental CUDA kernel framework unifying typed dimensions, NVRTC JIT specialization, and ML‑guided tuning.
☆46 · Feb 9, 2026 · Updated 5 months ago
jacobmarks / huggingface-fiftyone-converters
Convert datasets from Hugging Face to FiftyOne for Visualization
☆11 · Mar 15, 2024 · Updated 2 years ago
huggingface / datatrove
Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.
☆3,214 · Updated this week
nateraw / huggingface-datasets-converter
Scripts to convert datasets from various sources to Hugging Face Datasets.
☆57 · Oct 26, 2022 · Updated 3 years ago
huggingface / pixparse
Pixel Parsing. A reproduction of OCR-free end-to-end document understanding models with open data
☆24 · Jul 30, 2024 · Updated last year
huggingface / optimum-quanto
A pytorch quantization backend for optimum
☆1,046 · Updated this week
felixdittrich92 / docling-OCR-OnnxTR
OnnxTR OCR plugin for Docling
☆21 · Jun 28, 2026 · Updated 3 weeks ago
huggingface / nanotron
Minimalistic large language model 3D-parallelism training
☆2,755 · May 26, 2026 · Updated last month
crypdick / timm-lr-scheduler-explorer
A dashboard for exploring timm learning rate schedulers
☆20 · Nov 22, 2024 · Updated last year
furkanbiten / idl_data
OCR Annotations from Amazon Textract for Industry Documents Library
☆103 · Aug 20, 2022 · Updated 3 years ago
bfshi / scaling_on_scales
When do we not need larger vision models?
☆420 · Feb 8, 2025 · Updated last year
huggingface / llm-swarm
Manage scalable open LLM inference endpoints in Slurm clusters
☆289 · Jul 11, 2024 · Updated 2 years ago
NathanGodey / headless-lm
Training and evaluation code for the paper "Headless Language Models: Learning without Predicting with Contrastive Weight Tying" (https:/…
☆29 · Apr 17, 2024 · Updated 2 years ago
AnswerDotAI / toolslm
Tools to make language models a bit easier to use
☆67 · Updated this week
LAION-AI / General-GPT
☆65 · Oct 4, 2023 · Updated 2 years ago
lucidrains / resfit-pytorch
Implementation of ResFit, Residual Off-Policy RL for Finetuning Behavior Cloning Policies
☆17 · Sep 29, 2025 · Updated 9 months ago
rosewang2008 / backtracing
Backtracing: Retrieving the Cause of the Query, EACL 2024 Long Paper, Findings.
☆91 · Jul 21, 2024 · Updated 2 years ago
huggingface / datablations
Scaling Data-Constrained Language Models
☆344 · Jun 28, 2025 · Updated last year
mlfoundations / dataset2metadata
☆28 · Mar 21, 2024 · Updated 2 years ago
lucasb-eyer / lbtoolbox
My personal toolbox for doing datascience (especially deep learning) in python.
☆18 · Mar 21, 2020 · Updated 6 years ago
LAION-AI / scaling-laws-for-comparison
☆22 · May 12, 2026 · Updated 2 months ago
filipgdorm / eco-llm
☆14 · Mar 20, 2026 · Updated 4 months ago
pytorch / tensordict
TensorDict is a pytorch dedicated tensor container.
☆1,033 · Updated this week
IlyasMoutawwakil / llm-perf-backend
The backend behind the LLM-Perf Leaderboard
☆11 · May 5, 2024 · Updated 2 years ago
webdataset / webdataset
A high-performance Python-based I/O system for large (and small) deep learning problems, with strong support for PyTorch.
☆3,145 · Feb 9, 2026 · Updated 5 months ago
hsouri / Battle-of-the-Backbones
☆212 · Nov 2, 2023 · Updated 2 years ago
Zasder3 / open_clip_juwels
An open source implementation of CLIP.
☆33 · Nov 7, 2022 · Updated 3 years ago
young-geng / mlxu
Machine Learning eXperiment Utilities
☆48 · Jul 29, 2025 · Updated 11 months ago
google-research / big_vision
Official codebase used to develop Vision Transformer, SigLIP, MLP-Mixer, LiT and more.
☆3,494 · May 19, 2025 · Updated last year
pytorch / torchtitan
A PyTorch native platform for training generative AI models
☆5,545 · Updated this week
rwightman / timme
timm, evolved
☆60 · May 28, 2026 · Updated last month
nateraw / encoded-video
Utilities for working with videos
☆13 · Jul 5, 2025 · Updated last year
rishiraj / firerequests
High-performance, asynchronous Python HTTP client library designed for faster file transfers using concurrency, semaphores, and fault-tol…
☆60 · May 12, 2025 · Updated last year
srush / annotated-mamba
Annotated version of the Mamba paper
☆501 · Feb 27, 2024 · Updated 2 years ago
shoaibahmed / metadata_archaeology
Official code for the paper: "Metadata Archaeology"
☆19 · May 10, 2023 · Updated 3 years ago
jakespringer / echo-embeddings
☆168 · Apr 17, 2024 · Updated 2 years ago
feevos / tfcl
Official repository for PTAViT3D and PTAViT3DCA models for field boundaries detection using S2 and/or S1 imagery.
☆41 · Sep 24, 2024 · Updated last year
facebookresearch / unibench
Python Library to evaluate VLM models' robustness across diverse benchmarks
☆227 · Jun 30, 2026 · Updated 3 weeks ago
mwalmer-umd / vit_analysis
☆35 · Jun 13, 2023 · Updated 3 years ago