huggingface / chug
Minimal sharded dataset loaders, decoders, and utils for multi-modal document, image, and text datasets.
☆156 Updated last year
Alternatives and similar repositories for chug:
Users who are interested in chug are comparing it to the libraries listed below.
- M4 experiment logbook ☆57 Updated last year
- Manage scalable open LLM inference endpoints in Slurm clusters ☆253 Updated 8 months ago
- Code used for the creation of OBELICS, an open, massive and curated collection of interleaved image-text web documents, containing 141M d… ☆199 Updated 7 months ago
- Python Library to evaluate VLM models' robustness across diverse benchmarks ☆197 Updated this week
- Load compute kernels from the Hub ☆107 Updated last week
- Multimodal language model benchmark, featuring challenging examples ☆162 Updated 3 months ago
- LL3M: Large Language and Multi-Modal Model in Jax ☆71 Updated 11 months ago
- ☆64 Updated last year
- Set of scripts to finetune LLMs ☆37 Updated last year
- Language models scale reliably with over-training and on downstream tasks ☆96 Updated last year
- ☆58 Updated last year
- Fast, Modern, Memory Efficient, and Low Precision PyTorch Optimizers ☆88 Updated 8 months ago
- code for training & evaluating Contextual Document Embedding models ☆176 Updated 2 months ago
- A minimal implementation of LLaVA-style VLM with interleaved image & text & video processing ability. ☆90 Updated 3 months ago
- Scalable and Performant Data Loading ☆231 Updated this week
- The official repo for the paper "VeCLIP: Improving CLIP Training via Visual-enriched Captions" ☆242 Updated 2 months ago
- ☆63 Updated 6 months ago
- ☆302 Updated 9 months ago
- Multipack distributed sampler for fast padding-free training of LLMs ☆186 Updated 7 months ago
- Scaling Data-Constrained Language Models ☆335 Updated 6 months ago
- Implementation of CALM from the paper "LLM Augmented LLMs: Expanding Capabilities through Composition", out of Google Deepmind ☆174 Updated 6 months ago
- ☆76 Updated 8 months ago
- ☆163 Updated last month
- Let's build better datasets, together! ☆257 Updated 3 months ago
- ☆74 Updated 6 months ago
- PyTorch code for hierarchical k-means -- a data curation method for self-supervised learning ☆149 Updated 9 months ago
- The Batched API provides a flexible and efficient way to process multiple requests in a batch, with a primary focus on dynamic batching o… ☆127 Updated 3 months ago
- ☆120 Updated 5 months ago
- Minimal (400 LOC) implementation Maximum (multi-node, FSDP) GPT training ☆123 Updated 11 months ago
- experiments with inference on llama ☆104 Updated 9 months ago