huggingface / chug
Minimal sharded dataset loaders, decoders, and utils for multi-modal document, image, and text datasets.
☆155 Updated 9 months ago
Alternatives and similar repositories for chug:
Users who are interested in chug are comparing it to the libraries listed below.
- M4 experiment logbook ☆56 Updated last year
- Code used for the creation of OBELICS, an open, massive and curated collection of interleaved image-text web documents, containing 141M d… ☆193 Updated 5 months ago
- Manage scalable open LLM inference endpoints in Slurm clusters ☆249 Updated 6 months ago
- Python Library to evaluate VLM models' robustness across diverse benchmarks ☆184 Updated last month
- LL3M: Large Language and Multi-Modal Model in JAX ☆68 Updated 9 months ago
- The official repo for the paper "VeCLIP: Improving CLIP Training via Visual-enriched Captions" ☆238 Updated last week
- ☆58 Updated 10 months ago
- Fast, Modern, Memory Efficient, and Low Precision PyTorch Optimizers ☆78 Updated 6 months ago
- Multimodal language model benchmark, featuring challenging examples ☆158 Updated last month
- This project is a collection of fine-tuning scripts to help researchers fine-tune Qwen 2 VL on HuggingFace datasets. ☆59 Updated 4 months ago
- PyTorch code for hierarchical k-means -- a data curation method for self-supervised learning ☆141 Updated 7 months ago
- E5-V: Universal Embeddings with Multimodal Large Language Models ☆224 Updated last month
- ☆138 Updated 9 months ago
- ☆62 Updated 4 months ago
- Multipack distributed sampler for fast padding-free training of LLMs ☆184 Updated 5 months ago
- ☆121 Updated this week
- Vision Document Retrieval (ViDoRe): Benchmark. Evaluation code for the ColPali paper. ☆169 Updated this week
- Code for Zero-Shot Tokenizer Transfer ☆121 Updated 2 weeks ago
- experiments with inference on llama ☆104 Updated 7 months ago
- Scalable and Performant Data Loading ☆211 Updated this week
- Just some miscellaneous utility functions / decorators / modules related to PyTorch and Accelerate to help speed up implementation of new… ☆119 Updated 6 months ago
- Supercharge huggingface transformers with model parallelism. ☆76 Updated 3 months ago
- Implementation of CALM from the paper "LLM Augmented LLMs: Expanding Capabilities through Composition", out of Google DeepMind ☆173 Updated 4 months ago
- ☆296 Updated 7 months ago
- Let's build better datasets, together! ☆250 Updated last month
- The official evaluation suite and dynamic data release for MixEval. ☆233 Updated 2 months ago
- Scaling Data-Constrained Language Models ☆330 Updated 4 months ago
- Language models scale reliably with over-training and on downstream tasks ☆96 Updated 9 months ago
- ☆75 Updated 6 months ago
- The Batched API provides a flexible and efficient way to process multiple requests in a batch, with a primary focus on dynamic batching o… ☆119 Updated last month