huggingface / chug
Minimal sharded dataset loaders, decoders, and utils for multi-modal document, image, and text datasets.
☆151 Updated 7 months ago
Related projects
Alternatives and complementary repositories for chug
- M4 experiment logbook ☆56 Updated last year
- Code used for the creation of OBELICS, an open, massive and curated collection of interleaved image-text web documents, containing 141M d… ☆186 Updated 2 months ago
- E5-V: Universal Embeddings with Multimodal Large Language Models ☆167 Updated 3 months ago
- Manage scalable open LLM inference endpoints in Slurm clusters ☆237 Updated 3 months ago
- Python Library to evaluate VLM models' robustness across diverse benchmarks ☆168 Updated last week
- Multimodal language model benchmark, featuring challenging examples ☆148 Updated 2 months ago
- LL3M: Large Language and Multi-Modal Model in Jax ☆64 Updated 6 months ago
- ☆64 Updated last year
- ☆86 Updated 9 months ago
- ☆57 Updated 7 months ago
- InstructDoc: A Dataset for Zero-Shot Generalization of Visual Document Understanding with Instructions (AAAI2024) ☆137 Updated 5 months ago
- Generalised Contrastive Learning. This is a Repository for Google Shopping Dataset and Benchmarks followed by our novel fine-grained cont… ☆45 Updated this week
- Code for Zero-Shot Tokenizer Transfer ☆115 Updated 2 weeks ago
- Language models scale reliably with over-training and on downstream tasks ☆94 Updated 7 months ago
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ☆83 Updated last week
- Scaling Data-Constrained Language Models ☆321 Updated last month
- Implementation of CALM from the paper "LLM Augmented LLMs: Expanding Capabilities through Composition", out of Google Deepmind ☆168 Updated last month
- ☆292 Updated 4 months ago
- Index of URLs to pdf files all over the internet and scripts ☆21 Updated last year