ariG23498 / mmdp
☆29 Updated 4 months ago
Alternatives and similar repositories for mmdp
Users that are interested in mmdp are comparing it to the libraries listed below
- A minimal implementation of LLaVA-style VLM with interleaved image & text & video processing ability. ☆96 Updated 10 months ago
- Code and pretrained models for the paper: "MatMamba: A Matryoshka State Space Model" ☆61 Updated 11 months ago
- Minimal sharded dataset loaders, decoders, and utils for multi-modal document, image, and text datasets. ☆159 Updated last year
- Pixel Parsing. A reproduction of OCR-free end-to-end document understanding models with open data ☆21 Updated last year
- ☆59 Updated last year
- Timm model explorer ☆42 Updated last year
- MatFormer repo ☆65 Updated 11 months ago
- Load any clip model with a standardized interface ☆21 Updated 3 weeks ago
- Experimental CUDA kernel framework unifying typed dimensions, NVRTC JIT specialization, and ML‑guided tuning. ☆43 Updated last week
- Easily run PyTorch on multiple GPUs & machines ☆47 Updated 2 weeks ago
- A dashboard for exploring timm learning rate schedulers ☆19 Updated 11 months ago
- Notebook and Scripts that showcase running quantized diffusion models on consumer GPUs ☆38 Updated last year
- [NeurIPS 2025] Elevating Visual Perception in Multimodal LLMs with Visual Embedding Distillation, arXiv 2024 ☆64 Updated 3 weeks ago
- ☆78 Updated 2 months ago
- (WACV 2025 - Oral) Vision-language conversation in 10 languages including English, Chinese, French, Spanish, Russian, Japanese, Arabic, H… ☆82 Updated 3 months ago
- ☆186 Updated last year
- Official PyTorch Implementation for Paper "No More Adam: Learning Rate Scaling at Initialization is All You Need" ☆54 Updated 9 months ago
- Code for experiments for "ConvNet vs Transformer, Supervised vs CLIP: Beyond ImageNet Accuracy" ☆101 Updated last year
- Implementation of MaMMUT, a simple vision-encoder text-decoder architecture for multimodal tasks from Google, in Pytorch ☆102 Updated 2 years ago
- Notebooks to demonstrate TimmWrapper ☆16 Updated 9 months ago
- ☆171 Updated 3 months ago
- DPO, but faster 🚀 ☆46 Updated 11 months ago
- Python Library to evaluate VLM models' robustness across diverse benchmarks ☆219 Updated 3 weeks ago
- Implementation of 🌻 Mirasol, SOTA Multimodal Autoregressive model out of Google Deepmind, in Pytorch ☆89 Updated last year
- Visualize multi-model embedding spaces. The first goal is to quickly get a lay of the land of any embedding space. Then be able to scroll… ☆28 Updated last year
- Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention" ☆101 Updated last year
- PyTorch implementation of Soft MoE by Google Brain in "From Sparse to Soft Mixtures of Experts" (https://arxiv.org/pdf/2308.00951.pdf) ☆78 Updated 2 years ago
- Implementation of Infini-Transformer in Pytorch ☆113 Updated 10 months ago
- Notebooks for fine tuning pali gemma ☆117 Updated 7 months ago
- Repository for the paper: "TiC-CLIP: Continual Training of CLIP Models" ICLR 2024 ☆108 Updated last year