shan18 / Perceiver-Resampler-XAttn-Captioning
Generating Captions via Perceiver-Resampler Cross-Attention Networks
☆17 · Updated 2 years ago
Alternatives and similar repositories for Perceiver-Resampler-XAttn-Captioning
Users who are interested in Perceiver-Resampler-XAttn-Captioning are comparing it to the libraries listed below.
- ☆23 · Updated 7 months ago
- Code release for "Improved baselines for vision-language pre-training" ☆60 · Updated last year
- ☆104 · Updated last year
- Switch EMA: A Free Lunch for Better Flatness and Sharpness ☆26 · Updated last year
- ☆51 · Updated last year
- ☆65 · Updated last year
- Official code and data for NeurIPS 2023 paper "ImageNet-Hard: The Hardest Images Remaining from a Study of the Power of Zoom and Spatial … ☆39 · Updated last year
- ☆208 · Updated 2 years ago
- These papers will provide unique insightful concepts that will broaden your perspective on neural networks and deep learning ☆48 · Updated 2 years ago
- Minimal sharded dataset loaders, decoders, and utils for multi-modal document, image, and text datasets. ☆158 · Updated last year
- Code and weights for the paper "Cluster and Predict Latents Patches for Improved Masked Image Modeling" ☆116 · Updated 4 months ago
- Un-*** 50 billion multimodality dataset ☆23 · Updated 2 years ago
- ☆53 · Updated last year
- Exploring an idea where one forgets about efficiency and carries out attention across each edge of the nodes (tokens) ☆53 · Updated 5 months ago
- ☆34 · Updated 11 months ago
- CLOOB training (JAX) and inference (JAX and PyTorch) ☆72 · Updated 3 years ago
- Simple implementation of muP, based on Spectral Condition for Feature Learning. The implementation is SGD-only; don't use it with Adam. ☆85 · Updated last year
- Code for T-MARS data filtering ☆35 · Updated 2 years ago
- ☆15 · Updated last year
- PyTorch implementation of Soft MoE by Google Brain in "From Sparse to Soft Mixtures of Experts" (https://arxiv.org/pdf/2308.00951.pdf) ☆77 · Updated last year
- JAX implementation of ViT-VQGAN ☆83 · Updated 2 years ago
- Implementation of TiTok, proposed by Bytedance in "An Image is Worth 32 Tokens for Reconstruction and Generation" ☆178 · Updated last year
- 🦾 EvalGIM (pronounced as "EvalGym") is an evaluation library for generative image models. It enables easy-to-use, reproducible automatic… ☆82 · Updated 8 months ago
- ☆39 · Updated last year
- Official PyTorch Implementation for Paper "No More Adam: Learning Rate Scaling at Initialization is All You Need" ☆52 · Updated 7 months ago
- Contains my experiments with the `big_vision` repo to train ViTs on ImageNet-1k. ☆22 · Updated 2 years ago
- Implementation of MaMMUT, a simple vision-encoder text-decoder architecture for multimodal tasks from Google, in PyTorch ☆103 · Updated last year
- Efficiently read embeddings in a streaming fashion from any filesystem ☆102 · Updated 3 weeks ago
- WIP ☆94 · Updated last year
- M4 experiment logbook ☆58 · Updated 2 years ago