shan18 / Perceiver-Resampler-XAttn-Captioning

Generating Captions via Perceiver-Resampler Cross-Attention Networks

☆17

Alternatives and similar repositories for Perceiver-Resampler-XAttn-Captioning

Users who are interested in Perceiver-Resampler-XAttn-Captioning are comparing it to the libraries listed below.


huggingface / m4-logs
M4 experiment logbook
☆57 · Updated 2 years ago
ryanwebster90 / snip-dedup
☆103 · Updated last year
huggingface / chug
Minimal sharded dataset loaders, decoders, and utils for multi-modal document, image, and text datasets.
☆159 · Updated last year
facebookresearch / clip-rocket
Code release for "Improved baselines for vision-language pre-training"
☆61 · Updated last year
SriramB-98 / vit-decompose
☆23 · Updated 9 months ago
eth-easl / fmengine
Utilities for Training Very Large Models
☆58 · Updated last year
facebookresearch / capi
Code and weights for the paper "Cluster and Predict Latents Patches for Improved Masked Image Modeling"
☆122 · Updated 6 months ago
apoorvkh / torchrunx
Easily run PyTorch on multiple GPUs & machines
☆47 · Updated 2 weeks ago
lucidrains / flash-cosine-sim-attention
Implementation of fused cosine similarity attention in the same style as Flash Attention
☆217 · Updated 2 years ago
OATML / RHO-Loss
☆210 · Updated 3 years ago
cloneofsimo / zeroshampoo
☆34 · Updated last year
lucidrains / einops-exts
Implementation of some personal helper functions for Einops, my most favorite tensor manipulation library ❤️
☆55 · Updated 2 years ago
LAION-AI / General-GPT
☆65 · Updated 2 years ago
jxiw / BiGS
Official repository of Pretraining Without Attention (BiGS). BiGS is the first model to achieve BERT-level transfer learning on the GLUE …
☆114 · Updated last year
patil-suraj / vit-vqgan
JAX implementation of ViT-VQGAN
☆82 · Updated 3 years ago
bfshi / TOAST
Official code for "TOAST: Transfer Learning via Attention Steering"
☆186 · Updated 2 years ago
TomerRonen34 / mixed-resolution-vit
☆53 · Updated 2 years ago
LAION-AI / laion50BU
Un-*** 50 billion multimodality dataset
☆22 · Updated 3 years ago
jiasenlu / LL3M
LL3M: Large Language and Multi-Modal Model in JAX
☆74 · Updated last year
cloneofsimo / min-max-gpt
Minimal (400 LOC) implementation, maximum (multi-node, FSDP) GPT training
☆132 · Updated last year
taesiri / ZoomIsAllYouNeed
Official code and data for NeurIPS 2023 paper "ImageNet-Hard: The Hardest Images Remaining from a Study of the Power of Zoom and Spatial …
☆39 · Updated last year
EleutherAI / pilev2
☆13 · Updated 2 years ago
lucidrains / taylor-series-linear-attention
Explorations into the recently proposed Taylor Series Linear Attention
☆99 · Updated last year
sayakpaul / big_vision_experiments
Contains my experiments with the `big_vision` repo to train ViTs on ImageNet-1k.
☆22 · Updated 2 years ago
JeanKaddour / NoTrainNoGain
Revisiting Efficient Training Algorithms For Transformer-based Language Models (NeurIPS 2023)
☆80 · Updated 2 years ago
facebookresearch / adaptive_scheduling
Experimental scripts for researching data adaptive learning rate scheduling.
☆22 · Updated 2 years ago
cloneofsimo / min-fsdp
☆91 · Updated last year
andravin / spio
Memory-Efficient CUDA kernels for training ConvNets with PyTorch.
☆42 · Updated 8 months ago
lucidrains / MaMMUT-pytorch
Implementation of MaMMUT, a simple vision-encoder text-decoder architecture for multimodal tasks from Google, in PyTorch
☆102 · Updated 2 years ago
mlfoundations / scaling
Language models scale reliably with over-training and on downstream tasks
☆100 · Updated last year