albanie / foundation-models
Video descriptions of research papers relating to foundation models and scaling
☆30Updated last year
Related projects
Alternatives and complementary repositories for foundation-models
- ☆30Updated this week
- Code for experiments for "ConvNet vs Transformer, Supervised vs CLIP: Beyond ImageNet Accuracy"☆96Updated 2 months ago
- ViLMA: A Zero-Shot Benchmark for Linguistic and Temporal Grounding in Video-Language Models (ICLR 2024, Official Implementation)☆14Updated 10 months ago
- ☆19Updated last month
- Implementation of MaMMUT, a simple vision-encoder text-decoder architecture for multimodal tasks from Google, in Pytorch☆97Updated last year
- ☆30Updated 9 months ago
- This is a PyTorch implementation of the paper "ViP: A Differentially Private Foundation Model for Computer Vision"☆37Updated last year
- A big_vision-inspired repo that implements a generic Auto-Encoder class capable of representation learning and generative modeling.☆30Updated 4 months ago
- ☆40Updated this week
- ☆20Updated last month
- Repository for the paper: "TiC-CLIP: Continual Training of CLIP Models".☆94Updated 5 months ago
- Some personal experiments around routing tokens to different autoregressive attention, akin to mixture-of-experts☆108Updated last month
- ☆64Updated last year
- Code for NOLA, an implementation of "NOLA: Compressing LoRA using Linear Combination of Random Basis"☆49Updated 2 months ago
- PyTorch implementation of Soft MoE by Google Brain in "From Sparse to Soft Mixtures of Experts" (https://arxiv.org/pdf/2308.00951.pdf)☆66Updated last year
- FuseCap: Large Language Model for Visual Data Fusion in Enriched Caption Generation☆49Updated 7 months ago
- ☆26Updated 2 months ago
- Official implementation of the paper The Hidden Language of Diffusion Models☆69Updated 9 months ago
- Object Recognition as Next Token Prediction (CVPR 2024 Highlight)☆161Updated last month
- [ICCV23] Official implementation of eP-ALM: Efficient Perceptual Augmentation of Language Models.☆27Updated last year
- Implementation of Zorro, Masked Multimodal Transformer, in Pytorch☆95Updated last year
- Official repository for the General Robust Image Task (GRIT) Benchmark☆50Updated last year
- Holistic evaluation of multimodal foundation models☆41Updated 3 months ago
- Code release for "Improved baselines for vision-language pre-training"☆57Updated 6 months ago
- An official PyTorch implementation for CLIPPR☆28Updated last year
- Un-*** 50 billions multimodality dataset☆24Updated 2 years ago
- ☆24Updated last year
- JAX implementation of ViT-VQGAN☆77Updated 2 years ago
- Implementation of the general framework for AMIE, from the paper "Towards Conversational Diagnostic AI", out of Google Deepmind☆53Updated 2 months ago
- Codebase for the SIMAT dataset and evaluation☆38Updated 2 years ago