Aleph-Alpha / magma
MAGMA - a GPT-style multimodal model that can understand any combination of images and language. NOTE: The freely available model from this repo is only a demo. For the latest multimodal and multilingual models from Aleph Alpha check out our website https://app.aleph-alpha.com
☆475Updated last year
Related projects:
- Implementation of the specific Transformer architecture from PaLM - Scaling Language Modeling with Pathways☆819Updated last year
- Code release for "Dropout Reduces Underfitting"☆311Updated last year
- ☆346Updated 2 years ago
- Implementation of Parti, Google's pure attention-based text-to-image neural network, in Pytorch☆521Updated 9 months ago
- Implementation of 🦩 Flamingo, state-of-the-art few-shot visual question answering attention net out of Deepmind, in Pytorch☆1,193Updated last year
- Ask Me Anything language model prompting☆536Updated last year
- Used for adaptive human in the loop evaluation of language and embedding models.☆300Updated last year
- Research code for pixel-based encoders of language (PIXEL)☆329Updated 6 months ago
- Language Modeling with the H3 State Space Model☆509Updated 11 months ago
- Easily convert common crawl to a dataset of caption and document. Image/text Audio/text Video/text, ...☆303Updated 9 months ago
- Reproduce results and replicate training for T0 (Multitask Prompted Training Enables Zero-Shot Task Generalization)☆456Updated last year
- Implementation of the DeepMind Flamingo vision-language model, based on Hugging Face language models and ready for training☆163Updated last year
- Code release for SLIP Self-supervision meets Language-Image Pre-training☆743Updated last year
- A concise but complete implementation of CLIP with various experimental improvements from recent papers☆678Updated 11 months ago
- DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective.☆163Updated 4 months ago
- Code release for "Git Re-Basin: Merging Models modulo Permutation Symmetries"☆463Updated last year
- Implementation of RETRO, Deepmind's Retrieval based Attention net, in Pytorch☆850Updated 10 months ago
- MultimodalC4 is a multimodal extension of c4 that interleaves millions of images with text.☆894Updated 3 months ago
- Guide: Finetune GPT2-XL (1.5 Billion Parameters) and finetune GPT-NEO (2.7 B) on a single GPU with Huggingface Transformers using DeepSpe…☆428Updated last year
- ☆495Updated 7 months ago
- Code for T-Few from "Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning"☆424Updated last year
- 🧀 Code and models for the ICML 2023 paper "Grounding Language Models to Images for Multimodal Inputs and Outputs".☆473Updated 10 months ago
- A suite of tools for managing crowdsourcing tasks from the inception through to data packaging for research use.☆303Updated this week
- ☆591Updated last year
- Open reproduction of MUSE for fast text2image generation.☆321Updated 3 months ago
- CLIP (Contrastive Language–Image Pre-training) for Italian☆179Updated last year
- Conceptual 12M is a dataset containing (image-URL, caption) pairs collected for vision-and-language pre-training.☆357Updated last year
- Aim for the moon. If you miss, you may hit a star.☆157Updated last year
- Cramming the training of a (BERT-type) language model into limited compute.☆1,284Updated 3 months ago
- Language Models Can See: Plugging Visual Controls in Text Generation☆251Updated 2 years ago