shreydan / VisionGPT2
Combining ViT and GPT-2 for image captioning. Trained on MS-COCO. The model was implemented mostly from scratch.
☆43 Updated last year
Alternatives and similar repositories for VisionGPT2
Users who are interested in VisionGPT2 are comparing it to the repositories listed below.
- From scratch implementation of a vision language model in pure PyTorch ☆227 Updated last year
- Video+code lecture on building nanoGPT from scratch ☆69 Updated last year
- Cerule - A Tiny Mighty Vision Model ☆66 Updated 10 months ago
- ☆134 Updated 10 months ago
- Collection of autoregressive model implementation ☆85 Updated 2 months ago
- Maybe the new state of the art vision model? we'll see 🤷‍♂️ ☆165 Updated last year
- The simplest, fastest repository for training/finetuning medium-sized xLSTMs. ☆41 Updated last year
- nanogpt turned into a chat model ☆69 Updated last year
- Embed arbitrary modalities (images, audio, documents, etc) into large language models. ☆184 Updated last year
- A single repo with all scripts and utils to train / fine-tune the Mamba model with or without FIM ☆55 Updated last year
- This is the repo for the paper "PANGEA: A FULLY OPEN MULTILINGUAL MULTIMODAL LLM FOR 39 LANGUAGES" ☆108 Updated 2 weeks ago
- An unofficial pytorch implementation of 'Efficient Infinite Context Transformers with Infini-attention' ☆52 Updated 10 months ago
- Testing and evaluating the capabilities of Vision-Language models (PaliGemma) in performing computer vision tasks such as object detectio… ☆81 Updated last year
- Fully fine-tune large models like Mistral, Llama-2-13B, or Qwen-14B completely for free ☆232 Updated 8 months ago
- Implementation of the Llama architecture with RLHF + Q-learning ☆165 Updated 5 months ago
- ☆48 Updated last week
- (WACV 2025 - Oral) Vision-language conversation in 10 languages including English, Chinese, French, Spanish, Russian, Japanese, Arabic, H… ☆84 Updated 4 months ago
- minimal GRPO implementation from scratch ☆92 Updated 4 months ago
- LoRA and DoRA from Scratch Implementations ☆206 Updated last year
- AnyModal is a Flexible Multimodal Language Model Framework for PyTorch ☆100 Updated 6 months ago
- Fine tune Gemma 3 on an object detection task ☆69 Updated this week
- Google TPU optimizations for transformers models ☆114 Updated 5 months ago
- Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients. ☆198 Updated 11 months ago
- an open source reproduction of NVIDIA's nGPT (Normalized Transformer with Representation Learning on the Hypersphere) ☆101 Updated 4 months ago
- ☆40 Updated last month
- Set of scripts to finetune LLMs ☆37 Updated last year
- Notebook and Scripts that showcase running quantized diffusion models on consumer GPUs ☆38 Updated 8 months ago
- ☆118 Updated 10 months ago
- LLM-Training-API: Including Embeddings & ReRankers, mergekit, LaserRMT ☆27 Updated last year
- Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024) ☆158 Updated 3 months ago