shreydan / VisionGPT2
Combining ViT and GPT-2 for image captioning. Trained on MS-COCO. The model was implemented mostly from scratch.
☆44 Updated 2 years ago
Alternatives and similar repositories for VisionGPT2
Users who are interested in VisionGPT2 compare it to the repositories listed below.
 - From scratch implementation of a vision language model in pure PyTorch ☆246 Updated last year
 - Video+code lecture on building nanoGPT from scratch ☆68 Updated last year
 - Fully fine-tune large models like Mistral, Llama-2-13B, or Qwen-14B completely for free ☆231 Updated last year
 - Collection of autoregressive model implementations ☆86 Updated 6 months ago
 - (WACV 2025 - Oral) Vision-language conversation in 10 languages including English, Chinese, French, Spanish, Russian, Japanese, Arabic, H… ☆82 Updated 2 months ago
 - ☆136 Updated last year
 - AnyModal is a Flexible Multimodal Language Model Framework for PyTorch ☆102 Updated 10 months ago
 - Cerule - A Tiny Mighty Vision Model ☆67 Updated last year
 - Embed arbitrary modalities (images, audio, documents, etc) into large language models. ☆186 Updated last year
 - ☆63 Updated last year
 - The simplest, fastest repository for training/finetuning medium-sized xLSTMs. ☆41 Updated last year
 - Maybe the new state of the art vision model? we'll see 🤷‍♂️ ☆165 Updated last year
 - Implementation of DoRA ☆304 Updated last year
 - This is the repo for the paper "PANGEA: A FULLY OPEN MULTILINGUAL MULTIMODAL LLM FOR 39 LANGUAGES" ☆113 Updated 4 months ago
 - This project is a collection of fine-tuning scripts to help researchers fine-tune Qwen 2 VL on HuggingFace datasets. ☆74 Updated 3 months ago
 - A compact LLM pretrained in 9 days by using high quality data ☆332 Updated 6 months ago
 - Set of scripts to finetune LLMs ☆38 Updated last year
 - An unofficial pytorch implementation of 'Efficient Infinite Context Transformers with Infini-attention' ☆53 Updated last year
 - Parameter-efficient finetuning script for Phi-3-vision, the strong multimodal language model by Microsoft. ☆58 Updated last year
 - Tokun to can tokens ☆18 Updated 4 months ago
 - Notebook and Scripts that showcase running quantized diffusion models on consumer GPUs ☆38 Updated last year
 - nanogpt turned into a chat model ☆76 Updated 2 years ago
 - A single repo with all scripts and utils to train / fine-tune the Mamba model with or without FIM ☆59 Updated last year
 - LoRA and DoRA from Scratch Implementations ☆211 Updated last year
 - ☆119 Updated last year
 - working implementation of deepseek MLA ☆44 Updated 9 months ago
 - Implementation of the Mamba SSM with hf_integration. ☆56 Updated last year
 - Tune MPTs ☆84 Updated 2 years ago
 - ☆124 Updated last year
 - Train your own small bitnet model ☆75 Updated last year