shreydan / VisionGPT2
Combining ViT and GPT-2 for image captioning. Trained on MS-COCO. The model was implemented mostly from scratch.
☆43 · Updated last year
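The idea behind VisionGPT2 — a ViT encoder feeding a GPT-2-style decoder for captioning — can be sketched in pure PyTorch. This is a minimal illustrative sketch, not the repo's actual code: the class names (`TinyViTEncoder`, `TinyCaptionDecoder`) and all dimensions are hypothetical, and it wires the image features into the decoder via standard cross-attention.

```python
import torch
import torch.nn as nn

class TinyViTEncoder(nn.Module):
    """Patchify an image with a strided conv, then run a small Transformer encoder."""
    def __init__(self, img_size=32, patch=8, dim=64, depth=2, heads=4):
        super().__init__()
        n_patches = (img_size // patch) ** 2
        self.proj = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        self.pos = nn.Parameter(torch.zeros(1, n_patches, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)

    def forward(self, x):
        # (B, 3, H, W) -> (B, n_patches, dim)
        x = self.proj(x).flatten(2).transpose(1, 2) + self.pos
        return self.encoder(x)

class TinyCaptionDecoder(nn.Module):
    """GPT-2-style causal decoder that cross-attends to the image patch features."""
    def __init__(self, vocab=1000, dim=64, depth=2, heads=4, max_len=16):
        super().__init__()
        self.tok = nn.Embedding(vocab, dim)
        self.pos = nn.Parameter(torch.zeros(1, max_len, dim))
        layer = nn.TransformerDecoderLayer(dim, heads, dim * 4, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, depth)
        self.head = nn.Linear(dim, vocab)

    def forward(self, tokens, memory):
        T = tokens.size(1)
        x = self.tok(tokens) + self.pos[:, :T]
        # Causal mask keeps caption generation autoregressive.
        mask = nn.Transformer.generate_square_subsequent_mask(T)
        return self.head(self.decoder(x, memory, tgt_mask=mask))

enc = TinyViTEncoder()
dec = TinyCaptionDecoder()
img = torch.randn(2, 3, 32, 32)                 # toy image batch
caps = torch.randint(0, 1000, (2, 12))          # toy caption token ids
logits = dec(caps, enc(img))
print(logits.shape)  # torch.Size([2, 12, 1000])
```

Training on MS-COCO would then be the usual next-token cross-entropy over the caption, with the image features held as the decoder's memory.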
Alternatives and similar repositories for VisionGPT2
Users interested in VisionGPT2 are comparing it to the repositories listed below
- From-scratch implementation of a vision-language model in pure PyTorch ☆239 · Updated last year
- Video + code lecture on building nanoGPT from scratch ☆69 · Updated last year
- A single repo with all scripts and utils to train / fine-tune the Mamba model with or without FIM ☆59 · Updated last year
- (WACV 2025 - Oral) Vision-language conversation in 10 languages including English, Chinese, French, Spanish, Russian, Japanese, Arabic, H… ☆83 · Updated last month
- Collection of autoregressive model implementations ☆86 · Updated 4 months ago
- Cerule - A Tiny Mighty Vision Model ☆68 · Updated last year
- ☆50 · Updated last month
- LoRA and DoRA from-scratch implementations ☆212 · Updated last year
- Implementation of DoRA ☆301 · Updated last year
- Embed arbitrary modalities (images, audio, documents, etc.) into large language models ☆187 · Updated last year
- AnyModal is a Flexible Multimodal Language Model Framework for PyTorch ☆103 · Updated 8 months ago
- A collection of fine-tuning scripts to help researchers fine-tune Qwen 2 VL on HuggingFace datasets ☆73 · Updated 2 months ago
- ☆134 · Updated last year
- ☆119 · Updated last year
- LLM-Training-API: Including Embeddings & ReRankers, mergekit, LaserRMT ☆27 · Updated last year
- Experimenting with small language models ☆71 · Updated last year
- ☆63 · Updated 11 months ago
- Repo for the paper "PANGEA: A Fully Open Multilingual Multimodal LLM for 39 Languages" ☆112 · Updated 2 months ago
- Maybe the new state-of-the-art vision model? We'll see 🤷‍♂️ ☆166 · Updated last year
- A simplified version of Meta's Llama 3 model to be used for learning ☆42 · Updated last year
- Minimal GRPO implementation from scratch ☆97 · Updated 6 months ago
- Google TPU optimizations for transformers models ☆120 · Updated 8 months ago
- Fully fine-tune large models like Mistral, Llama-2-13B, or Qwen-14B completely for free ☆232 · Updated 10 months ago
- Load multiple LoRA modules simultaneously and automatically switch the appropriate combination of LoRA modules to generate the best answe… ☆158 · Updated last year
- The simplest, fastest repository for training/finetuning medium-sized xLSTMs ☆41 · Updated last year
- Working implementation of DeepSeek MLA ☆44 · Updated 8 months ago
- Easy-to-use, high-performance knowledge distillation for LLMs ☆93 · Updated 4 months ago
- An unofficial PyTorch implementation of "Efficient Infinite Context Transformers with Infini-attention" ☆53 · Updated last year
- Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients ☆201 · Updated last year
- Train your own small BitNet model ☆75 · Updated 11 months ago