hkproj/pytorch-paligemma

Coding a Multimodal (Vision) Language Model from scratch in PyTorch with full explanation: https://www.youtube.com/watch?v=vAmKB7iPkWw

☆625

Alternatives and similar repositories for pytorch-paligemma

Users who are interested in pytorch-paligemma often compare it with the repositories listed below.

hkproj / pytorch-llama
LLaMA 2 implemented from scratch in PyTorch
☆375 · Sep 25, 2023 · Updated 2 years ago
hkproj / pytorch-transformer
Attention is all you need implementation
☆1,255 · Jun 8, 2024 · Updated 2 years ago
hkproj / triton-flash-attention
☆257 · Jan 2, 2025 · Updated last year
hkproj / pytorch-stable-diffusion
Stable Diffusion implemented from scratch in PyTorch
☆1,073 · Oct 22, 2024 · Updated last year
hkproj / rlhf-ppo
Notes and commented code for RLHF (PPO)
☆136 · Feb 27, 2024 · Updated 2 years ago
lucidrains / pi-zero-pytorch
Implementation of π₀, the robotic foundation model architecture proposed by Physical Intelligence
☆581 · Jan 31, 2026 · Updated 5 months ago
hkproj / pytorch-transformer-distributed
Distributed training (multi-node) of a Transformer model
☆98 · Apr 10, 2024 · Updated 2 years ago
hkproj / pytorch-llama-notes
Notes about LLaMA 2 model
☆75 · Aug 30, 2023 · Updated 2 years ago
kadirnar / MeloPlus
MeloPlus: Advanced Python Library for MeloTts
☆12 · Dec 1, 2025 · Updated 7 months ago
huggingface / nanoVLM
The simplest, fastest repository for training/finetuning small-sized VLMs.
☆4,972 · Oct 27, 2025 · Updated 9 months ago
AviSoori1x / seemore
From scratch implementation of a vision language model in pure PyTorch
☆260 · May 6, 2024 · Updated 2 years ago
Full-Stack-Data-Science / real-time-ml-inference-with-spark-streaming-and-kafka
FSDS Webinar 1: Real-Time Machine Learning Inference with Spark Streaming and Kafka
☆10 · Feb 17, 2025 · Updated last year
naklecha / llama3-from-scratch
llama3 implementation one matrix multiplication at a time
☆15,224 · May 23, 2024 · Updated 2 years ago
hkproj / multi-latent-attention
☆46 · May 24, 2025 · Updated last year
flow-diffusion / AVDC
Official repository of Learning to Act from Actionless Videos through Dense Correspondences.
☆262 · Apr 25, 2024 · Updated 2 years ago
kmohan321 / Research_Papers
☆45 · Mar 31, 2025 · Updated last year
hscspring / llama.np
Inference Llama/Llama2/Llama3 Modes in NumPy
☆21 · Nov 22, 2023 · Updated 2 years ago
QwenLM / Qwen3-VL
Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
☆19,690 · Jan 30, 2026 · Updated 5 months ago
lucidrains / transfusion-pytorch
Pytorch implementation of Transfusion, "Predict the Next Token and Diffuse Images with One Multi-Modal Model", from MetaAI
☆1,385 · Jan 27, 2026 · Updated 6 months ago
google-research / big_vision
Official codebase used to develop Vision Transformer, SigLIP, MLP-Mixer, LiT and more.
☆3,502 · May 19, 2025 · Updated last year
google-research / language-table
Suite of human-collected datasets and a multi-task continuous control benchmark for open vocabulary visuolinguomotor learning.
☆363 · Jul 2, 2026 · Updated 3 weeks ago
gpu-mode / lectures
Material for gpu-mode lectures
☆6,379 · Jun 15, 2026 · Updated last month
hkproj / pytorch-lora
LORA: Low-Rank Adaptation of Large Language Models implemented using PyTorch
☆128 · Jul 24, 2023 · Updated 3 years ago
HeegerGao / VLA-OS
Official Code For VLA-OS.
☆145 · Jun 25, 2025 · Updated last year
cheryyunl / Make-An-Agent
☆51 · Jul 22, 2024 · Updated 2 years ago
hkproj / dpo-notes
Notes on Direct Preference Optimization
☆28 · Apr 14, 2024 · Updated 2 years ago
allenzren / open-pi-zero
Re-implementation of pi0 vision-language-action (VLA) model from Physical Intelligence
☆1,507 · Jan 31, 2025 · Updated last year
StoreBlank / KUDA
KUDA: Keypoints to Unify Dynamics Learning and Visual Prompting for Open-Vocabulary Robotic Manipulation
☆22 · Apr 23, 2025 · Updated last year
emirhanbilgic / Turkish-TTS
This repository contains the training codes of the fine-tuned SpeechT5 on a Turkish dataset.
☆20 · Sep 4, 2024 · Updated last year
karpathy / build-nanogpt
Video+code lecture on building nanoGPT from scratch
☆5,395 · Aug 13, 2024 · Updated last year
merveenoyan / smol-vision
Recipes for shrinking, optimizing, customizing cutting edge vision models. 💜
☆1,966 · May 26, 2026 · Updated 2 months ago
shreydan / VisionGPT2
Combining ViT and GPT-2 for image captioning. Trained on MS-COCO. The model was implemented mostly from scratch.
☆49 · Oct 2, 2023 · Updated 2 years ago
kesenzhao / UV-CoT
☆45 · Jul 28, 2025 · Updated last year
huggingface / picotron
Minimalistic 4D-parallelism distributed training framework for education purpose
☆2,260 · Aug 26, 2025 · Updated 11 months ago
BradyFU / Awesome-Multimodal-Large-Language-Models
Latest Advances on Multimodal Large Language Models
☆17,959 · Updated this week
ShareChatAI / MACD
☆19 · Feb 22, 2024 · Updated 2 years ago
rasbt / LLMs-from-scratch
Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
☆99,963 · Updated this week
hkproj / transformer-from-scratch-notes
Notes about "Attention is all you need" video (https://www.youtube.com/watch?v=bCz4OMemCcA)
☆371 · May 28, 2023 · Updated 3 years ago
matsuolab / virtual_desktop_docker
A minimal toolset for running UI applications within docker isolated X11 environment
☆16 · Jan 11, 2026 · Updated 6 months ago