hu-po / streamdocs
Documentation, notes, links, etc for streams.
⭐ 83 · Updated last year
Alternatives and similar repositories for streamdocs
Users who are interested in streamdocs are comparing it to the repositories listed below.
- Projects based on SigLIP (Zhai et al., 2023) and Hugging Face transformers integration · ⭐ 280 · Updated 8 months ago
 - Documentation for content creation · ⭐ 227 · Updated last month
 - Unofficial implementation and experiments related to Set-of-Mark (SoM) · ⭐ 87 · Updated 2 years ago
 - Implementation of a framework for Genie2 in PyTorch · ⭐ 153 · Updated 9 months ago
 - Implementation of Lumiere, SOTA text-to-video generation from Google DeepMind, in PyTorch · ⭐ 280 · Updated last year
 - [CVPR 2024] VCoder: Versatile Vision Encoders for Multimodal Large Language Models · ⭐ 277 · Updated last year
 - This is the repository for the Photorealistic Unreal Graphics (PUG) datasets for representation learning. · ⭐ 237 · Updated last year
 - PyTorch Implementation of Object Recognition as Next Token Prediction [CVPR'24 Highlight] · ⭐ 180 · Updated 6 months ago
 - A minimal implementation of LLaVA-style VLM with interleaved image & text & video processing ability. · ⭐ 96 · Updated 10 months ago
 - Implementation of TiTok, proposed by Bytedance in "An Image is Worth 32 Tokens for Reconstruction and Generation" · ⭐ 181 · Updated last year
 - Python Library to evaluate VLM models' robustness across diverse benchmarks · ⭐ 217 · Updated 2 weeks ago
 - From scratch implementation of a vision language model in pure PyTorch · ⭐ 246 · Updated last year
 - ⭐ 194 · Updated last year
 - Implementation of Mind Evolution, Evolving Deeper LLM Thinking, from DeepMind · ⭐ 57 · Updated 5 months ago
 - A family of highly capable yet efficient large multimodal models · ⭐ 191 · Updated last year
 - ⭐ 90 · Updated last year
 - This repo contains the code for the paper "Intuitive physics understanding emerges from self-supervised pretraining on natural videos" · ⭐ 191 · Updated 8 months ago
 - NeuMeta transforms neural networks by allowing a single model to adapt on the fly to different sizes, generating the right weights when n… · ⭐ 43 · Updated 11 months ago
 - ⭐ 69 · Updated last year
 - ClickDiffusion: Harnessing LLMs for Interactive Precise Image Editing · ⭐ 69 · Updated last year
 - Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models. TMLR 2025. · ⭐ 118 · Updated last month
 - Implementation of the Llama architecture with RLHF + Q-learning · ⭐ 167 · Updated 9 months ago
 - Simple large-scale training of stable diffusion with multi-node support. · ⭐ 133 · Updated 2 years ago
 - [ICCV 2025] OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning · ⭐ 400 · Updated last month
 - Repository for the paper: "TiC-CLIP: Continual Training of CLIP Models" ICLR 2024 · ⭐ 106 · Updated last year
 - Just another reasonably minimal repo for class-conditional training of pixel-space diffusion transformers. · ⭐ 131 · Updated 5 months ago
 - Implementation of the premier Text to Video model from OpenAI · ⭐ 54 · Updated 11 months ago
 - ⭐ 302 · Updated 6 months ago
 - Internet Explorer explores the web in a self-supervised manner to progressively find relevant examples that improve performance on a desi… · ⭐ 163 · Updated 2 years ago
 - Quick implementation of nGPT, learning entirely on the hypersphere, from NvidiaAI · ⭐ 291 · Updated 5 months ago