hu-po / streamdocs
Documentation, notes, links, etc for streams.
☆83 · Updated last year
Alternatives and similar repositories for streamdocs
Users who are interested in streamdocs are comparing it to the repositories listed below.
- documentation for content creation ☆221 · Updated last week
- Projects based on SigLIP (Zhai et al., 2023) and Hugging Face transformers integration 🤗 ☆275 · Updated 7 months ago
- Implementation of a framework for Genie2 in Pytorch ☆151 · Updated 8 months ago
- Unofficial implementation and experiments related to Set-of-Mark (SoM) 👁️ ☆88 · Updated last year
- ClickDiffusion: Harnessing LLMs for Interactive Precise Image Editing ☆70 · Updated last year
- Implementation of Lumiere, SOTA text-to-video generation from Google Deepmind, in Pytorch ☆278 · Updated last year
- PyTorch Implementation of Object Recognition as Next Token Prediction [CVPR'24 Highlight] ☆181 · Updated 4 months ago
- Just another reasonably minimal repo for class-conditional training of pixel-space diffusion transformers. ☆123 · Updated 3 months ago
- Implementation of TiTok, proposed by Bytedance in "An Image is Worth 32 Tokens for Reconstruction and Generation" ☆180 · Updated last year
- ☆193 · Updated last year
- NeuMeta transforms neural networks by allowing a single model to adapt on the fly to different sizes, generating the right weights when n… ☆43 · Updated 10 months ago
- This repo contains the code for the paper "Intuitive physics understanding emerges from self-supervised pretraining on natural videos" ☆183 · Updated 7 months ago
- Implementation of the premier Text to Video model from OpenAI ☆57 · Updated 10 months ago
- A minimal implementation of LLaVA-style VLM with interleaved image & text & video processing ability. ☆96 · Updated 9 months ago
- This is the repository for the Photorealistic Unreal Graphics (PUG) datasets for representation learning. ☆237 · Updated last year
- Official PyTorch implementation of TokenSet. ☆123 · Updated 6 months ago
- ☆85 · Updated last year
- 🤩 An AWESOME Curated List of Papers, Workshops, Datasets, and Challenges from CVPR 2024 ☆144 · Updated last year
- Implementation of Mind Evolution, Evolving Deeper LLM Thinking, from Deepmind ☆56 · Updated 3 months ago
- The official repository for HyperZ⋅Z⋅W Operator Connects Slow-Fast Networks for Full Context Interaction. ☆39 · Updated 5 months ago
- From scratch implementation of a vision language model in pure PyTorch ☆239 · Updated last year
- Focused on fast experimentation and simplicity ☆75 · Updated 8 months ago
- Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models. TMLR 2025. ☆105 · Updated last week
- Timm model explorer ☆41 · Updated last year
- ☆302 · Updated 4 months ago
- [CVPR 2024] VCoder: Versatile Vision Encoders for Multimodal Large Language Models ☆278 · Updated last year
- Use Florence 2 to auto-label data for use in training fine-tuned object detection models. ☆67 · Updated last year
- [ICCV 2025] OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning ☆378 · Updated last week
- Implementation of the Llama architecture with RLHF + Q-learning ☆166 · Updated 7 months ago
- ☆69 · Updated last year