Jaykef / min-patchnizer
Minimal, clean code for video/image "patchnization" - a process commonly used in tokenizing visual data for use in a Transformer encoder.
☆11 Updated 8 months ago
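As an illustration of the idea only, and not the repository's actual code, the sketch below splits a small grayscale image into non-overlapping square patches and flattens each one into a vector, which is the step that usually precedes a learned embedding in a Transformer encoder. The function name `patchify`, the nested-list image format, and the divisibility requirement are assumptions made for this example.

```python
from typing import List

Image = List[List[int]]  # grayscale image stored as rows of pixel values


def patchify(image: Image, patch_size: int) -> List[List[int]]:
    """Split an H x W image into flattened, non-overlapping square patches.

    Hypothetical helper for illustration. Patches are returned in row-major
    order; each is a flat list of patch_size * patch_size pixel values.
    """
    if patch_size <= 0:
        raise ValueError("patch_size must be positive")
    height = len(image)
    width = len(image[0]) if height else 0
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")

    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # Read the patch row by row and flatten it into one vector.
            patch = [
                image[row][col]
                for row in range(top, top + patch_size)
                for col in range(left, left + patch_size)
            ]
            patches.append(patch)
    return patches


# A 4x4 image with pixel values 0..15, split into four 2x2 patches.
image = [[row * 4 + col for col in range(4)] for row in range(4)]
patches = patchify(image, patch_size=2)
print(patches)
```

For video, the same splitting would typically be applied frame by frame, or extended with a temporal patch dimension; which of these the repository uses is not stated here.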
Alternatives and similar repositories for min-patchnizer:
Users who are interested in min-patchnizer are comparing it to the libraries listed below.
- PyTorch Implementation of the paper "MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training" ☆23 Updated this week
- ☆18 Updated last month
- A library for simplifying fine-tuning with multi-GPU setups in the Hugging Face ecosystem. ☆16 Updated 2 months ago
- Cog wrapper for collabora/WhisperSpeech ☆25 Updated 10 months ago
- A Data Source for Reasoning Embodied Agents ☆19 Updated last year
- Training hybrid models for dummies. ☆16 Updated this week
- GitHub repo for Peifeng's internship project ☆12 Updated last year
- Latent Large Language Models ☆17 Updated 4 months ago
- Zeta implementation of a reusable, plug-and-play feedforward from the paper "Exponentially Faster Language Modeling" ☆15 Updated 2 months ago
- ☆28 Updated last year
- ☆15 Updated last year
- ☆12 Updated 9 months ago
- Visual RAG using less than 300 lines of code. ☆24 Updated 10 months ago
- Tools for merging pretrained large language models. ☆19 Updated 7 months ago
- Load any CLIP model with a standardized interface ☆21 Updated 8 months ago
- Rust bindings for CTranslate2 ☆14 Updated last year
- MPI Code Generation through Domain-Specific Language Models ☆13 Updated 2 months ago
- This library supports evaluating disparities in generated image quality, diversity, and consistency between geographic regions. ☆20 Updated 7 months ago
- A list of language models with permissive licenses such as MIT or Apache 2.0 ☆24 Updated 2 months ago
- ☆26 Updated 10 months ago
- Fast approximate inference on a single GPU with sparsity aware offloading ☆38 Updated last year
- Official repository for the paper "Images as Weight Matrices: Sequential Image Generation Through Synaptic Learning Rules" (ICLR 2023) ☆12 Updated last year
- Multimodal Open Source Framework for Conversational Agent Research and Development. ☆15 Updated 2 months ago
- implementation of https://arxiv.org/pdf/2312.09299 ☆20 Updated 6 months ago
- Multi-Modal Language Modeling with Image, Audio and Text Integration, including multiple images and multiple audio inputs in a single multi-turn conversation. ☆16 Updated 10 months ago
- Implementation of SoundStream from the paper: "SoundStream: An End-to-End Neural Audio Codec" ☆12 Updated 2 months ago
- Make-A-Video Latent Diffusion Model ☆18 Updated last year
- Describe the format of image/text datasets ☆11 Updated 2 years ago