Jaykef / min-patchnizer
Minimal, clean code for video/image "patchnization" - a process commonly used in tokenizing visual data for use in a Transformer encoder.
☆11 · Updated 11 months ago
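As a rough illustration of what "patchnization" means here, the sketch below cuts an image into non-overlapping square patches and flattens each one into a token-like vector. It is not taken from the min-patchnizer repository: the function name `patchify`, the nested-list image layout (rows x columns x channels), and the `patch_size` parameter are all assumptions made for this example.

```python
from typing import List

# Assumed layout for this sketch: image[row][column] is a list of channel values.
Image = List[List[List[int]]]


def patchify(image: Image, patch_size: int) -> List[List[int]]:
    """Split an image into non-overlapping patches, each flattened to one vector."""
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")

    patches = []
    # Walk the patch grid in row-major order (top to bottom, left to right).
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = []
            for row in image[top:top + patch_size]:
                for pixel in row[left:left + patch_size]:
                    patch.extend(pixel)  # append all channel values of this pixel
            patches.append(patch)
    return patches


# A 4x4 single-channel image whose pixel values are 0..15.
image = [[[r * 4 + c] for c in range(4)] for r in range(4)]
tokens = patchify(image, patch_size=2)
print(len(tokens), len(tokens[0]))  # 4 4
print(tokens[0])  # [0, 1, 4, 5]
```

In a Transformer pipeline each flattened patch would typically be passed through a learned linear projection and given a position embedding before entering the encoder; that step is omitted here.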
Alternatives and similar repositories for min-patchnizer:
Users who are interested in min-patchnizer are comparing it to the libraries listed below:
- Visual RAG using less than 300 lines of code. ☆27 · Updated last year
- ☆12 · Updated last month
- implementation of https://arxiv.org/pdf/2312.09299 ☆20 · Updated 9 months ago
- A library for simplifying fine tuning with multi gpu setups in the Huggingface ecosystem. ☆16 · Updated 5 months ago
- Rust bindings for CTranslate2 ☆14 · Updated last year
- ☆19 · Updated 2 months ago
- Latent Large Language Models ☆17 · Updated 7 months ago
- ☆13 · Updated last year
- This library supports evaluating disparities in generated image quality, diversity, and consistency between geographic regions. ☆20 · Updated 10 months ago
- ☆21 · Updated last month
- Example of finetuning CLIP to identify plants. ☆11 · Updated 9 months ago
- Implementation of VisionLLaMA from the paper: "VisionLLaMA: A Unified LLaMA Interface for Vision Tasks" in PyTorch and Zeta ☆16 · Updated 5 months ago
- ☆28 · Updated last year
- PyTorch Implementation of the paper "MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training" ☆23 · Updated last week
- Run Vision LLMs, TTS and STT APIs. Website and API for https://text-generator.io ☆35 · Updated this week
- [EMNLP 2024] Tree of Problems: Improving structured problem solving with compositionality ☆19 · Updated last month
- Training hybrid models for dummies. ☆20 · Updated 3 months ago
- ☆21 · Updated last week
- ☆16 · Updated 3 weeks ago
- Cog wrapper for collabora/WhisperSpeech ☆24 · Updated last year
- ☆12 · Updated last year
- Let's try and finetune the OpenAI consistency decoder to work for SDXL ☆24 · Updated last year
- ☆16 · Updated last year
- A simple package for leveraging Falcon 180B and the HF ecosystem's tools, including training/inference scripts, safetensors, integrations… ☆13 · Updated last year
- ☆13 · Updated last year
- A Data Source for Reasoning Embodied Agents ☆19 · Updated last year
- ☆11 · Updated 2 years ago
- Digital daydreaming with CLIP Interrogator and Diffusers ☆13 · Updated 7 months ago
- The open source implementation of "NeVA: NeMo Vision and Language Assistant" ☆18 · Updated last year
- Gradio app to track objects in video and add visual effects ☆16 · Updated 7 months ago