Jaykef / min-patchnizer
Minimal, clean code for video/image "patchnization" - a process commonly used in tokenizing visual data for use in a Transformer encoder.
☆11 · Updated 9 months ago
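"Patchnization" usually means cutting an image into a grid of non-overlapping square patches and flattening each one into a vector. A video clip is cut into small spatio-temporal blocks, often called "tubelets". Each vector then becomes one input token for a Transformer encoder, typically after a linear projection.

The NumPy sketch below illustrates that general idea. It is not min-patchnizer's own code. The function names `patchify_image`, `patchify_video` and `embed_patches`, the tubelet scheme for video, and the randomly initialised projection that stands in for a learned linear layer are all illustrative assumptions.

```python
import numpy as np


def patchify_image(image: np.ndarray, patch_size: int) -> np.ndarray:
    """Split an (H, W, C) image into non-overlapping square patches.

    Returns shape (num_patches, patch_size * patch_size * C): one flattened
    vector per patch, in row-major (raster) order over the patch grid.
    """
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    gh, gw = h // patch_size, w // patch_size
    # (H, W, C) -> (gh, p, gw, p, C): expose the patch grid and in-patch offsets
    patches = image.reshape(gh, patch_size, gw, patch_size, c)
    # -> (gh, gw, p, p, C): group the pixels that belong to the same patch
    patches = patches.transpose(0, 2, 1, 3, 4)
    return patches.reshape(gh * gw, patch_size * patch_size * c)


def patchify_video(video: np.ndarray, patch_size: int, tubelet_size: int) -> np.ndarray:
    """Split a (T, H, W, C) clip into non-overlapping spatio-temporal tubelets.

    Each tubelet spans `tubelet_size` frames and a patch_size x patch_size
    region. Returns shape (num_tubelets, tubelet_size * patch_size**2 * C).
    """
    t, h, w, c = video.shape
    if t % tubelet_size or h % patch_size or w % patch_size:
        raise ValueError("T, H and W must be divisible by tubelet_size / patch_size")
    gt, gh, gw = t // tubelet_size, h // patch_size, w // patch_size
    # (T, H, W, C) -> (gt, tt, gh, p, gw, p, C)
    patches = video.reshape(gt, tubelet_size, gh, patch_size, gw, patch_size, c)
    # -> (gt, gh, gw, tt, p, p, C): one block per (time, row, column) grid cell
    patches = patches.transpose(0, 2, 4, 1, 3, 5, 6)
    return patches.reshape(gt * gh * gw, tubelet_size * patch_size * patch_size * c)


def embed_patches(patches: np.ndarray, d_model: int, seed: int = 0) -> np.ndarray:
    """Project flattened patches to d_model-dimensional token embeddings.

    Hypothetical stand-in: a real encoder learns this matrix; here it is
    randomly initialised just to show the shapes involved.
    """
    rng = np.random.default_rng(seed)
    w = rng.normal(0.0, 0.02, size=(patches.shape[1], d_model)).astype(patches.dtype)
    return patches @ w


if __name__ == "__main__":
    img = np.arange(4 * 4 * 3, dtype=np.float32).reshape(4, 4, 3)
    tokens = patchify_image(img, patch_size=2)
    print(tokens.shape)  # (4, 12): a 2x2 grid of patches, each 2*2*3 values
    print(embed_patches(tokens, d_model=8).shape)  # (4, 8)
```

The reshape/transpose pair avoids Python loops. The first reshape exposes the patch grid, and the transpose moves the two grid axes in front of the in-patch pixel axes. A final reshape then flattens each patch into one row.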
Alternatives and similar repositories for min-patchnizer:
Users interested in min-patchnizer are comparing it to the libraries listed below.
- Visual RAG using fewer than 300 lines of code. ☆25 · Updated 11 months ago
- Implementation of https://arxiv.org/pdf/2312.09299 ☆20 · Updated 7 months ago
- ☆12 · Updated 10 months ago
- I have created a dataset of image-text pairs by using the cosine similarity of the CLIP embeddings of the image & its caption derived f… ☆15 · Updated 3 years ago
- Implementation of SoundStream from the paper: "SoundStream: An End-to-End Neural Audio Codec" ☆12 · Updated 3 weeks ago
- BH hackathon ☆14 · Updated 10 months ago
- Let's try and finetune the OpenAI consistency decoder to work for SDXL ☆23 · Updated last year
- Latent Large Language Models ☆17 · Updated 5 months ago
- Guide diffusion on ImageBind embedding similarity ☆28 · Updated last year
- ☆9 · Updated last year
- Pixel Parsing. A reproduction of OCR-free end-to-end document understanding models with open data ☆21 · Updated 6 months ago
- Rust bindings for CTranslate2 ☆14 · Updated last year
- Training hybrid models for dummies. ☆20 · Updated last month
- ☆19 · Updated 3 months ago
- A fast approach for translating a series of text prompts into a video. The 2022 NeurIPS Workshop on Machine Learning for Creativity and D… ☆32 · Updated last year
- Create topological graph for image segments. ☆20 · Updated 4 months ago
- Describe the format of image/text datasets ☆11 · Updated 2 years ago
- ☆19 · Updated last year
- ☆28 · Updated last year
- GET3D online data renderer ☆11 · Updated 2 years ago
- ☆21 · Updated 2 months ago
- GPT as Knowledge Worker (or if you really want, GPT Sorta' Takes the CPA Exam) ☆12 · Updated 2 years ago
- Demonstration that fine-tuning a RoPE model on sequences longer than those it was pre-trained on adapts the model's context limit ☆63 · Updated last year
- A library for simplifying fine-tuning with multi-GPU setups in the Hugging Face ecosystem. ☆16 · Updated 3 months ago
- Tools for merging pretrained large language models. ☆19 · Updated 8 months ago
- DiCE: The Infinitely Differentiable Monte-Carlo Estimator ☆31 · Updated last year
- PyTorch implementation of the paper "MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training" ☆23 · Updated last week
- ☆24 · Updated last year
- Make-A-Video Latent Diffusion Model ☆18 · Updated last year