Jaykef / min-patchnizer
Minimal, clean code for video/image "patchnization", the process of splitting visual data into fixed-size patches that serve as input tokens for a Transformer encoder.
☆11 · Updated last year
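The description does not say how the patches are formed. Below is a minimal NumPy sketch of generic ViT-style patchification. It assumes channels-last `(H, W, C)` images and `(T, H, W, C)` clips, non-overlapping patches, and spatio-temporal "tubelets" for video. The function names `patchify_image`, `patchify_video`, and `embed_patches`, and the random projection weights, are hypothetical illustrations and are not taken from the min-patchnizer code.

```python
import numpy as np


def patchify_image(image: np.ndarray, patch_size: int) -> np.ndarray:
    """Split an (H, W, C) image into flattened, non-overlapping patches.

    Returns shape (num_patches, patch_size * patch_size * C), with patches
    ordered row by row over the patch grid (raster order).
    """
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("H and W must be divisible by patch_size")
    gh, gw = h // patch_size, w // patch_size
    # (H, W, C) -> (gh, p, gw, p, C) -> (gh, gw, p, p, C): group pixels by patch
    patches = image.reshape(gh, patch_size, gw, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)
    return patches.reshape(gh * gw, patch_size * patch_size * c)


def patchify_video(video: np.ndarray, patch_size: int, tubelet: int) -> np.ndarray:
    """Split a (T, H, W, C) clip into flattened spatio-temporal tubelets.

    Each token covers `tubelet` consecutive frames of one spatial patch.
    """
    t, h, w, c = video.shape
    if t % tubelet or h % patch_size or w % patch_size:
        raise ValueError("T must divide by tubelet, H and W by patch_size")
    gt, gh, gw = t // tubelet, h // patch_size, w // patch_size
    # (gt, tubelet, gh, p, gw, p, C) -> (gt, gh, gw, tubelet, p, p, C)
    patches = video.reshape(gt, tubelet, gh, patch_size, gw, patch_size, c)
    patches = patches.transpose(0, 2, 4, 1, 3, 5, 6)
    return patches.reshape(gt * gh * gw, tubelet * patch_size * patch_size * c)


def embed_patches(patches: np.ndarray, weight: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """Linearly project flattened patches to token embeddings of width D."""
    return patches @ weight + bias


if __name__ == "__main__":
    rng = np.random.default_rng(0)

    img = rng.random((224, 224, 3))
    tokens = patchify_image(img, 16)
    print(tokens.shape)  # (196, 768)

    # Hypothetical random projection standing in for a learned one.
    weight = rng.normal(scale=0.02, size=(768, 512))
    bias = np.zeros(512)
    print(embed_patches(tokens, weight, bias).shape)  # (196, 512)

    clip = rng.random((16, 224, 224, 3))
    print(patchify_video(clip, 16, 2).shape)  # (1568, 1536)
```

With this layout, token order is a plain raster scan over the patch grid (and over time first for video), so a position embedding can be added by simple row index. Reshape plus transpose avoids Python loops. `np.ndarray.reshape` copies only when the transposed view is not contiguous.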
Alternatives and similar repositories for min-patchnizer
Users who are interested in min-patchnizer are comparing it to the libraries listed below.
- implementation of https://arxiv.org/pdf/2312.09299 ☆21 · Updated last year
- Make-A-Video Latent Diffusion Model ☆19 · Updated 2 years ago
- ☆27 · Updated last year
- GPU accelerated client-side embeddings for vector search, RAG etc. ☆65 · Updated last year
- Cerule - A Tiny Mighty Vision Model ☆68 · Updated 3 weeks ago
- Create topological graph for image segments. ☆22 · Updated last year
- Run Vision LLMs, TTS and STT APIs. Website and API for https://text-generator.io ☆38 · Updated 2 months ago
- [WIP] Transformer to embed Danbooru labelsets ☆13 · Updated last year
- Rust bindings for CTranslate2 ☆14 · Updated 2 years ago
- GPT as Knowledger Worker (or if you really want, GPT Sorta' Takes the CPA Exam) ☆13 · Updated 2 years ago
- Demo python script app to interact with llama.cpp server using whisper API, microphone and webcam devices. ☆46 · Updated 2 years ago
- Search through Facebook Research's PyTorch BigGraph Wikidata-dataset with the Weaviate vector search engine ☆31 · Updated 3 years ago
- Load any clip model with a standardized interface ☆22 · Updated last month
- The implementation of "Leeroo Orchestrator: Elevating LLMs Performance Through Model Integration" ☆56 · Updated last year
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ☆19 · Updated 2 years ago
- ☆63 · Updated last year
- Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers" ☆38 · Updated 5 months ago
- The Next Generation Multi-Modality Superintelligence ☆70 · Updated last year
- Pixel Parsing. A reproduction of OCR-free end-to-end document understanding models with open data ☆23 · Updated last year
- The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained mode… ☆89 · Updated last year
- ☆20 · Updated 8 months ago
- Implementation of the proposed Spline-Based Transformer from Disney Research ☆105 · Updated last year
- Latent Diffusion Language Models ☆70 · Updated 2 years ago
- assign color hues to a collection of text fragments based on embeddings ☆20 · Updated last year
- ☆19 · Updated 2 years ago
- A Data Source for Reasoning Embodied Agents ☆19 · Updated 2 years ago
- An EXA-Scale repository of Multi-Modality AI resources from papers and models, to foundational libraries! ☆40 · Updated last year
- Unofficial implementation and experiments related to Set-of-Mark (SoM) 👁️ ☆88 · Updated 2 years ago
- BUD-E (Buddy) is an open-source voice assistant framework that facilitates seamless interaction with AI models and APIs, enabling the cre… ☆23 · Updated last year
- Implementation of a holodeck, written in Pytorch ☆18 · Updated 2 years ago