A minimal implementation of a LLaVA-style VLM that can process interleaved image, text, and video inputs.
☆98Dec 17, 2024Updated last year
Alternatives and similar repositories for Mini-LLaVA
Users that are interested in Mini-LLaVA are comparing it to the libraries listed below
- Pruned CoTracker architecture for tracking the myocardium in 2D echo images.☆19May 6, 2025Updated 10 months ago
- Un-*** 50 billions multimodality dataset☆23Sep 14, 2022Updated 3 years ago
- A tiny, didactical implementation of LLAMA 3☆42Dec 2, 2024Updated last year
- A Prompt Enhancer for flux.1 in ComfyUI☆12Jan 11, 2026Updated last month
- A lightweight, high-efficiency training framework for accelerating diffusion tasks.☆51Sep 14, 2024Updated last year
- All-in-one repository for Fine-tuning & Pretraining (Large) Language Models☆15Mar 8, 2023Updated 3 years ago
- A block pruning framework for LLMs.☆28May 17, 2025Updated 9 months ago
- GRadient-INformed MoE☆263Sep 25, 2024Updated last year
- flow-merge is a powerful Python library that enables seamless merging of multiple transformer-based language models using the most popula…☆20Feb 12, 2025Updated last year
- Run SOTA Vision-Language Model Florence-2 on your data!☆15Mar 27, 2025Updated 11 months ago
- This project provides a production-ready, real-time inference server for LatentSync, enabling high-quality, low-latency 2D digital human …☆21Aug 16, 2025Updated 6 months ago
- Multimodal Open-O1 (MO1) is designed to enhance the accuracy of inference models by utilizing a novel prompt-based approach. This tool wo…☆29Sep 25, 2024Updated last year
- ☆33Nov 4, 2024Updated last year
- Official GPU implementation of the paper "PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance"☆132Nov 19, 2024Updated last year
- Measuring the Signal to Noise Ratio in Language Model Evaluation☆28Aug 19, 2025Updated 6 months ago
- A bot that scrapes your jobs in real time, sorts them according to preferences, and runs an alert☆16Nov 14, 2024Updated last year
- ☆18Jul 24, 2023Updated 2 years ago
- The ESMStereo models are designed with low computational complexity to achieve an acceptable balance between accuracy and speed, which ma…☆57Aug 31, 2025Updated 6 months ago
- ☆68Jun 20, 2024Updated last year
- SMT-LIB benchmarks for shape computations from deep learning models in PyTorch☆18Dec 21, 2022Updated 3 years ago
- Simple notebooks to learn diffusion models on toy datasets☆17Feb 9, 2023Updated 3 years ago
- Effort to open-source 10.5 trillion parameter Gemini model.☆17Dec 6, 2023Updated 2 years ago
- Better WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)☆23Oct 29, 2024Updated last year
- Computing the greatest common divisor with transformers, source code for the paper https://arxiv.org/abs/2308.15594☆14Aug 11, 2025Updated 6 months ago
- Handwritten Number Recognition using CNN and Character Segmentation☆18Apr 20, 2018Updated 7 years ago
- Pretraining and finetuning for visual instruction following with Mixture of Experts☆16Jan 30, 2024Updated 2 years ago
- Unofficial Implementation of Chain-of-Thought Reasoning Without Prompting☆35Mar 19, 2024Updated last year
- Multimodal Representation Alignment for Image Generation: Text-Image Interleaved Control Is Easier Than You Think!☆123Mar 4, 2025Updated last year
- ☆15Apr 28, 2023Updated 2 years ago
- ☆16Mar 22, 2024Updated last year
- MSPaint for marimo and other Python notebooks☆24Oct 24, 2025Updated 4 months ago
- list of papers, code, datasets and other resources☆14Jul 22, 2022Updated 3 years ago
- TEMPURA enables video-language models to reason about causal event relationships and generate fine-grained, timestamped descriptions of u…☆25Jun 4, 2025Updated 9 months ago
- Solution of Kaggle competition: Feedback Prize - Evaluating Student Writing☆16Mar 30, 2022Updated 3 years ago
- A collection of open-source algorithms for chest X-ray analysis☆20Oct 10, 2023Updated 2 years ago
- Code release of paper "ForkMerge: Mitigating Negative Transfer in Auxiliary-Task Learning" (NeurIPS 2023)☆17Dec 30, 2023Updated 2 years ago
- Implementation of "VPUFormer: Visual Prompt Unified Transformer for Interactive Image Segmentation"☆15Sep 16, 2025Updated 5 months ago
- [ICCV2025] Referring any person or objects given a natural language description. Code base for RexSeek and HumanRef Benchmark☆177Oct 15, 2025Updated 4 months ago
- Unofficial Implementation of Evolutionary Model Merging☆41Mar 28, 2024Updated last year