A minimal implementation of a LLaVA-style VLM with interleaved image, text, and video processing.
☆98Dec 17, 2024Updated last year
Alternatives and similar repositories for Mini-LLaVA
Users that are interested in Mini-LLaVA are comparing it to the libraries listed below.
- Pruned CoTracker architecture for tracking the myocardium in 2D echo images.☆19May 6, 2025Updated 10 months ago
- Visualize any repo or codebase into diagram or animation☆22Oct 14, 2024Updated last year
- TEMPURA enables video-language models to reason about causal event relationships and generate fine-grained, timestamped descriptions of u…☆25Jun 4, 2025Updated 9 months ago
- ☆12Apr 7, 2024Updated last year
- Repo of HawkLlama.☆16Jan 2, 2025Updated last year
- [ICCV 2025] Dynamic-VLM☆28Dec 16, 2024Updated last year
- A tiny, didactical implementation of LLAMA 3☆42Dec 2, 2024Updated last year
- ☆11Nov 5, 2024Updated last year
- Multimodal Open-O1 (MO1) is designed to enhance the accuracy of inference models by utilizing a novel prompt-based approach. This tool wo…☆29Sep 25, 2024Updated last year
- Turkish Vision Language Model Development And Research☆16Aug 9, 2024Updated last year
- ☆13May 10, 2025Updated 10 months ago
- Unofficial Implementation of Selective Attention Transformer☆21Oct 31, 2024Updated last year
- A block pruning framework for LLMs.☆28May 17, 2025Updated 10 months ago
- Data and workflow examples☆16Apr 9, 2018Updated 7 years ago
- Long Context Research☆31Jan 26, 2026Updated 2 months ago
- A lightweight and highly efficient training framework for accelerating diffusion tasks.☆52Sep 14, 2024Updated last year
- Run SOTA Vision-Language Model Florence-2 on your data!☆15Mar 27, 2025Updated last year
- This repo contains the dataset for paper: Creating a Dataset Supporting Translation Between OpenMP Fortran and C++ Code☆15Dec 1, 2023Updated 2 years ago
- WeGeFT: Weight‑Generative Fine‑Tuning for Multi‑Faceted Efficient Adaptation of Large Models☆22Jul 10, 2025Updated 8 months ago
- Code-Switched translations with Large Language models☆25Dec 17, 2024Updated last year
- ☆33Nov 4, 2024Updated last year
- Accompanying code for "Analyzing Vision Transformers in Class Embedding Space" (NeurIPS '23)☆15Jun 10, 2024Updated last year
- LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture☆213Jan 6, 2025Updated last year
- Multimodal Representation Alignment for Image Generation: Text-Image Interleaved Control Is Easier Than You Think!☆122Mar 4, 2025Updated last year
- minimal GRPO implementation from scratch☆103Mar 14, 2025Updated last year
- A bot that scrapes your jobs in real time, sorts them according to preferences, and runs an alert☆16Nov 14, 2024Updated last year
- Unofficial Implementation of Chain-of-Thought Reasoning Without Prompting☆35Mar 19, 2024Updated 2 years ago
- When do we not need larger vision models?☆416Feb 8, 2025Updated last year
- Official implementation for LaCo (EMNLP 2024 Findings)☆21Oct 3, 2024Updated last year
- ☆11Apr 27, 2013Updated 12 years ago
- Code for ICML 2025 paper | Joint Localization and Activation Editing for Low-Resource Fine-Tuning☆27Jun 18, 2025Updated 9 months ago
- ☆12Jan 17, 2024Updated 2 years ago
- A Framework of Small-scale Large Multimodal Models☆965Mar 22, 2026Updated last week
- Implementation of ''VPUFormer: Visual Prompt Unified Transformer for Interactive Image Segmentation''☆15Sep 16, 2025Updated 6 months ago
- A plugin-oriented framework for video structuring. Chinese developers, please add WeChat zhzhi78 to join the group chat.☆18May 28, 2024Updated last year
- Handwritten digit classification web app using Streamlit☆10Jan 15, 2024Updated 2 years ago
- Unofficial Implementation of Evolutionary Model Merging☆41Mar 28, 2024Updated 2 years ago
- LocalPlexity is a lite version of Perplexity aimed at 100% privacy and openness. Everything is done locally, in your browser, from search…☆21Aug 12, 2024Updated last year
- [ICCV 2025] "Player-Centric Multimodal Prompt Generation for Large Language Model Based Identity-Aware Basketball Video Captioning".☆20Dec 11, 2025Updated 3 months ago