Beckschen/LLaVolta

Beckschen / LLaVolta

[NeurIPS 2024] Efficient Large Multi-modal Models via Visual Context Compression

☆66

Alternatives and similar repositories for LLaVolta

Users who are interested in LLaVolta are comparing it to the repositories listed below.

TACJu / Compositor
This repo contains the code for our paper Compositor: Bottom-Up Clustering and Compositing for Robust Part and Object Segmentation
☆18 · Mar 20, 2025 · Updated last year
DingchenYang99 / Pensieve
The official repo of our work "Pensieve: Retrospect-then-Compare mitigates Visual Hallucination"
☆15 · May 4, 2024 · Updated 2 years ago
pkunlp-icler / FastV
[ECCV 2024 Oral] Code for paper: An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Langua…
☆592 · Jan 4, 2025 · Updated last year
locuslab / llava-token-compression
☆47 · Nov 8, 2024 · Updated last year
lzhxmu / VTW
Code release for VTW (AAAI 2025 Oral)
☆68 · Nov 4, 2025 · Updated 8 months ago
Beckschen / ViTamin
[CVPR 2024] Official implementation of "ViTamin: Designing Scalable Vision Models in the Vision-language Era"
☆211 · Jun 9, 2024 · Updated 2 years ago
Yaser-wyx / SCANet
init
☆12 · May 25, 2025 · Updated last year
xuboshen / EgoNCEpp
[ICLR'25] Do Egocentric Video-Language Models Truly Understand Hand-Object Interactions?
☆13 · Apr 11, 2025 · Updated last year
LaVi-Lab / Visual-Table
[EMNLP 2024] Official code for "Beyond Embeddings: The Promise of Visual Table in Multi-Modal Models"
☆20 · Oct 17, 2024 · Updated last year
chenllliang / MMEvalPro
[NAACL 2025] Source code for MMEvalPro, a more trustworthy and efficient benchmark for evaluating LMMs
☆25 · Sep 26, 2024 · Updated last year
chenllliang / DnD-Transformer
[ICLR 2025] Source code for paper "A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Finegr…
☆80 · Dec 10, 2024 · Updated last year
Theia-4869 / FasterVLM
Official code for paper: [CLS] Attention is All You Need for Training-Free Visual Token Pruning: Make VLM Inference Faster.
☆114 · Jun 29, 2025 · Updated last year
lzhxmu / AccDiffusion_v2
Code release for AccDiffusionV2 (TPAMI)
☆34 · Nov 4, 2025 · Updated 8 months ago
whwu95 / FreeVA
FreeVA: Offline MLLM as Training-Free Video Assistant
☆69 · Jun 9, 2024 · Updated 2 years ago
d-ailin / CLIP-Guided-Decoding
☆18 · Aug 1, 2024 · Updated last year
formll / resolving-scaling-law-discrepancies
☆19 · Nov 4, 2025 · Updated 8 months ago
JiuTian-VL / MoME
[NeurIPS 2024] MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models
☆85 · Dec 27, 2025 · Updated 6 months ago
caiyuanhao1998 / Open-PhyGDPO
PhyGDPO: Physics-Aware Groupwise Direct Preference Optimization for Physically Consistent Text-to-Video Generation (ECCV 2026)
☆67 · Jun 20, 2026 · Updated last month
MrZilinXiao / AutoVER
[ECCV'24] Official Implementation of Autoregressive Visual Entity Recognizer.
☆14 · Mar 2, 2024 · Updated 2 years ago
UCSC-VLAA / CLIPS
An Enhanced CLIP Framework for Learning with Synthetic Captions
☆40 · Apr 18, 2025 · Updated last year
pliang279 / HEMM
Holistic evaluation of multimodal foundation models
☆48 · Aug 11, 2024 · Updated last year
GuoTianYu2000 / Active-Dormant-Attention
Code and plots for "Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs"
☆11 · Dec 30, 2024 · Updated last year
WangWenhao0716 / PDF-Embedding
[NeurIPS 2024] The official implementation of "Image Copy Detection for Diffusion Models"
☆18 · Oct 1, 2024 · Updated last year
ljang0 / videowebarena
☆14 · Dec 25, 2024 · Updated last year
Cooperx521 / PyramidDrop
(CVPR 2025) PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction
☆151 · Mar 6, 2025 · Updated last year
SHI-Labs / CuMo
CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts
☆163 · Jun 8, 2024 · Updated 2 years ago
ncsu-dk-lab / Acc-DD
☆14 · Apr 21, 2023 · Updated 3 years ago
yuz1wan / video_distillation
Official implementation of Dancing with Still Images: Video Distillation via Static-Dynamic Disentanglement.
☆32 · Dec 21, 2025 · Updated 7 months ago
CircleRadon / TokenPacker
The code for "TokenPacker: Efficient Visual Projector for Multimodal LLM", IJCV 2025
☆279 · May 26, 2025 · Updated last year
ngocbh / trimkv
[TrimKV] Cache What Lasts: Token Retention for Memory-Bounded KV Cache in LLMs - [DBTrimKV] Make Each Token Count: Towards Improving Lo…
☆15 · May 13, 2026 · Updated 2 months ago
showlab / videollm-online
VideoLLM-online: Online Video Large Language Model for Streaming Video (CVPR 2024)
☆676 · Nov 26, 2025 · Updated 7 months ago
Tencent-QQMM / Video-CCAM
A lightweight flexible Video-MLLM developed by TencentQQ Multimedia Research Team.
☆74 · Oct 14, 2024 · Updated last year
Beckschen / spatialcode
Open studio for "Thinking with Spatial Code" (https://arxiv.org/pdf/2603.05591)
☆20 · Mar 18, 2026 · Updated 4 months ago
ByungKwanLee / Phantom
[Technical Report] Official PyTorch implementation code for realizing the technical part of Phantom of Latent representing equipped with …
☆63 · Oct 9, 2024 · Updated last year
HaozheZhao / MIC_tool
☆14 · Nov 14, 2023 · Updated 2 years ago
rsshyam / GRPO-bandits
☆13 · Sep 12, 2024 · Updated last year
Share14 / ShareGemini
☆32 · Jul 29, 2024 · Updated last year
FuxiaoLiu / LRV-Instruction
[ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning
☆297 · Mar 13, 2024 · Updated 2 years ago
hasanar1f / HiRED
[AAAI 2025] HiRED strategically drops visual tokens in the image encoding stage to improve inference efficiency for High-Resolution Visio…
☆58 · Apr 18, 2025 · Updated last year