zhourax/VEGA

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/zhourax/VEGA)

zhourax / VEGA

☆38

Alternatives and similar repositories for VEGA

Users that are interested in VEGA are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

ggg0919 / cantor
View on GitHub
☆90May 10, 2024Updated 2 years ago
MAC-AutoML / QuoTA
View on GitHub
✨✨[AAAI 2026] This is the official implementation of our paper "QuoTA: Query-oriented Token Assignment via CoT Query Decouple for Long Vi…
☆79Apr 28, 2025Updated last year
xjtupanda / Sparrow
View on GitHub
Repo for paper "T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs"
☆48Sep 3, 2025Updated 10 months ago
MME-Benchmarks / MME-Unify
View on GitHub
✨✨ [ICLR 2026] MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models
☆42Apr 10, 2025Updated last year
MME-Benchmarks / Video-MME
View on GitHub
✨✨[CVPR 2025] Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
☆787Dec 8, 2025Updated 7 months ago
GPUs on demand by Runpod - Special Offer Available • Ad
Run AI, ML, and HPC workloads on powerful cloud GPUs—without limits or wasted spend. Deploy GPUs in under a minute and pay by the second.
Share14 / ShareGemini
View on GitHub
☆32Jul 29, 2024Updated last year
MME-Benchmarks / Video-MME-v2
View on GitHub
Video-MME-v2: Towards the Next Stage in Benchmarks for Comprehensive Video Understanding
☆369May 24, 2026Updated 2 months ago
THUKElab / Visual-C3
View on GitHub
Towards Real-World Writing Assistance: A Chinese Character Checking Benchmark with Faked and Misspelled Characters
☆17May 30, 2024Updated 2 years ago
VITA-MLLM / Long-VITA
View on GitHub
✨✨Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy
☆305May 14, 2025Updated last year
Tencent / VITA
View on GitHub
The official implement of VITA, VITA15, LongVITA, VITA-Audio, VITA-VLA, and VITA-E.
☆162Oct 28, 2025Updated 8 months ago
Kwai-YuanQi / MM-RLHF
View on GitHub
The Next Step Forward in Multimodal LLM Alignment
☆198May 1, 2025Updated last year
Leolty / tablellm
View on GitHub
☆26Dec 29, 2023Updated 2 years ago
VITA-MLLM / Omni-Diffusion
View on GitHub
✨✨[ICML 2026] Omni-Diffusion: Unified Multimodal Understanding and Generation with Masked Discrete Diffusion
☆153Mar 12, 2026Updated 4 months ago
Northern-byte-bit / SpeechParaling-Bench
View on GitHub
☆30May 21, 2026Updated 2 months ago
Deploy open-source AI quickly and easily - Special Bonus Offer • Ad
Runpod Hub is built for open source. One-click deployment and autoscaling endpoints without provisioning your own infrastructure.
Leon1207 / 3DRefTR
View on GitHub
This is a PyTorch implementation of 3DRefTR proposed by our paper "A Unified Framework for 3D Point Cloud Visual Grounding"
☆26Aug 24, 2023Updated 2 years ago
VITA-MLLM / VITA-Audio
View on GitHub
✨✨[NeurIPS 2025] VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model
☆682May 24, 2025Updated last year
zehanwang01 / OmniBind
View on GitHub
☆34Apr 11, 2025Updated last year
shenyunhang / APE
View on GitHub
[CVPR 2024] Aligning and Prompting Everything All at Once for Universal Visual Perception
☆608May 8, 2024Updated 2 years ago
qhfan / UniPrefill
View on GitHub
Implementation of "UniPrefill: Universal Long-Context Prefill Acceleration via Block-wise Dynamic Sparsification"
☆41May 8, 2026Updated 2 months ago
deepglint / Victor
View on GitHub
ViCToR: Improving Visual Comprehension via Token Reconstruction for Pretraining LMMs
☆29Aug 15, 2025Updated 11 months ago
ViktorLiang / Spatial-Temporal-Pooling-Networks-ReID
View on GitHub
Pytorch implementation of the Attentive Spatial-Temporal Pooling Networks by Shuangjie Xu(2017.https://arxiv.org/abs/1708.02286)
☆12Jun 10, 2018Updated 8 years ago
IVY-LVLM / CODE
View on GitHub
Official Implementation of CODE
☆17Sep 26, 2024Updated last year
yuecao0119 / MMInstruct
View on GitHub
[SCIS 2024] The official implementation of the paper "MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Di…
☆64Nov 7, 2024Updated last year
GPUs on demand by Runpod - Special Offer Available • Ad
Run AI, ML, and HPC workloads on powerful cloud GPUs—without limits or wasted spend. Deploy GPUs in under a minute and pay by the second.
JCruan519 / GIST
View on GitHub
(ACM MM24) This is the offical repository of GIST: Improving Parameter Efficient Fine Tuning via Knowledge Interaction.
☆11Jan 28, 2024Updated 2 years ago
VITA-MLLM / Woodpecker
View on GitHub
✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models
☆649Dec 23, 2024Updated last year
NVlabs / FRAG
View on GitHub
☆15Apr 25, 2025Updated last year
Zi-hao-Wei / Efficient-Vision-Language-Pre-training-by-Cluster-Masking
View on GitHub
[CVPR 2024] Improving language-visual pretraining efficiency by perform cluster-based masking on images.
☆33May 16, 2024Updated 2 years ago
Tencent-QQMM / PureMM
View on GitHub
☆21Feb 29, 2024Updated 2 years ago
OpenGVLab / LCL
View on GitHub
[NeurIPS 2024] Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning
☆72Feb 11, 2025Updated last year
ytaek-oh / fsc-clip
View on GitHub
[EMNLP 2024] Preserving Multi-Modal Capabilities of Pre-trained VLMs for Improving Vision-Linguistic Compositionality
☆22Oct 8, 2024Updated last year
TerryPei / CSP
View on GitHub
Cross-Self KV Cache Pruning for Efficient Vision-Language Inference
☆10Dec 15, 2024Updated last year
Wangpeiyi9979 / HCL-Text2AMR
View on GitHub
Code for ACL22 short Paper "Hierarchical Curriculum Learning for AMR Parsing"
☆13Jun 1, 2022Updated 4 years ago
Proton VPN Special Offer - Get 70% off • Ad
Special partner offer. Trusted by over 100 million users worldwide. Tested, Approved and Recommended by Experts.
bytedance / UniVR
View on GitHub
☆24Jul 16, 2026Updated last week
Kyyle2114 / Convolutional-Adapter-for-Segment-Anything
View on GitHub
CAD - Memory Efficient Convolutional Adapter for Segment Anything
☆12Oct 4, 2024Updated last year
Jaiy / Ground-aware-Seg
View on GitHub
Ground-Aware Point Cloud Semantic Segmentation for Autonomous Driving. ACM Multimedia 2019.
☆12Sep 19, 2019Updated 6 years ago
JCruan519 / iDAT
View on GitHub
(ICME24) This is the offical repository of iDAT: inverse Distillation Adapter-Tuning.
☆13Apr 3, 2024Updated 2 years ago
xzc-zju / AdaVideoRAG
View on GitHub
[NeurIPS 2025] AdaVideoRAG: Omni-Contextual Adaptive Retrieval-Augmented Efficient Long Video Understanding
☆15Jun 16, 2025Updated last year
kyegomez / MultiQueryAttention
View on GitHub
This is a simple torch implementation of the high performance Multi-Query Attention
☆16Aug 23, 2023Updated 2 years ago
meteorshowers / Sora-Generates-Videos-with-Stunning-Geometrical-Consistency
View on GitHub
Sora Generates Videos with Stunning Geometrical Consistency
☆51Mar 24, 2024Updated 2 years ago