BeingBeyond / Being-VL-0.5
Being-VL-0.5: Unified Multimodal Understanding via Byte-Pair Visual Encoding
☆23 · Updated last month
Alternatives and similar repositories for Being-VL-0.5
Users interested in Being-VL-0.5 are comparing it to the libraries listed below.
- [arXiv: 2502.05178] QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation ☆86 · Updated 6 months ago
- ☆71 · Updated 9 months ago
- Egocentric Video Understanding Dataset (EVUD) ☆31 · Updated last year
- ☆80 · Updated last month
- ElasticTok: Adaptive Tokenization for Image and Video ☆75 · Updated 10 months ago
- [ICLR'25] Reconstructive Visual Instruction Tuning ☆106 · Updated 4 months ago
- [ECCV2024, Oral, Best Paper Finalist] This is the official implementation of the paper "LEGO: Learning EGOcentric Action Frame Generation… ☆37 · Updated 6 months ago
- ☆77 · Updated last year
- [ICLR2025] Official code implementation of Video-UTR: Unhackable Temporal Rewarding for Scalable Video MLLMs ☆58 · Updated 6 months ago
- ☆218 · Updated 3 weeks ago
- VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation ☆86 · Updated 11 months ago
- Code for MetaMorph: Multimodal Understanding and Generation via Instruction Tuning ☆207 · Updated 4 months ago
- [ICCV2025 Oral] Latent Motion Token as the Bridging Language for Learning Robot Manipulation from Videos ☆122 · Updated 3 months ago
- [NeurIPS2024] Official code for (IMA) Implicit Multimodal Alignment: On the Generalization of Frozen LLMs to Multimodal Inputs ☆21 · Updated 10 months ago
- The official repository for our paper, "Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning". ☆137 · Updated last month
- [ICLR 2025] AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark ☆125 · Updated 3 months ago
- [ECCV2024] Official code implementation of Merlin: Empowering Multimodal LLMs with Foresight Minds ☆94 · Updated last year
- Official repository for "iVideoGPT: Interactive VideoGPTs are Scalable World Models" (NeurIPS 2024), https://arxiv.org/abs/2405.15223 ☆141 · Updated 3 months ago
- Visual Embodied Brain: Let Multimodal Large Language Models See, Think, and Control in Spaces ☆80 · Updated 2 months ago
- ☆48 · Updated 2 weeks ago
- ☆153 · Updated 10 months ago
- Long-RL: Scaling RL to Long Sequences ☆597 · Updated 2 weeks ago
- [ICLR 2025] VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation ☆382 · Updated 4 months ago
- [NeurIPS '24 D&B] Official Dataloader and Evaluation Scripts for LongVideoBench. ☆107 · Updated last year
- Uni-CoT: Towards Unified Chain-of-Thought Reasoning Across Text and Vision ☆101 · Updated 3 weeks ago
- [EMNLP 2024] A Video Chat Agent with Temporal Prior ☆32 · Updated 6 months ago
- Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning ☆110 · Updated 2 weeks ago
- ☆88 · Updated 2 months ago
- ☆38 · Updated 6 months ago
- [ICLR 2025] Official implementation and benchmark evaluation repository of <PhysBench: Benchmarking and Enhancing Vision-Language Models … ☆67 · Updated 3 months ago