zhijie-group / Orthus
☆28Updated 3 weeks ago
Alternatives and similar repositories for Orthus:
Users interested in Orthus are comparing it to the libraries listed below.
- [ArXiv] V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding☆46Updated 4 months ago
- V1: Toward Multimodal Reasoning by Designing Auxiliary Task☆34Updated 3 weeks ago
- NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation☆53Updated 2 weeks ago
- Code for "Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization"☆55Updated 8 months ago
- The official code of "VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning"☆84Updated this week
- ✈️ Towards Stabilized and Efficient Diffusion Transformers through Long-Skip-Connections with Spectral Constraints☆66Updated last month
- This is the official PyTorch implementation of "ZipAR: Accelerating Auto-regressive Image Generation through Spatial Locality"☆46Updated last month
- Unifying Visual Understanding and Generation with Dual Visual Vocabularies 🌈☆43Updated 3 weeks ago
- [MM2024, oral] "Self-Supervised Visual Preference Alignment" https://arxiv.org/abs/2404.10501☆55Updated 9 months ago
- [NeurIPS2024] Official code for (IMA) Implicit Multimodal Alignment: On the Generalization of Frozen LLMs to Multimodal Inputs☆18Updated 6 months ago
- ☆40Updated 4 months ago
- ☆75Updated 4 months ago
- [EMNLP 2024 Findings🔥] Official implementation of "LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context In…☆93Updated 5 months ago
- Code release for VTW (AAAI 2025) Oral☆37Updated 3 months ago
- Official repository for CoMM Dataset☆33Updated 4 months ago
- [NeurIPS'24] Official PyTorch Implementation of Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment☆56Updated 7 months ago
- A paper list about Token Merge, Reduce, Resample, Drop for MLLMs.☆53Updated 3 months ago
- Enhancing Large Vision Language Models with Self-Training on Image Comprehension.☆65Updated 11 months ago
- The official implementation for "MonoFormer: One Transformer for Both Diffusion and Autoregression"☆86Updated 6 months ago
- ☆44Updated this week
- [NeurIPS 2024] Calibrated Self-Rewarding Vision Language Models☆73Updated 10 months ago
- ☆25Updated 11 months ago
- [NeurIPS 2024] Official Repository of Multi-Object Hallucination in Vision-Language Models☆28Updated 5 months ago
- Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM*☆100Updated 2 months ago
- ☆43Updated last month
- SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models☆99Updated 2 weeks ago
- [ICLR 2025] The official PyTorch implementation of "Dynamic-LLaVA: Efficient Multimodal Large Language Models via Dynamic Vision-language Cont…☆35Updated 5 months ago
- Official code for "What Makes for Good Visual Tokenizers for Large Language Models?".☆58Updated last year
- ☆30Updated 9 months ago
- Official implementation of MIA-DPO☆56Updated 3 months ago