AIDC-AI / Ovis
A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.
☆526 · Updated 2 weeks ago
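For context on the "structural alignment" claim: the Ovis paper replaces the usual MLP connector with a learnable visual embedding table. A visual head turns each ViT patch feature into a probability distribution over a visual vocabulary, and the visual embedding is the probability-weighted mixture of table rows, mirroring how text tokens index the LLM's word-embedding table. Below is a minimal PyTorch sketch of that idea; the names and dimensions are illustrative assumptions, not the repository's actual API.

```python
# Minimal sketch of Ovis-style structural embedding alignment.
# All class/parameter names and sizes are illustrative, not Ovis's real code.
import torch
import torch.nn as nn

class VisualEmbeddingTable(nn.Module):
    """Visual tokens index a learnable embedding table, structurally
    mirroring how text tokens index the LLM's word-embedding table."""

    def __init__(self, vit_dim: int = 1024, visual_vocab: int = 8192, llm_dim: int = 4096):
        super().__init__()
        # Visual head: maps a ViT patch feature to a probability
        # distribution over a learnable "visual vocabulary".
        self.visual_head = nn.Linear(vit_dim, visual_vocab)
        # Embedding table: each row is one visual "word" living in the
        # same space as the LLM's text token embeddings.
        self.table = nn.Parameter(torch.randn(visual_vocab, llm_dim) * 0.02)

    def forward(self, patch_features: torch.Tensor) -> torch.Tensor:
        # patch_features: (batch, num_patches, vit_dim)
        probs = self.visual_head(patch_features).softmax(dim=-1)
        # Probability-weighted mixture of table rows -> visual embeddings
        # directly comparable to text embeddings.
        return probs @ self.table  # (batch, num_patches, llm_dim)

# Usage: the resulting visual embeddings would be concatenated with
# text token embeddings before entering the LLM backbone.
vet = VisualEmbeddingTable()
vis_emb = vet(torch.randn(2, 256, 1024))  # -> shape (2, 256, 4096)
```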
Related projects
Alternatives and complementary repositories for Ovis
- Official repository for the paper PLLaVA ☆593 · Updated 3 months ago
- A family of lightweight multimodal models. ☆933 · Updated this week
- Next-Token Prediction is All You Need ☆1,824 · Updated 3 weeks ago
- Anole: An Open, Autoregressive, and Native Multimodal Model for Interleaved Image-Text Generation ☆675 · Updated 3 months ago
- Official code for Goldfish (long video understanding) and MiniGPT4-video (short video understanding) ☆559 · Updated last month
- Janus-Series: Unified Multimodal Understanding and Generation Models ☆1,084 · Updated last week
- NeurIPS 2024 Paper: A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, and Editing ☆385 · Updated last month
- Multimodal Models in the Real World ☆403 · Updated 3 weeks ago
- LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images ☆318 · Updated last month
- LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills ☆705 · Updated 9 months ago
- VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs ☆877 · Updated last week
- A Framework of Small-scale Large Multimodal Models ☆652 · Updated last month
- ✨✨VITA: Towards Open-Source Interactive Omni Multimodal LLM ☆964 · Updated 3 weeks ago
- A minimal codebase for finetuning large multimodal models, supporting llava-1.5/1.6, llava-interleave, llava-next-video, llava-onevision,… ☆179 · Updated 3 weeks ago
- Long Context Transfer from Language to Vision ☆334 · Updated 3 weeks ago
- HPT - Open Multimodal LLMs from HyperGAI ☆312 · Updated 5 months ago
- Official implementation of the paper "MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens" ☆851 · Updated 8 months ago
- Open-source evaluation toolkit for large vision-language models (LVLMs), supporting 160+ VLMs and 50+ benchmarks ☆1,361 · Updated this week
- OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text ☆274 · Updated this week
- 🔥🔥 First-ever hour-scale video understanding models ☆166 · Updated 3 weeks ago
- Parsing-free RAG supported by VLMs ☆388 · Updated this week
- ✨✨Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis ☆406 · Updated 5 months ago
- Official implementation of SEED-LLaMA (ICLR 2024). ☆579 · Updated 2 months ago
- LLaVA-HR: High-Resolution Large Language-Vision Assistant ☆212 · Updated 3 months ago
- Baichuan-Omni: Towards Capable Open-source Omni-modal LLM 🌊 ☆234 · Updated 2 weeks ago
- 🔥🔥 LLaVA++: Extending LLaVA with Phi-3 and LLaMA-3 (LLaVA LLaMA-3, LLaVA Phi-3) ☆813 · Updated 4 months ago