opendatalab / LOKI
The official implementation of the paper “LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models”
☆110Updated 3 weeks ago
Related projects
Alternatives and complementary repositories for LOKI
- The official implementation of the paper “Street-to-Satellite Image Synthesis with Diffusion Models and BEV Paradigm”☆28Updated 3 weeks ago
- This is the repo for the paper Multi-Agent Collaborative Data Selection for Efficient LLM Pretraining.☆31Updated last month
- The official PyTorch implementation of Exploring the Interactive Guidance for Unified and Effective Image Matting☆23Updated 7 months ago
- [NeurIPS2024] Repo for the paper "ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models"☆98Updated last week
- [ECCV 2024] ShareGPT4V: Improving Large Multi-modal Models with Better Captions☆157Updated 4 months ago
- [NeurIPS 2024] This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models"☆148Updated last month
- Explore the Limits of Omni-modal Pretraining at Scale☆89Updated 2 months ago
- Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want☆61Updated last month
- Official repository for the paper MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning (https://arxiv.org/abs/2406.17770).☆148Updated last month
- Official implementation of the Law of Vision Representation in MLLMs☆134Updated this week
- The official CLIP training codebase of Inf-CL: "Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss". A su…☆181Updated 3 weeks ago
- AAAI 2024: Visual Instruction Generation and Correction☆90Updated 9 months ago
- A paper list of recent works on token compression for ViT and VLM☆142Updated this week
- ☆105Updated 3 months ago
- 📖 This is a repository for organizing papers, codes and other resources related to unified multimodal models.☆217Updated 2 weeks ago
- 🔥🔥First-ever hour scale video understanding models☆169Updated 3 weeks ago
- ✨✨Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models☆137Updated 2 weeks ago
- The official implementation of the paper: Sat2Density: Faithful Density Learning from Satellite-Ground Image Pairs (ICCV 2023)☆40Updated 5 months ago
- [CVPR 2024🔥] Unleashing Unlabeled Data: A Paradigm for Cross-View Geo-Localization☆95Updated 5 months ago
- The paper collections for the autoregressive models in vision.☆231Updated this week
- The official implementation of "Segment Anything with Multiple Modalities".☆67Updated 2 months ago
- Awesome lists about framework figures in papers☆52Updated last month
- ✨✨ MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?☆78Updated last week
- LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models☆100Updated 6 months ago
- The official implementation of the paper “Cross-view image geo-localization with Panorama-BEV Co-Retrieval Network”. (ECCV 2024)☆35Updated 3 weeks ago
- [ECCV24] VISA: Reasoning Video Object Segmentation via Large Language Model☆133Updated 3 months ago
- [ECCV 2024] Paying More Attention to Image: A Training-Free Method for Alleviating Hallucination in LVLMs☆71Updated 2 weeks ago
- [NeurIPS 2024] MoVA: Adapting Mixture of Vision Experts to Multimodal Context☆132Updated last month
- [ECCV 2024 Oral] Code for paper: An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Langua…☆277Updated 3 months ago
- ☆23Updated 3 months ago