Euphoria16 / UI-Genie
☆27 · Updated this week
Alternatives and similar repositories for UI-Genie
Users interested in UI-Genie are comparing it to the repositories listed below.
- [NeurIPS-24] This is the official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effect… ☆35 · Updated 11 months ago
- ☆43 · Updated 5 months ago
- VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning ☆29 · Updated last month
- ☆81 · Updated 2 months ago
- Fast-Slow Thinking for Large Vision-Language Model Reasoning ☆14 · Updated last month
- [CVPR 2024] DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model ☆17 · Updated last year
- Multimodal RewardBench ☆39 · Updated 3 months ago
- [NeurIPS 2024] Efficient Large Multi-modal Models via Visual Context Compression ☆60 · Updated 3 months ago
- Code for "Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization" ☆55 · Updated 9 months ago
- The official GitHub page for "What Makes for Good Visual Instructions? Synthesizing Complex Visual Reasoning Instructions for Visual Ins… ☆19 · Updated last year
- VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation ☆86 · Updated 8 months ago
- [CVPR 2025] PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models ☆40 · Updated 3 months ago
- MLLM-DataEngine: An Iterative Refinement Approach for MLLM ☆45 · Updated last year
- [ArXiv] V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding ☆47 · Updated 5 months ago
- Paper List for In-context Learning 🌷 ☆20 · Updated 2 years ago
- Official implementation of Next Block Prediction: Video Generation via Semi-Autoregressive Modeling ☆31 · Updated 3 months ago
- ☆117 · Updated last year
- ☆77 · Updated 4 months ago
- ☆29 · Updated 8 months ago
- G1: Bootstrapping Perception and Reasoning Abilities of Vision-Language Model via Reinforcement Learning ☆44 · Updated 2 weeks ago
- [NeurIPS 2024] Official code for (IMA) Implicit Multimodal Alignment: On the Generalization of Frozen LLMs to Multimodal Inputs ☆18 · Updated 7 months ago
- [CVPR 2025] Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training ☆46 · Updated 2 months ago
- ☆99 · Updated last year
- ☆30 · Updated 10 months ago
- [ICCV 2023] Official code for "VL-PET: Vision-and-Language Parameter-Efficient Tuning via Granularity Control" ☆53 · Updated last year
- Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM* ☆103 · Updated last week
- [NeurIPS 2024] Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning ☆69 · Updated 3 months ago
- TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models ☆34 · Updated 6 months ago
- ☆39 · Updated this week
- [ICLR'25] Reconstructive Visual Instruction Tuning ☆89 · Updated last month