WayneJin0918 / SOTA-paper-rating.io
A tiny paper rating web
☆25Updated this week
Alternatives and similar repositories for SOTA-paper-rating.io:
Users who are interested in SOTA-paper-rating.io are comparing it to the repositories listed below.
- ☆33Updated last week
- Official repo for "VisionZip: Longer is Better but Not Necessary in Vision Language Models"☆213Updated last week
- The paper collections for the autoregressive models in vision.☆350Updated 2 weeks ago
- 🔥 Official impl. of "TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation".☆219Updated last week
- Empowering Unified MLLM with Multi-granular Visual Generation☆113Updated 2 months ago
- The code and data of Paper: Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation☆77Updated 2 months ago
- ☆96Updated 3 weeks ago
- Official implementation for BroadWay: Boost Your Text-to-Video Generation Model in a Training-free Way☆19Updated 2 months ago
- LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models☆109Updated 7 months ago
- [NeurIPS 2024] This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models"☆160Updated 3 months ago
- Official Repository of Multi-Object Hallucination in Vision-Language Models (NeurIPS 2024)☆26Updated last month
- [NeurIPS2024] Repo for the paper "ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models"☆131Updated this week
- Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey☆209Updated this week
- VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation☆194Updated 2 months ago
- Official implementation of paper "SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference" proposed by Pekin…☆66Updated 2 months ago
- 📖 This is a repository for organizing papers, codes and other resources related to unified multimodal models.☆315Updated 2 weeks ago
- A collection of vision foundation models unifying understanding and generation.☆32Updated last week
- This is a repo to track the latest autoregressive visual generation papers.☆96Updated last week
- MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning☆104Updated 8 months ago
- Official implementation of the Law of Vision Representation in MLLMs☆145Updated last month
- [NAACL 2024] LaDiC: Are Diffusion Models Really Inferior to Autoregressive Counterparts for Image-to-text Generation?☆37Updated 7 months ago
- ☆39Updated last month
- Survey on Data-centric Large Language Models☆69Updated 6 months ago
- A paper list about Token Merge, Reduce, Resample, Drop for MLLMs.☆15Updated 3 weeks ago
- [NeurIPS'24 Spotlight] Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought …☆190Updated 2 weeks ago
- ✈️ Accelerating Vision Diffusion Transformers with Skip Branches.☆58Updated 3 weeks ago
- [CVPR 2024] Prompt Highlighter: Interactive Control for Multi-Modal LLMs☆136Updated 5 months ago
- ☆74Updated 4 months ago
- [EMNLP 2024 Findings🔥] Official implementation of "LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context Infe…☆86Updated 2 months ago
- 📚 Collection of token reduction for model compression resources.☆18Updated this week