dle666 / R-CoT
Reverse Chain-of-Thought Problem Generation for Geometric Reasoning in Large Multimodal Models
☆129 Updated 2 weeks ago
Related projects
Alternatives and complementary repositories for R-CoT
- [ECCV 2024] Efficient Inference of Vision Instruction-Following Models with Elastic Cache ☆46 Updated 3 months ago
- 🚀 [NeurIPS24] Make Vision Matter in Visual-Question-Answering (VQA)! Introducing NaturalBench, a vision-centric VQA benchmark (NeurIPS'2… ☆64 Updated last week
- An Information Flow Perspective for Exploring Large Vision Language Models on Reasoning Tasks ☆59 Updated 3 weeks ago
- (NeurIPS 2024) Learning to Visual Question Answering, Asking and Assessment ☆63 Updated 2 weeks ago
- [MM'24 Oral] Prior Knowledge Integration via LLM Encoding and Pseudo Event Regulation for Video Moment Retrieval ☆138 Updated 2 months ago
- ☆77 Updated 4 months ago
- Chain-of-Spot: Interactive Reasoning Improves Large Vision-language Models ☆86 Updated 7 months ago
- (ECCV 2024) Empowering Multimodal Large Language Model as a Powerful Data Generator ☆108 Updated last month
- [NeurIPS'24] Leveraging Hallucinations to Reduce Manual Prompt Dependency in Promptable Segmentation ☆65 Updated last week
- Domain-Controlled Prompt Learning (AAAI2024) ☆114 Updated this week
- Multi-granularity Correspondence Learning from Long-term Noisy Videos [ICLR 2024, Oral] ☆108 Updated 7 months ago
- u-LLaVA: Unifying Multi-Modal Tasks via Large Language Model ☆138 Updated 4 months ago
- The repository for the paper titled "Leopard: A Vision Language Model For Text-Rich Multi-Image Tasks" ☆184 Updated 3 weeks ago
- [CVPR 2024] Interactive continual learning: Fast and slow thinking ☆120 Updated 4 months ago
- WorldGPT: Empowering LLM as Multimodal World Model ☆123 Updated 3 months ago
- This is the official reproduction of Qihoo-T2X. ☆297 Updated 3 weeks ago
- Mixed-precision inference with TensorRT-LLM ☆93 Updated 3 weeks ago
- Support mixed-precision inference with vLLM ☆95 Updated 2 weeks ago
- Explore concepts like Self-Correct, Self-Refine, Self-Improve, Self-Contradict, Self-Play, and Self-Knowledge, alongside o1-like reasonin… ☆161 Updated this week
- Domain Prompt Learning with Quaternion Networks (CVPR2024 Highlight) ☆108 Updated this week
- The official code for "BiGR: Harnessing Binary Latent Codes for Image Generation and Improved Visual Representation Capabilities" ☆152 Updated 3 weeks ago
- A comprehensive collection of resources focused on addressing and understanding hallucination phenomena in MLLMs. ☆38 Updated 6 months ago
- This tool (enhance_long) aims to enhance the Llama 2 long-context extrapolation capability at the lowest cost, preferably without … ☆47 Updated 11 months ago
- Evaluation of Text-to-Video Generation Models: A Dynamics Perspective [NeurIPS 2024]. ☆337 Updated 3 weeks ago
- ☆68 Updated 3 months ago
- An open-source library with a powerful Contrastive Language-and-Motion (CLaM) pre-training evaluator ☆127 Updated 3 months ago
- This includes the original implementation of CtrlA: Adaptive Retrieval-Augmented Generation via Inherent Control. ☆66 Updated last month
- We leverage 14 datasets as OOD test data and conduct evaluations on 8 NLU tasks over 21 popularly used models. Our findings confirm that … ☆117 Updated last year
- [ICLR'24] Democratizing Fine-grained Visual Recognition with Large Language Models ☆219 Updated 4 months ago
- [NeurIPS 2024] EffiBench: Benchmarking the Efficiency of Automatically Generated Code ☆57 Updated last month