UniModal4Reasoning / DocGenome
DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Models
☆92 · Updated last week
Related projects:
- (AAAI 2024) BLIVA: A Simple Multimodal LLM for Better Handling of Text-rich Visual Questions ☆260 · Updated 5 months ago
- [ECCV 2024] Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems? ☆133 · Updated 2 weeks ago
- Official Repository of ChartX & ChartVLM: A Versatile Benchmark and Foundation Model for Complicated Chart Reasoning ☆208 · Updated last week
- u-LLaVA: Unifying Multi-Modal Tasks via Large Language Model ☆135 · Updated 2 months ago
- Mathematical Visual Instruction Tuning for Multi-modal Large Language Models ☆86 · Updated last month
- [ECCV 2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization ☆537 · Updated 3 months ago
- Chain-of-Spot: Interactive Reasoning Improves Large Vision-language Models ☆81 · Updated 5 months ago
- Multilingual Corpus of Web Fiction ☆211 · Updated 2 months ago
- Matryoshka Query Transformer for Large Vision-Language Models ☆88 · Updated 2 months ago
- (ECCV 2024) Empowering Multimodal Large Language Model as a Powerful Data Generator ☆77 · Updated 3 months ago
- This project is the official implementation of 'LLMGA: Multimodal Large Language Model based Generation Assistant', ECCV 2024 Oral ☆445 · Updated last month
- [ACL 2024] User-friendly evaluation framework: Eval Suite & Benchmarks: UHGEval, HaluEval, HalluQA ☆176 · Updated 2 weeks ago
- An open-source implementation for training LLaVA-NeXT. ☆243 · Updated 3 months ago
- WorldGPT: Empowering LLM as Multimodal World Model ☆116 · Updated last month
- We leverage 14 datasets as OOD test data and conduct evaluations on 8 NLU tasks over 21 popularly used models. Our findings confirm that … ☆115 · Updated last year
- [AAAI'24 Oral] LRANet: Towards Accurate and Efficient Scene Text Detection with Low-Rank Approximation Network ☆29 · Updated 6 months ago
- [ECCV 2024] Efficient Inference of Vision Instruction-Following Models with Elastic Cache ☆46 · Updated last month
- The Official Implementation of PyramidKV: Dynamic KV Cache Compression based on Pyramidal Information Funneling ☆480 · Updated last month
- [MM'24 Oral] Prior Knowledge Integration via LLM Encoding and Pseudo Event Regulation for Video Moment Retrieval ☆135 · Updated 3 weeks ago
- Video-Inpaint-Anything: This is the inference code for our paper CoCoCo: Improving Text-Guided Video Inpainting for Better Consistency, C… ☆144 · Updated last week
- Real-time and accurate open-vocabulary end-to-end object detection ☆1,483 · Updated last week
- FACTUAL benchmark dataset and the pre-trained textual scene graph parser trained on FACTUAL ☆96 · Updated last month
- The Official Repo of ML-Bench: Evaluating Large Language Models and Agents for Machine Learning Tasks on Repository-Level Code (https://a… ☆350 · Updated last week
- An official implementation of ShareGPT4Video: Improving Video Understanding and Generation with Better Captions ☆1,220 · Updated last month
- [ICCV 2023] Spectrum-guided Multi-granularity Referring Video Object Segmentation. ☆78 · Updated 11 months ago
- Benchmarking LLMs via Uncertainty Quantification ☆206 · Updated 7 months ago
- A multimodal agent framework for solving complex tasks ☆505 · Updated last week
- [EMNLP 2023] FreeAL: Towards Human-Free Active Learning in the Era of Large Language Models ☆81 · Updated 8 months ago