UniModal4Reasoning / DocGenome

DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Models

☆134

Related projects

Alternatives and complementary repositories for DocGenome

mlpc-ucsd / BLIVA
(AAAI 2024) BLIVA: A Simple Multimodal LLM for Better Handling of Text-rich Visual Questions
☆270 Updated 7 months ago
gordonhu608 / MQT-LLaVA
[NeurIPS 2024] Matryoshka Query Transformer for Large Vision-Language Models
☆97 Updated 4 months ago
HITsz-TMG / UMOE-Scaling-Unified-Multimodal-LLMs
Code for "Uni-MoE: Scaling Unified Multimodal Models with Mixture of Experts"
☆771 Updated 2 months ago
UniModal4Reasoning / AdaptiveDiffusion
[NeurIPS'24] Training-Free Adaptive Diffusion with Bounded Difference Approximation Strategy
☆49 Updated 2 weeks ago
mragbench / MRAG-Bench
Vision-Centric Evaluation for Retrieval-Augmented Multimodal Models
☆29 Updated 2 weeks ago
MCG-NJU / AWT
[NeurIPS 2024] AWT: Transferring Vision-Language Models via Augmentation, Weighting, and Transportation
☆79 Updated last month
dle666 / R-CoT
Reverse Chain-of-Thought Problem Generation for Geometric Reasoning in Large Multimodal Models
☆131 Updated 2 weeks ago
ZrrSkywalker / MathVerse
[ECCV 2024] Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?
☆149 Updated 2 months ago
yuanze-lin / REVIVE
[NeurIPS 2022] Official Code for REVIVE: Regional Visual Representation Matters in Knowledge-Based Visual Question Answering
☆134 Updated 2 months ago
longyuewangdcu / GuoFeng-Webnovel
Multilingual Corpus of Web Fiction
☆216 Updated 4 months ago
AlaaLab / InstructCV
[ICLR 2024] Official Codebase for "InstructCV: Instruction-Tuned Text-to-Image Diffusion Models as Vision Generalists"
☆520 Updated 6 months ago
om-ai-lab / OmDet
Real-time and accurate open-vocabulary end-to-end object detection
☆1,534 Updated 2 months ago
yuanze-lin / Learnable_Regions
[CVPR 2024] Official code for "Text-Driven Image Editing via Learnable Regions"
☆266 Updated last month
UniModal4Reasoning / ChartVLM
Official Repository of ChartX & ChartVLM: A Versatile Benchmark and Foundation Model for Complicated Chart Reasoning
☆211 Updated last month
jiaweizzhao / InRank
☆212 Updated 10 months ago
gersteinlab / ML-Bench
The Official Repo of ML-Bench: Evaluating Large Language Models and Agents for Machine Learning Tasks on Repository-Level Code (https://a…
☆355 Updated this week
tencent-ailab / Leopard
The repository for the paper titled "Leopard: A Vision Language Model For Text-Rich Multi-Image Tasks"
☆184 Updated 3 weeks ago
Hawkeye-FineGrained / Hawkeye
Open source deep learning based fine-grained image recognition toolbox built on PyTorch🔥
☆576 Updated 6 months ago
hustCYQ / MVP-PCLIP
The Official Implementation for "Towards Zero-shot Point Cloud Anomaly Detection: A Multi-View Projection Framework"
☆45 Updated 2 weeks ago
ShareGPT4Omni / ShareGPT4Video
[NeurIPS 2024] An official implementation of ShareGPT4Video: Improving Video Understanding and Generation with Better Captions
☆1,265 Updated last month
Langboat / Mengzi3
☆2,031 Updated last month
MFaceTech / HyperDreamBooth
☆109 Updated 8 months ago
Yuliang-Liu / Monkey
【CVPR 2024 Highlight】Monkey (LMM): Image Resolution and Text Label Are Important Things for Large Multi-modal Models
☆1,831 Updated last week
Zefan-Cai / KVCache-Factory
Unified KV Cache Compression Methods for LLMs
☆767 Updated this week
duguodong7 / model-evolution
[ACL 2024] Knowledge Fusion by Evolving Weights of Language Models
☆55 Updated 2 months ago
OPPOMKLab / u-LLaVA
u-LLaVA: Unifying Multi-Modal Tasks via Large Language Model
☆138 Updated 4 months ago
FoundationVision / Groma
[ECCV 2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization
☆566 Updated 5 months ago
zibojia / COCOCO
Video-Inpaint-Anything: This is the inference code for our paper CoCoCo: Improving Text-Guided Video Inpainting for Better Consistency, C…
☆285 Updated 2 months ago
dvlab-research / LLMGA
Official implementation of "LLMGA: Multimodal Large Language Model based Generation Assistant" (ECCV 2024 Oral)
☆463 Updated 3 months ago
PKU-YuanGroup / Machine-Mindset
An MBTI Exploration of Large Language Models
☆475 Updated 9 months ago