FuxiaoLiu / DocumentCLIP

[ICPRAI 2024] DocumentCLIP: Linking Figures and Main Body Text in Reflowed Documents

☆16

Alternatives and similar repositories for DocumentCLIP

Users who are interested in DocumentCLIP are comparing it to the repositories listed below.

usydnlp / vdoc
☆15 Updated 3 years ago
umd-huang-lab / Mementos
☆31 Updated last year
HYPJUDY / Sparkles
Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models
☆44 Updated last year
M3-IT / YING-VLM
Vision Large Language Models trained on M3IT instruction tuning dataset
☆17 Updated 2 years ago
psunlpgroup / VisOnlyQA
This repository contains the code and data for the paper "VisOnlyQA: Large Vision Language Models Still Struggle with Visual Perception o…
☆27 Updated 4 months ago
nttmdlab-nlp / SlideVQA
SlideVQA: A Dataset for Document Visual Question Answering on Multiple Images (AAAI2023)
☆102 Updated 8 months ago
SihengLi99 / TextBind
[2024-ACL]: TextBind: Multi-turn Interleaved Multimodal Instruction-following in the Wild
☆47 Updated 2 years ago
NExTplusplus / TAT-DQA
TAT-DQA: Towards Complex Document Understanding By Discrete Reasoning
☆24 Updated last year
OFA-Sys / TouchStone
Touchstone: Evaluating Vision-Language Models by Language Models
☆83 Updated last year
mlfoundations / VisIT-Bench
☆50 Updated 2 years ago
shizhediao / DaVinci
Source code for the paper "Prefix Language Models are Unified Modal Learners"
☆43 Updated 2 years ago
TobiasLee / VEC
Visual and Embodied Concepts evaluation benchmark
☆21 Updated 2 years ago
wade3han / champagne
An official codebase for the paper "CHAMPAGNE: Learning Real-world Conversation from Large-Scale Web Videos" (ICCV 23)
☆52 Updated 2 years ago
Hxyou / IdealGPT
Official Code of IdealGPT
☆35 Updated 2 years ago
NiteshMethani / PlotQA
Dataset introduced in PlotQA: Reasoning over Scientific Plots
☆82 Updated 2 years ago
OpenGVLab / Awesome-LLM4Tool
A curated list of papers, repositories, tutorials, and anything else related to large language models for tools
☆68 Updated 2 years ago
YunxinLi / LingCloud
Attaching human-like eyes to the large language model. The codes of IEEE TMM paper "LMEye: An Interactive Perception Network for Large La…
☆48 Updated last year
hsiehjackson / Mr.Right
Mr. Right: Multimodal Retrieval on Representation of ImaGe witH Text
☆24 Updated 3 years ago
DAMO-NLP-SG / SSTuning
Code for ACL paper "Zero-Shot Text Classification via Self-Supervised Tuning"
☆27 Updated 2 years ago
Yangyi-Chen / CoTConsistency
The released data for paper "Measuring and Improving Chain-of-Thought Reasoning in Vision-Language Models".
☆34 Updated 2 years ago
gregor-ge / mBLIP
☆87 Updated last year
CONE-MT / LLaMAX
☆72 Updated 11 months ago
YujieLu10 / TIP
Multimodal-Procedural-Planning
☆92 Updated 2 years ago
huggingface / docmatix
A huge dataset for Document Visual Question Answering
☆20 Updated last year
neulab / MultiUI
Code for the paper "Harnessing Webpage UIs for Text-Rich Visual Understanding"
☆53 Updated 11 months ago
showlab / Awesome-Long-Context
A curated list of resources about long-context in large-language models and video understanding.
☆31 Updated 2 years ago
open-vision-language / oven
☆40 Updated 2 years ago
lupantech / IconQA
Data and code for NeurIPS 2021 Paper "IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning".
☆53 Updated last year
dqxiu / KAssess
☆14 Updated 2 years ago
bytedance / MTVQA
MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering. A comprehensive evaluation of multimodal large model multilingua…
☆64 Updated 6 months ago