sky-goldfish / SAIL
[AAAI 2025] SAIL: Sample-Centric In-Context Learning for Document Information Extraction
☆16 · Updated 4 months ago
Alternatives and similar repositories for SAIL
Users who are interested in SAIL are comparing it to the repositories listed below.
- Official codes for "Q-Ground: Image Quality Grounding with Large Multi-modality Models", ACM MM2024 (Oral) ☆41 · Updated 6 months ago
- [ECCV 2024] Official Pytorch Implementation of A Comprehensive Study of Multimodal Large Language Models for Image Quality Assessment ☆83 · Updated 9 months ago
- [CVPR 2025] Teaching Large Language Models to Regress Accurate Image Quality Scores using Score Distribution ☆107 · Updated last month
- [NeurIPS 2024 Spotlight] Training in Pairs + Inference on Single Image with Anchors ☆38 · Updated 2 months ago
- A Token-level Text Image Foundation Model for Document Understanding ☆91 · Updated 2 weeks ago
- The official repo for “TextCoT: Zoom In for Enhanced Multimodal Text-Rich Image Understanding”. ☆39 · Updated 7 months ago
- This repository is the codebase of TabPedia: Towards Comprehensive Visual Table Understanding with Concept Synergy ☆34 · Updated 7 months ago
- DepictQA: Depicted Image Quality Assessment with Vision Language Models ☆141 · Updated 2 months ago
- A Survey of Multimodal Retrieval-Augmented Generation ☆18 · Updated last month
- ☆38 · Updated last year
- [AAAI 2025] DocKylin: A Large Multimodal Model for Visual Document Understanding with Efficient Visual Slimming ☆20 · Updated 5 months ago
- Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models ☆60 · Updated 6 months ago
- The official code for the CVPR 2024 paper: Multi-modal In-Context Learning Makes an Ego-evolving Scene Text Recognizer ☆53 · Updated 11 months ago
- PyTorch code for our paper "Grounding-IQA: Multimodal Language Grounding Model for Image Quality Assessment" ☆43 · Updated 5 months ago
- Official released code for VQA² series models ☆43 · Updated 2 weeks ago
- [ACL 2024 Best Paper] Deciphering Oracle Bone Language with Diffusion Models ☆104 · Updated last month
- [arXiv] PDF-Wukong: A Large Multimodal Model for Efficient Long PDF Reading with End-to-End Sparse Sampling ☆116 · Updated 7 months ago
- Official implementation of ViTEraser: Harnessing the Power of Vision Transformers for Scene Text Removal with SegMIM Pretraining (AAAI 20… ☆51 · Updated 10 months ago
- [ICLR'25] Official code for the paper 'MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs' ☆192 · Updated 3 weeks ago
- The official project of paper "Visual Text Meets Low-level Vision: A Comprehensive Survey on Visual Text Processing" ☆63 · Updated 2 weeks ago
- ☆15 · Updated last month
- 🔥Official PyTorch implementation for "LM4LV: A Frozen Large Language Model for Low-level Vision Tasks". ☆51 · Updated 11 months ago
- Real-CE: A Benchmark for Chinese-English Scene Text Image Super-resolution (ICCV2023) ☆86 · Updated last year
- ☆56 · Updated last month
- ☆19 · Updated last year
- Blind Quality Assessment for in-the-Wild Images via Hierarchical Feature Fusion and Iterative Mixed Database Training ☆30 · Updated last year
- Building a VLM model starts from the basic module. ☆16 · Updated last year
- AGIQA-1k-Database for AI Generated Content Image Quality Assessment ☆27 · Updated 2 years ago
- ④[ECCV 2024 Oral, Comparison among Multiple Images!] A study on open-ended multi-image quality comparison: a dataset, a model and a bench… ☆79 · Updated 7 months ago
- The official implementation of RAR ☆87 · Updated last year