guanhaisu / OBSD
[ACL 2024 Best Paper] Deciphering Oracle Bone Language with Diffusion Models
☆65 · Updated 3 weeks ago
Related projects:
- Oracle Bone Script data collected by VLRLab of HUST ☆25 · Updated 2 weeks ago
- AI-assisted Deciphering Oracle Bone Script ☆27 · Updated this week
- Official PyTorch Implementation of Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment ☆44 · Updated 3 months ago
- This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models" ☆138 · Updated 5 months ago
- LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models ☆86 · Updated 4 months ago
- A collection of visual instruction tuning datasets. ☆74 · Updated 6 months ago
- Official repository of MMDU dataset ☆61 · Updated last month
- ChartMimic: Evaluating LMM’s Cross-Modal Reasoning Capability via Chart-to-Code Generation ☆80 · Updated 2 months ago
- EVE: Encoder-Free Vision-Language Models ☆207 · Updated 2 months ago
- The official implementation of RAR ☆61 · Updated 5 months ago
- A paper collection of recent diffusion models for text-image generation tasks, e.g., visual text generation, font generation, text remova… ☆186 · Updated last month
- The code for "TokenPacker: Efficient Visual Projector for Multimodal LLM". ☆151 · Updated this week
- Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model ☆239 · Updated 2 months ago
- Reading notes about Multimodal Large Language Models, Large Language Models, and Diffusion Models ☆76 · Updated 2 weeks ago
- A bug-free and improved implementation of LLaVA-UHD, based on the code from the official repo ☆31 · Updated last month
- Implementation of "VL-Mamba: Exploring State Space Models for Multimodal Learning" ☆75 · Updated 5 months ago
- [ECCV 2024] ShareGPT4V: Improving Large Multi-modal Models with Better Captions ☆112 · Updated 2 months ago
- [CVPR 2024 Highlight] OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allo… ☆255 · Updated 3 weeks ago
- [CVPR 2024] Official Code for the Paper "Compositional Chain-of-Thought Prompting for Large Multimodal Models" ☆57 · Updated 3 months ago
- GroundVLP: Harnessing Zero-shot Visual Grounding from Vision-Language Pre-training and Open-Vocabulary Object Detection (AAAI 2024) ☆51 · Updated 8 months ago
- Official implementation of "Open-Vocabulary Multi-Label Classification via Multi-Modal Knowledge Transfer". ☆120 · Updated last year
- Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want ☆52 · Updated 5 months ago
- This repository is the code of our paper "DiffUTE: Universal Text Editing Diffusion Model" (NeurIPS'2023). ☆119 · Updated 2 months ago
- [CVPR 2024 Highlight] Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding ☆177 · Updated 2 months ago
- LLaVA-NeXT-Image-Llama3-Lora, Modified from https://github.com/arielnlee/LLaVA-1.6-ft ☆37 · Updated 2 months ago
- An RLHF Infrastructure for Vision-Language Models ☆86 · Updated 3 months ago
- Official implementation for ICDAR 2024 Oral paper "ICAL: Implicit Character-Aided Learning for Enhanced Handwritten Mathematical Expressi… ☆16 · Updated last month