ForJadeForest / Lever-LM
The Code for Lever LM: Configuring In-Context Sequence to Lever Large Vision Language Models
Related projects
Alternatives and complementary repositories for Lever-LM
- An in-context learning research testbed
- This is the first released survey paper on hallucinations of large vision-language models (LVLMs). To keep track of this field and contin…
- [NeurIPS 2023] Exploring Diverse In-Context Configurations for Image Captioning
- A Survey on Interpretable Cross-modal Reasoning
- 😎 Up-to-date & curated list of awesome LMM hallucination papers, methods & resources.
- mPLUG-HalOwl: Multimodal Hallucination Evaluation and Mitigation
- [ICML 2024] Official implementation of "HALC: Object Hallucination Reduction via Adaptive Focal-Contrast Decoding"
- The official GitHub page for "Evaluating Object Hallucination in Large Vision-Language Models"
- SotA text-only image/video method (IJCAI 2023)
- [IJCAI 2022] Official PyTorch code for the paper "S2 Transformer for Image Captioning"
- [CVPR 2024 Highlight] Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding
- CHAIR: a rule-based metric for evaluating object hallucination in caption generation
- [ICLR 2024] Analyzing and Mitigating Object Hallucination in Large Vision-Language Models
- Papers about Hallucination in Multi-Modal Large Language Models (MLLMs)
- [EMNLP 2024 Findings] The official PyTorch implementation of EchoSight: Advancing Visual-Language Models with Wiki Knowledge
- Repository for PTSN, an end-to-end image captioning method (ACM MM 2022)
- [AAAI 2024] Structure-CLIP: Towards Scene Graph Knowledge to Enhance Multi-modal Structured Representations
- Reinforcement learning code for the SPA-VL dataset
- An RLHF infrastructure for vision-language models
- Code for SFT, RLHF, and DPO, designed for vision-based LLMs, including the LLaVA models and the LLaMA-3.2-vi…
- M-HalDetect dataset release
- Visualizing the attention of vision-language models
- [TCSVT 2023] Official code for "SPT: Spatial Pyramid Transformer for Image Captioning"
- [TIP 2022] Official code for the paper "Video Question Answering with Prior Knowledge and Object-sensitive Learning"
- [CVPR 2024] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(…
- Official implementation of the paper "Look Twice Before You Answer: Memory-Space Visual Retracing for Hallucination Mitigation in Multimodal …
- Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization
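Several of the repositories above evaluate object hallucination with the CHAIR metric. As a rough sketch of how the two standard scores, CHAIR_i (per-instance) and CHAIR_s (per-sentence), are typically computed — assuming captions have already been parsed into object mentions upstream; all names here are illustrative, not taken from any listed repository:

```python
def chair_scores(captions):
    """Compute (CHAIR_i, CHAIR_s) for a batch of generated captions.

    captions: list of (mentioned_objects, ground_truth_objects) pairs,
    one per caption, where each element is a set of object words.
    """
    hallucinated_mentions = 0   # object mentions absent from the image
    total_mentions = 0          # all object mentions across captions
    hallucinated_captions = 0   # captions with >= 1 hallucinated object

    for mentioned, ground_truth in captions:
        wrong = [obj for obj in mentioned if obj not in ground_truth]
        hallucinated_mentions += len(wrong)
        total_mentions += len(mentioned)
        if wrong:
            hallucinated_captions += 1

    # Guard against empty input; lower is better for both scores.
    chair_i = hallucinated_mentions / max(total_mentions, 1)
    chair_s = hallucinated_captions / max(len(captions), 1)
    return chair_i, chair_s
```

For example, a caption mentioning {"dog", "cat"} for an image whose ground truth is {"dog"} contributes one hallucinated mention out of two, and counts as one hallucinated sentence.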