Gary-code / Machine-Learning-Park
Machine Learning Park: mainly covers machine learning fundamentals, deep learning practice, and industrial applications.
☆15Updated 2 years ago
Alternatives and similar repositories for Machine-Learning-Park
Users that are interested in Machine-Learning-Park are comparing it to the libraries listed below
- Exploring and mitigating semantic hallucinations in scene text perception and reasoning☆11Updated last month
- ✨✨The Curse of Multi-Modalities (CMM): Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio☆46Updated this week
- Code for paper: Unified Text-to-Image Generation and Retrieval☆15Updated last year
- 😎 A summary of articles on knowledge-based text generation, with personal notes☆21Updated 9 months ago
- ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration☆46Updated 6 months ago
- (ICLR2025 Spotlight) DEEM: Official implementation of Diffusion models serve as the eyes of large language models for image perception.☆35Updated 2 weeks ago
- Codes for ICLR 2025 Paper: Towards Semantic Equivalence of Tokenization in Multimodal LLM☆67Updated 2 months ago
- [ICCV2023] Official code for "VL-PET: Vision-and-Language Parameter-Efficient Tuning via Granularity Control"☆53Updated last year
- Official implementation of MIA-DPO☆59Updated 5 months ago
- ☆27Updated 8 months ago
- An automatic MLLM hallucination detection framework☆19Updated last year
- ☆14Updated 2 months ago
- Will Pre-Training Ever End? A First Step Toward Next-Generation Foundation MLLMs via Self-Improving Systematic Cognition☆31Updated 2 months ago
- ☆54Updated 4 months ago
- Counterfactual Reasoning VQA Dataset☆25Updated last year
- Less is More: Mitigating Multimodal Hallucination from an EOS Decision Perspective (ACL 2024)☆54Updated 8 months ago
- The released data for paper "Measuring and Improving Chain-of-Thought Reasoning in Vision-Language Models".☆33Updated last year
- DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding☆65Updated last month
- NegCLIP.☆33Updated 2 years ago
- ☆18Updated last year
- ✨A curated list of papers on the uncertainty in multi-modal large language model (MLLM).☆49Updated 3 months ago
- An instruction data generation system for multimodal language models.☆33Updated 5 months ago
- The official code for paper "EasyGen: Easing Multimodal Generation with a Bidirectional Conditional Diffusion Model and LLMs"☆74Updated 7 months ago
- [SIGIR 2024] - Simple but Effective Raw-Data Level Multimodal Fusion for Composed Image Retrieval☆39Updated last year
- [EMNLP 2023] TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding☆51Updated last year
- [ICLR 2023] This is the code repo for our ICLR'23 paper "Universal Vision-Language Dense Retrieval: Learning A Unified Representation Spa…☆51Updated last year
- [ICLR2025] MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models☆81Updated 10 months ago
- [EMNLP'23] The official GitHub page for ''Evaluating Object Hallucination in Large Vision-Language Models''☆85Updated last year
- MME-CoT: Benchmarking Chain-of-Thought in LMMs for Reasoning Quality, Robustness, and Efficiency☆117Updated 3 weeks ago
- [ICCV 2025] VisRL: Intention-Driven Visual Perception via Reinforced Reasoning☆33Updated last month