Gary-code / Machine-Learning-Park
Machine Learning Park: mainly covers machine learning fundamentals, deep learning practice, and industrial applications.
☆14 Updated 2 years ago
Alternatives and similar repositories for Machine-Learning-Park:
Users who are interested in Machine-Learning-Park are comparing it to the repositories listed below.
- ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration ☆24 Updated 3 months ago
- ✨A curated list of papers on the uncertainty in multi-modal large language model (MLLM). ☆39 Updated last week
- (ICML 2024) Improve Context Understanding in Multimodal Large Language Models via Multimodal Composition Learning ☆28 Updated 6 months ago
- VisRL: Intention-Driven Visual Perception via Reinforced Reasoning ☆20 Updated 2 weeks ago
- Less is More: Mitigating Multimodal Hallucination from an EOS Decision Perspective (ACL 2024) ☆45 Updated 5 months ago
- Code for paper: Unified Text-to-Image Generation and Retrieval ☆14 Updated 8 months ago
- [EMNLP 2024] Official code for "Beyond Embeddings: The Promise of Visual Table in Multi-Modal Models" ☆16 Updated 5 months ago
- 😎 Summaries of papers on knowledge-based text generation, with personal notes ☆21 Updated 5 months ago
- Code and data for ACL 2024 paper on 'Cross-Modal Projection in Multimodal LLMs Doesn't Really Project Visual Attributes to Textual Space' ☆13 Updated 8 months ago
- [ICCV2023] Official code for "VL-PET: Vision-and-Language Parameter-Efficient Tuning via Granularity Control" ☆53 Updated last year
- Codes for Paper: Towards Semantic Equivalence of Tokenization in Multimodal LLM ☆53 Updated 5 months ago
- FreeVA: Offline MLLM as Training-Free Video Assistant ☆57 Updated 9 months ago
- (ICLR2025 Spotlight) DEEM: Official implementation of Diffusion models serve as the eyes of large language models for image perception. ☆27 Updated 3 weeks ago
- Counterfactual Reasoning VQA Dataset ☆24 Updated last year
- An automatic MLLM hallucination detection framework ☆19 Updated last year
- ☆54 Updated 3 weeks ago
- The official implementation of the paper "MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding". … ☆51 Updated 4 months ago
- The released data for paper "Measuring and Improving Chain-of-Thought Reasoning in Vision-Language Models". ☆32 Updated last year
- GroundVLP: Harnessing Zero-shot Visual Grounding from Vision-Language Pre-training and Open-Vocabulary Object Detection (AAAI 2024) ☆64 Updated last year
- ☆41 Updated last month
- 🔎Official code for our paper: "VL-Uncertainty: Detecting Hallucination in Large Vision-Language Model via Uncertainty Estimation". ☆30 Updated 2 weeks ago
- [SCIS 2024] The official implementation of the paper "MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Di… ☆48 Updated 4 months ago
- [ACL 2024] Multi-modal preference alignment remedies regression of visual instruction tuning on language model ☆40 Updated 4 months ago
- ☆25 Updated 10 months ago
- ☆18 Updated 8 months ago
- NegCLIP. ☆32 Updated 2 years ago
- A Self-Training Framework for Vision-Language Reasoning ☆73 Updated 2 months ago
- [ACM MM 2024] Improving Composed Image Retrieval via Contrastive Learning with Scaling Positives and Negatives ☆30 Updated 5 months ago
- Official Implementation (Pytorch) of the "VidChain: Chain-of-Tasks with Metric-based Direct Preference Optimization for Dense Video Capti… ☆17 Updated 2 months ago
- [ICLR 2023] This is the code repo for our ICLR'23 paper "Universal Vision-Language Dense Retrieval: Learning A Unified Representation Spa… ☆50 Updated 9 months ago