TobyYang7 / Llava_Qwen2
Visual Instruction Tuning for Qwen2 Base Model
☆41Jun 29, 2024Updated last year
Alternatives and similar repositories for Llava_Qwen2
Users who are interested in Llava_Qwen2 are comparing it to the libraries listed below.
- [AAAI 2025] HiRED strategically drops visual tokens in the image encoding stage to improve inference efficiency for High-Resolution Visio…☆44Apr 18, 2025Updated 9 months ago
- [ACL 2024] Multi-modal preference alignment remedies regression of visual instruction tuning on language model☆47Nov 10, 2024Updated last year
- ☆11Feb 6, 2026Updated last week
- Official implementation of "Efficient 3D Recognition with Event-driven Spike Sparse Convolution" (AAAI2025)☆25Jul 7, 2025Updated 7 months ago
- 🔥 🔥 [WACV2024] Mini but Mighty: Finetuning ViTs with Mini Adapters☆20Jul 5, 2024Updated last year
- [ICLR'25] Official code for the paper 'MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs'☆340Apr 20, 2025Updated 9 months ago
- Official implementation of "MMNeuron: Discovering Neuron-Level Domain-Specific Interpretation in Multimodal Large Language Model". Our co…☆25Dec 20, 2024Updated last year
- The first decoder-only multimodal state space model☆100May 19, 2025Updated 8 months ago
- ☆24Dec 26, 2024Updated last year
- ☆28Aug 25, 2024Updated last year
- [AAAI 2026 Oral] The official code of "UniME-V2: MLLM-as-a-Judge for Universal Multimodal Embedding Learning"☆62Dec 8, 2025Updated 2 months ago
- [ICML 2025] Official implementation of paper 'Look Twice Before You Answer: Memory-Space Visual Retracing for Hallucination Mitigation in…☆171Sep 25, 2025Updated 4 months ago
- [ICCV 2025] ONLY: One-Layer Intervention Sufficiently Mitigates Hallucinations in Large Vision-Language Models☆49Jul 7, 2025Updated 7 months ago
- A Complete Introduction to Building Generative AI Apps with Dify☆17May 25, 2025Updated 8 months ago
- [ICLR 2025] Official PyTorch Implementation for CPE: Concept Pinpoint Eraser for Text-to-image Diffusion Models via Residual Attention Ga…☆12Apr 7, 2025Updated 10 months ago
- [ICRA 2024] WLST: Weak Labels Guided Self-training for Weakly-supervised Domain Adaptation on 3D Object Detection☆12Feb 6, 2024Updated 2 years ago
- A simple WeChat Official Account layout tool based on Dify☆16Jun 27, 2025Updated 7 months ago
- Train a LLaVA model with better Chinese support, and open-source the training code and data.☆79Sep 6, 2024Updated last year
- PyTorch Implementation of "Divide, Conquer and Combine: A Training-Free Framework for High-Resolution Image Perception in Multimodal Larg…☆39Dec 5, 2025Updated 2 months ago
- Code for "AVG-LLaVA: A Multimodal Large Model with Adaptive Visual Granularity"☆33Oct 12, 2024Updated last year
- [AAAI 2026] Global Compression Commander: Plug-and-Play Inference Acceleration for High-Resolution Large Vision-Language Models☆38Jan 27, 2026Updated 2 weeks ago
- [ICCVW 25] LLaVA-MORE: A Comparative Study of LLMs and Visual Backbones for Enhanced Visual Instruction Tuning☆159Aug 8, 2025Updated 6 months ago
- A full-stack AI-powered business intelligence tool for non-experts, featuring serverless backend processing and a secure Streamlit fronte…☆25Jan 6, 2026Updated last month
- ☆28Dec 4, 2025Updated 2 months ago
- Implementation of the CVPR2025 paper LoTUS: Large-Scale Machine Unlearning with a Taste of Uncertainty.☆16Sep 10, 2025Updated 5 months ago
- ☆11Aug 29, 2025Updated 5 months ago
- Commemorating a national first prize in Python Group A at the 15th Lanqiao Cup National Finals, 2024 (problems + contest code)☆13Jan 10, 2025Updated last year
- 100 Production-Ready Claude Code Skills - The most comprehensive collection of AI skills for sales, business automation, content creation…☆35Oct 22, 2025Updated 3 months ago
- Write the database metadata into the dify knowledge☆12Dec 30, 2025Updated last month
- Workflow automation, but you just describe what you want and it happens.☆26Nov 22, 2025Updated 2 months ago
- [ACM MM 2025] TimeChat-online: 80% Visual Tokens are Naturally Redundant in Streaming Videos☆113Dec 12, 2025Updated 2 months ago
- [EMNLP 2024] mDPO: Conditional Preference Optimization for Multimodal Large Language Models.☆85Nov 10, 2024Updated last year
- [ECCV 2024] Paying More Attention to Image: A Training-Free Method for Alleviating Hallucination in LVLMs☆163Nov 6, 2024Updated last year
- A Framework of Small-scale Large Multimodal Models☆960Feb 7, 2026Updated last week
- 🔥Awesome Multimodal Large Language Models Paper List☆154Mar 12, 2025Updated 11 months ago
- [ICCV 2025] p-MoD: Building Mixture-of-Depths MLLMs via Progressive Ratio Decay☆43Jun 26, 2025Updated 7 months ago
- (ICLR 2026 🔥) Code for "The Devil behind the mask: An emergent safety vulnerability of Diffusion LLMs"☆73Feb 1, 2026Updated 2 weeks ago
- ☆13Jul 3, 2024Updated last year
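- Reads depth and other image information with Kinect 2.0 to segment the hand region, extracts hand-image features with HOG, then trains an SVM. The goal is gesture recognition.☆10Jan 8, 2020Updated 6 years ago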