taishan1994 / llava-handbook
Some study notes on the official LLaVA code
☆29Oct 11, 2024Updated last year
Alternatives and similar repositories for llava-handbook
Users that are interested in llava-handbook are comparing it to the libraries listed below
- A backup repository of scanned review notes for CS foundation courses and some major courses from four years of university☆12Jun 29, 2019Updated 6 years ago
- [ICLR 2024] This is the official implementation for the paper: "Beyond imitation: Leveraging fine-grained quality signals for alignment"☆10May 5, 2024Updated last year
- ☆13Aug 28, 2024Updated last year
- This is the official repository for VLN-CLASH.☆22Aug 5, 2025Updated 6 months ago
- Official Repository for CVPR 2024 Paper: "Large Language Models are Good Prompt Learners for Low-Shot Image Classification"☆41Jul 1, 2024Updated last year
- MSWAL☆13Nov 7, 2025Updated 3 months ago
- Multi-Person Tracking in Tour Guide Robot☆10Aug 23, 2022Updated 3 years ago
- The corresponding code from our paper "Making Reasoning Matter: Measuring and Improving Faithfulness of Chain-of-Thought Reasoning…☆13Jan 14, 2026Updated last month
- [COLING 2025 Industry] LoRA Soups☆18Nov 29, 2024Updated last year
- ☆14Sep 17, 2024Updated last year
- Official Code Repository for the paper "Generating Realistic Images from In-the-wild Sounds", ICCV 2023☆12Aug 24, 2025Updated 5 months ago
- awesome-audio-visual-robustness☆11Jan 27, 2024Updated 2 years ago
- Fine-tuning Llama2-7b and other llms for categorising emails for Deutsche Bahn (German National Railways)☆13Oct 9, 2023Updated 2 years ago
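- HearSight is an intelligent audio and video content analysis tool. It supports importing video from multiple sources (by URL or file upload) and extracts contextual information from the input video to provide more accurate AI question answering. The platform slices video intelligently by semantic unit, and users can adjust the slicing granularity through Q&A to quickly locate the content they need. HearSight also supports automatic…☆32Dec 12, 2025Updated 2 months ago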
- Speech Security and Privacy Compendium - Mini☆10Jun 18, 2024Updated last year
- 🕵️♂️🔊 Automatically update Audio Deepfake Detection (ADD) papers daily using GitHub Actions (updates every 12 hours)☆17Updated this week
- Plamber is the multiplatform web-oriented system for reading and storing books online.☆10Dec 8, 2022Updated 3 years ago
- ☆19May 28, 2025Updated 8 months ago
- [NeurIPS 2024] Repo for the paper "ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models"☆204Jul 17, 2025Updated 7 months ago
- LLM-MapBook: AI-Powered Maps for Storytelling. Extracts geo-coordinates from books, visualizes on interactive maps, offering immersive st…☆12Aug 27, 2024Updated last year
- ☆13Jan 15, 2023Updated 3 years ago
- The code for our paper "EasyInv: Toward Fast and Better DDIM Inversion"☆14Jun 1, 2025Updated 8 months ago
- ☆12Mar 27, 2025Updated 10 months ago
- [NAACL 2024] Z-GMOT: Zero-shot Generic Multiple Object Tracking☆13May 3, 2024Updated last year
- The official github repo for MixEval-X, the first any-to-any, real-world benchmark.☆16Feb 15, 2025Updated last year
- Tracking Multiple Deformable Objects in Egocentric Videos (CVPR 2023)☆13Apr 10, 2023Updated 2 years ago
- This repository includes the code to reproduce our paper "Raw Differentiable Architecture Search for Speech Deepfake and Spoofing Detecti…☆11Jul 11, 2023Updated 2 years ago
- Official code for the CVPR 2024 paper "Audio-Visual Segmentation via Unlabeled Frame Exploitation"☆18Jul 7, 2024Updated last year
- Fast instruction tuning with Llama2☆11Apr 8, 2024Updated last year
- ☆13Apr 5, 2023Updated 2 years ago
- We propose C2SER, a novel audio-language model designed to enhance the stability and accuracy of speech emotion recognition through conte…☆17Mar 3, 2025Updated 11 months ago
- Get CLIP ViT text tokens about an image, visualize attention as a heatmap.☆15Aug 8, 2023Updated 2 years ago
- Papers of "A Survey on Multimodal LLMs from the Perspective of Input-Output Space Extension"☆16Feb 4, 2026Updated last week
- I fine-tuned (p-tuning) Tsinghua’s open-source large language model, ChatGLM2-6B, using several years of my WeChat chat history. Inspired…☆12Mar 6, 2024Updated last year
- ICL backdoor attack☆17Nov 4, 2024Updated last year
- Automatic Metric for Evaluating Generated Videos☆32Dec 8, 2025Updated 2 months ago
- Official code for "What Makes for Good Visual Tokenizers for Large Language Models?".☆58Jun 27, 2023Updated 2 years ago
- [IEEE TIP 2024] Facial Prior Guided Micro-Expression Generation☆13Nov 8, 2024Updated last year
- [ACM MM 2025] Officially released code for ALLM4ADD☆36Oct 30, 2025Updated 3 months ago