GaiZhenbiao / Phi3V-Finetuning
Parameter-efficient finetuning script for Phi-3-vision, the strong multimodal language model by Microsoft.
☆48Updated 3 months ago
Related projects: ⓘ
- ☆79Updated this week
- ☆55Updated 2 months ago
- (WACV 2025) Vision-language conversation in 10 languages including English, Chinese, French, Spanish, Russian, Japanese, Arabic, Hindi, B…☆77Updated last week
- InstructDoc: A Dataset for Zero-Shot Generalization of Visual Document Understanding with Instructions (AAAI2024)☆134Updated 3 months ago
- Implementation of PALI3 from the paper PALI-3 VISION LANGUAGE MODELS: SMALLER, FASTER, STRONGER"☆138Updated last week
- a family of highly capabale yet efficient large multimodal models☆155Updated 3 weeks ago
- The huggingface implementation of Fine-grained Late-interaction Multi-modal Retriever.☆64Updated 2 weeks ago
- ☆50Updated 2 months ago
- Source code of the paper: RetrievalQA: Assessing Adaptive Retrieval-Augmented Generation for Short-form Open-Domain Question Answering [F…☆57Updated 3 months ago
- E5-V: Universal Embeddings with Multimodal Large Language Models☆148Updated 2 months ago
- From scratch implementation of a vision language model in pure PyTorch☆149Updated 4 months ago
- Quick exploration into fine tuning florence 2☆250Updated last month
- Code and data for CoachLM, an automatic instruction revision approach LLM instruction tuning.☆56Updated 5 months ago
- ☆35Updated last year
- Code and data for "StructLM: Towards Building Generalist Models for Structured Knowledge Grounding" (COLM 2024)☆67Updated 2 months ago
- EvolKit is an innovative framework designed to automatically enhance the complexity of instructions used for fine-tuning Large Language M…☆118Updated last week
- FuseAI Project☆75Updated 3 weeks ago
- [TMLR23] Official implementation of UnIVAL: Unified Model for Image, Video, Audio and Language Tasks.☆224Updated 8 months ago
- A minimal codebase for finetuning large multimodal models, supporting llava-1.5/1.6, llava-interleave, llava-next-video, qwen-vl, phi3-v …☆123Updated last week
- Testing and evaluating the capabilities of Vision-Language models (PaliGemma) in performing computer vision tasks such as object detectio…☆77Updated 3 months ago
- ControlLLM: Augment Language Models with Tools by Searching on Graphs☆184Updated 2 months ago
- [NAACL 2024] Visually Guided Generative Text-Layout Pre-training for Document Intelligence☆41Updated last week
- My implementation of Kosmos2.5 from the paper: "KOSMOS-2.5: A Multimodal Literate Model"☆68Updated last week
- Use Florence 2 to auto-label data for use in training fine-tuned object detection models.☆54Updated last month
- Democratization of "PaLI: A Jointly-Scaled Multilingual Language-Image Model"☆85Updated 5 months ago
- Code/Data for the paper: "LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding"☆252Updated 3 months ago
- Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks☆123Updated 6 months ago
- A streamlit app for visualizing LLM evals.☆38Updated 8 months ago
- ☆65Updated last year
- Official implementation for the paper "LongEmbed: Extending Embedding Models for Long Context Retrieval"☆108Updated 4 months ago