camenduru / LLaVA-colab
☆219Updated last year
Alternatives and similar repositories for LLaVA-colab:
Users who are interested in LLaVA-colab are comparing it to the libraries listed below
- LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills☆739Updated last year
- From scratch implementation of a vision language model in pure PyTorch☆214Updated last year
- ☆706Updated last year
- Examples of RAG using Llamaindex with local LLMs - Gemma, Mixtral 8x7B, Llama 2, Mistral 7B, Orca 2, Phi-2, Neural 7B☆126Updated last year
- ☆82Updated last year
- Fine Tuning Multimodal LLM "Idefics 9B" on Pokemon Go Dataset available on Hugging Face.☆19Updated last year
- Embed arbitrary modalities (images, audio, documents, etc) into large language models.☆184Updated last year
- Quick exploration into fine-tuning Florence 2☆309Updated 7 months ago
- Local LLM ReAct Agent with Guidance☆158Updated last year
- Testing and evaluating the capabilities of Vision-Language models (PaliGemma) in performing computer vision tasks such as object detectio…☆80Updated 11 months ago
- [CVPR 2024] VCoder: Versatile Vision Encoders for Multimodal Large Language Models☆278Updated last year
- LLaVA-Interactive-Demo☆369Updated 9 months ago
- 👁️ + 💬 + 🎧 = 🤖 Curated list of top foundation and multimodal models! [Paper + Code + Examples + Tutorials]☆614Updated last year
- InsightSolver: Colab notebooks for exploring and solving operational issues using deep learning, machine learning, and related models.☆98Updated 10 months ago
- ☆180Updated last year
- Parameter-efficient finetuning script for Phi-3-vision, the strong multimodal language model by Microsoft.☆58Updated 10 months ago
- ☆168Updated last year
- Code/Data for the paper: "LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding"☆266Updated 10 months ago
- llama.cpp with the BakLLaVA model describes what it sees☆383Updated last year
- Example code for extracting Q&A datasets from LLMs☆83Updated last year
- Notus is a collection of fine-tuned LLMs using SFT, DPO, SFT+DPO, and/or any other RLHF techniques, while always keeping a data-first app…☆168Updated last year
- AI assistant that can query visual datasets, search the FiftyOne docs, and answer general computer vision questions☆245Updated 5 months ago
- A demonstration of a chatbot interface that uses the OpenAI ChatGPT API☆44Updated 2 years ago
- HPT - Open Multimodal LLMs from HyperGAI☆315Updated 11 months ago
- Chat with Phi 3.5/3 Vision LLMs. Phi-3.5-vision is a lightweight, state-of-the-art open multimodal model built upon datasets which includ…☆33Updated 4 months ago
- Building a chatbot powered by a RAG pipeline to read, summarize, and quote the most relevant papers related to the user query.☆166Updated last year
- Maybe the new state-of-the-art vision model? We'll see 🤷‍♂️☆163Updated last year
- Banishing LLM Hallucinations Requires Rethinking Generalization☆273Updated 9 months ago
- This repo contains the code covered in the YouTube tutorials.☆84Updated 4 months ago
- Use Florence 2 to auto-label data for use in training fine-tuned object detection models.☆63Updated 8 months ago