NExT-GPT / NExT-GPT.github.io
NExT-GPT: Any-to-Any Multimodal Large Language Model
☆19 Updated last week
Related projects
Alternatives and complementary repositories for NExT-GPT.github.io
- ☆13 Updated last year
- ☆57 Updated last month
- The open source implementation of the model from "Scaling Vision Transformers to 22 Billion Parameters" ☆25 Updated this week
- ☆43 Updated last month
- Official repository for the paper "NeuZip: Memory-Efficient Training and Inference with Dynamic Compression of Neural Networks". This rep… ☆27 Updated 2 weeks ago
- Official implementation of the paper "MMInA: Benchmarking Multihop Multimodal Internet Agents" ☆37 Updated 6 months ago
- ☆38 Updated last year
- ☆36 Updated last month
- PyTorch implementation of HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models ☆28 Updated 7 months ago
- ☆58 Updated 4 months ago
- My personal implementation of the model from "Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities", they haven't rel… ☆11 Updated 9 months ago
- Implementation of the model: "(MC-ViT)" from the paper: "Memory Consolidation Enables Long-Context Video Understanding" ☆16 Updated this week
- ☆88 Updated 5 months ago
- The AI Radiologist You Can Chat With ☆18 Updated last year
- Official repo for StableLLAVA ☆90 Updated 10 months ago
- Code for Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models ☆122 Updated 2 weeks ago
- A curated list of papers, repositories, tutorials, and anything else related to large language models for tools ☆64 Updated last year
- [Under Review] Official PyTorch implementation code for realizing the technical part of Phantom of Latent representing equipped with enla… ☆45 Updated last month
- Visual RAG using less than 300 lines of code. ☆23 Updated 8 months ago
- The Next Generation Multi-Modality Superintelligence ☆70 Updated 2 months ago
- Implementation of "the first large-scale multimodal mixture of experts models." from the paper: "Multimodal Contrastive Learning with… ☆22 Updated last week
- Official implementation of ECCV24 paper: POA ☆24 Updated 3 months ago
- A chatbot UI for RAG, multimodal, text completion. (supports Transformers, llama.cpp, MLX, vLLM) ☆18 Updated 6 months ago
- A public implementation of the ReLoRA pretraining method, built on Lightning-AI's PyTorch Lightning suite. ☆33 Updated 8 months ago
- Code for Paper: Harnessing Webpage UIs For Text Rich Visual Understanding ☆37 Updated 3 weeks ago
- How Good is Google Bard's Visual Understanding? An Empirical Study on Open Challenges ☆30 Updated last year
- Public code repo for paper "A Single Transformer for Scalable Vision-Language Modeling" ☆113 Updated last month
- ☆24 Updated 3 weeks ago
- ☆65 Updated last year
- ☆14 Updated last year