cvlab-columbia/viper

Code for the paper "ViperGPT: Visual Inference via Python Execution for Reasoning"

☆1,716

Alternatives and similar repositories for viper

Users who are interested in viper are comparing it to the repositories listed below.

allenai / visprog
Official code for VisProg (CVPR 2023 Best Paper!)
☆774 · Aug 26, 2024 · Updated last year
mlfoundations / open_flamingo
An open-source framework for training large multimodal models.
☆4,115 · Aug 31, 2024 · Updated last year
salesforce / LAVIS
LAVIS - A One-stop Library for Language-Vision Intelligence
☆11,254 · Jun 2, 2026 · Updated last month
sanjayss34 / codevqa
☆83 · Jul 16, 2023 · Updated 3 years ago
OpenGVLab / LLaMA-Adapter
[ICLR 2024] Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters
☆5,917 · Mar 14, 2024 · Updated 2 years ago
NVlabs / prismer
The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".
☆1,311 · Jan 17, 2024 · Updated 2 years ago
haotian-liu / LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
☆24,937 · Aug 12, 2024 · Updated last year
microsoft / MM-REACT
Official repo for MM-REACT
☆967 · Jan 31, 2024 · Updated 2 years ago
allenai / mmc4
MultimodalC4 is a multimodal extension of c4 that interleaves millions of images with text.
☆953 · Mar 19, 2025 · Updated last year
microsoft / unilm
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
☆22,170 · Jan 23, 2026 · Updated 6 months ago
MikeWangWZHL / VidIL
Pytorch code for Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners
☆117 · Sep 15, 2022 · Updated 3 years ago
amazon-science / mm-cot
Official implementation for "Multimodal Chain-of-Thought Reasoning in Language Models" (stay tuned and more will be updated)
☆3,983 · Jun 12, 2024 · Updated 2 years ago
microsoft / X-Decoder
[CVPR 2023] Official Implementation of X-Decoder for generalized decoding for pixel, image and language
☆1,346 · Oct 5, 2023 · Updated 2 years ago
kohjingyu / fromage
🧀 Code and models for the ICML 2023 paper "Grounding Language Models to Images for Multimodal Inputs and Outputs".
☆484 · Oct 30, 2023 · Updated 2 years ago
microsoft / GLIP
Grounded Language-Image Pre-training
☆2,605 · Jan 24, 2024 · Updated 2 years ago
lupantech / chameleon-llm
Codes for "Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models".
☆1,140 · Dec 23, 2023 · Updated 2 years ago
tatsu-lab / stanford_alpaca
Code and documentation to train Stanford's Alpaca models, and generate the data.
☆30,251 · Jul 17, 2024 · Updated 2 years ago
facebookresearch / ImageBind
ImageBind One Embedding Space to Bind Them All
☆9,061 · Nov 21, 2025 · Updated 8 months ago
chenfei-wu / TaskMatrix
☆34,041 · Jan 6, 2024 · Updated 2 years ago
microsoft / JARVIS
JARVIS, a system to connect LLMs with ML community. Paper: https://arxiv.org/pdf/2303.17580.pdf
☆25,077 · Jul 29, 2025 · Updated 11 months ago
mlfoundations / open_clip
An open source implementation of CLIP.
☆14,012 · Updated this week
Vision-CAIR / ChatCaptioner
Official Repository of ChatCaptioner
☆468 · Apr 13, 2023 · Updated 3 years ago
FMInference / FlexLLMGen
Running large language models on a single GPU for throughput-oriented scenarios.
☆9,361 · Oct 28, 2024 · Updated last year
OpenGVLab / Ask-Anything
[CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.
☆3,344 · Updated this week
EvolvingLMMs-Lab / Otter
🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing imp…
☆3,425 · Mar 5, 2024 · Updated 2 years ago
Stability-AI / StableLM
StableLM: Stability AI Language Models
☆15,687 · Apr 8, 2024 · Updated 2 years ago
baaivision / Emu
Emu Series: Generative Multimodal Models from BAAI
☆1,776 · Jan 12, 2026 · Updated 6 months ago
IDEA-Research / Grounded-Segment-Anything
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and …
☆17,688 · Sep 5, 2024 · Updated last year
Lightning-AI / lit-llama
Implementation of the LLaMA language model based on nanoGPT. Supports flash attention, Int8 and GPTQ 4bit quantization, LoRA and LLaMA-Ad…
☆6,083 · Jul 1, 2025 · Updated last year
Vision-CAIR / MiniGPT-4
Open-sourced codes for MiniGPT-4 and MiniGPT-v2 (https://minigpt-4.github.io, https://minigpt-v2.github.io/)
☆25,658 · Sep 2, 2024 · Updated last year
tloen / alpaca-lora
Instruct-tune LLaMA on consumer hardware
☆18,912 · Jul 29, 2024 · Updated last year
thu-ml / unidiffuser
Code and models for the paper "One Transformer Fits All Distributions in Multi-Modal Diffusion"
☆1,486 · May 31, 2023 · Updated 3 years ago
UX-Decoder / Segment-Everything-Everywhere-All-At-Once
[NeurIPS 2023] Official implementation of the paper "Segment Everything Everywhere All at Once"
☆4,795 · Aug 19, 2024 · Updated last year
LLaVA-VL / LLaVA-Plus-Codebase
LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills
☆769 · Feb 1, 2024 · Updated 2 years ago
lm-sys / FastChat
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
☆39,497 · May 1, 2026 · Updated 2 months ago
baaivision / Painter
Painter & SegGPT Series: Vision Foundation Models from BAAI
☆2,593 · Dec 6, 2024 · Updated last year
facebookresearch / segment-anything
The repository provides code for running inference with the SegmentAnything Model (SAM), links for downloading the trained model checkpoi…
☆54,581 · Sep 18, 2024 · Updated last year
openlm-research / open_llama
OpenLLaMA, a permissively licensed open source reproduction of Meta AI’s LLaMA 7B trained on the RedPajama dataset
☆7,531 · Jul 16, 2023 · Updated 3 years ago
facebookresearch / MetaCLIP
NeurIPS 2025 Spotlight; ICLR2024 Spotlight; CVPR 2024; EMNLP 2024
☆1,847 · Nov 27, 2025 · Updated 7 months ago