kyegomez / NeVALinks

The open source implementation of "NeVA: NeMo Vision and Language Assistant"

☆17

Alternatives and similar repositories for NeVA

Users that are interested in NeVA are comparing it to the libraries listed below

Sorting:

SkalskiP / SoM
Unofficial implementation and experiments related to Set-of-Mark (SoM) 👁️
☆87Updated last year
hu-po / streamdocs
Documentation, notes, links, etc for streams.
☆83Updated last year
roboflow / cvevals
Evaluate the performance of computer vision models and prompts for zero-shot models (Grounding DINO, CLIP, BLIP, DINOv2, ImageBind, model…
☆36Updated last year
autodistill / autodistill-grounded-edgesam
EdgeSAM model for use with Autodistill.
☆27Updated last year
facebookresearch / PUG
This is the repository for the Photorealistic Unreal Graphics (PUG) datasets for representation learning.
☆237Updated last year
kyegomez / Finetuning-Suite
Finetune any model on HF in less than 30 seconds
☆57Updated 2 weeks ago
poloclub / ClickDiffusion
ClickDiffusion: Harnessing LLMs for Interactive Precise Image Editing
☆69Updated last year
kyegomez / Kosmos-X
The Next Generation Multi-Modality Superintelligence
☆70Updated 11 months ago
roboflow / inference-client
☆14Updated last year
XiaoduoAILab / XmodelVLM
☆69Updated last year
qnguyen3 / hermes-llava
☆54Updated last year
mbzuai-oryx / PALO
(WACV 2025 - Oral) Vision-language conversation in 10 languages including English, Chinese, French, Spanish, Russian, Japanese, Arabic, H…
☆83Updated this week
KastanDay / video-pretrained-transformer
Multi-model video-to-text by combining embeddings from Flan-T5 + CLIP + Whisper + SceneGraph. The 'backbone LLM' is pre-trained from scra…
☆52Updated 2 years ago
kyegomez / LUMIERE
Implementation of the text to video model LUMIERE from the paper: "A Space-Time Diffusion Model for Video Generation" by Google Research
☆51Updated 6 months ago
capjamesg / sam-gpt4v
Use Grounding DINO, Segment Anything, and GPT-4V to label images with segmentation masks for use in training smaller, fine-tuned models.
☆66Updated last year
kyegomez / VisionLLaMA
Implementation of VisionLLaMA from the paper: "VisionLLaMA: A Unified LLaMA Interface for Vision Tasks" in PyTorch and Zeta
☆16Updated 8 months ago
internet-explorer-ssl / internet-explorer
Internet Explorer explores the web in a self-supervised manner to progressively find relevant examples that improve performance on a desi…
☆163Updated 2 years ago
kyegomez / BRAVE-ViT-Swarm
Implementation of the paper: "BRAVE : Broadening the visual encoding of vision-language models"
☆27Updated this week
kyegomez / StarlightVision
A multi-modal AI Model that can generate high quality novel videos with text, images, or video clips.
☆64Updated last year
13331112522 / v-rag
Visual RAG using less than 300 lines of code.
☆28Updated last year
kyegomez / Falcon
A simple package for leveraging Falcon 180B and the HF ecosystem's tools, including training/inference scripts, safetensors, integrations…
☆12Updated last year
ZechengLi19 / CIM
[IJCAI'23] Complete Instances Mining for Weakly Supervised Instance Segmentation
☆37Updated last year
capjamesg / sam-clip
Use Grounding DINO, Segment Anything, and CLIP to label objects in images.
☆31Updated last year
tensoic / Cerule
Cerule - A Tiny Mighty Vision Model
☆66Updated 11 months ago
kyegomez / Sora
Implementation of the premier Text to Video model from OpenAI
☆56Updated 8 months ago
kyegomez / EXA-1
An EXA-Scale repository of Multi-Modality AI resources from papers and models, to foundational libraries!
☆40Updated last year
apple / ml-mofi
☆59Updated last year
autodistill / autodistill-metaclip
MetaCLIP module for use with Autodistill.
☆21Updated last year
apple / ml-tic-clip
Repository for the paper: "TiC-CLIP: Continual Training of CLIP Models".
☆102Updated last year
autodistill / autodistill-grounded-sam-2
Use Segment Anything 2, grounded with Florence-2, to auto-label data for use in training vision models.
☆126Updated last year