kyegomez / VisualNexus
A plug-and-play pipeline that uses Segment Anything to segment datasets in rich detail for downstream fine-tuning of vision models such as CLIP, ViT, ImageBind, and others!
☆21 · Updated last year
Alternatives and similar repositories for VisualNexus:
Users interested in VisualNexus are comparing it to the libraries listed below:
- A public implementation of the ReLoRA pretraining method, built on Lightning-AI's PyTorch Lightning suite. ☆33 · Updated last year
- The open source implementation of "NeVA: NeMo Vision and Language Assistant" ☆18 · Updated last year
- MetaCLIP module for use with Autodistill. ☆21 · Updated last year
- Visual RAG in fewer than 300 lines of code. ☆26 · Updated last year
- An EXA-Scale repository of Multi-Modality AI resources, from papers and models to foundational libraries! ☆42 · Updated last year
- Tools for content datamining and NLP at scale ☆42 · Updated 9 months ago
- My personal implementation of the model from "Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities", they haven't rel… ☆13 · Updated last year
- ☆29 · Updated last year
- ☆46 · Updated 8 months ago
- This library supports evaluating disparities in generated image quality, diversity, and consistency between geographic regions. ☆20 · Updated 9 months ago
- Finetune any model on HF in less than 30 seconds ☆58 · Updated last month
- ☆26 · Updated last year
- ☆20 · Updated 9 months ago
- ☆13 · Updated last year
- ☆63 · Updated 5 months ago
- Evaluate the performance of computer vision models and prompts for zero-shot models (Grounding DINO, CLIP, BLIP, DINOv2, ImageBind, model… ☆35 · Updated last year
- Simple implementation of TinyGPTV in super simple Zeta lego blocks ☆15 · Updated 4 months ago
- ☆30 · Updated last year
- Using multiple LLMs for ensemble forecasting ☆16 · Updated last year
- ☆13 · Updated last year
- PyTorch Implementation of the paper "MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training" ☆23 · Updated this week
- BH hackathon ☆14 · Updated 11 months ago
- NeurIPS 2023 - Cappy: Outperforming and Boosting Large Multi-Task LMs with a Small Scorer ☆41 · Updated 11 months ago
- ☆16 · Updated last year
- OmegaViT (ΩViT) is a cutting-edge vision transformer architecture that combines multi-query attention, rotary embeddings, state space mod… ☆14 · Updated this week
- Tools for formatting large language model prompts. ☆12 · Updated last year
- GitHub repo for Peifeng's internship project ☆14 · Updated last year
- Pixel Parsing. A reproduction of OCR-free end-to-end document understanding models with open data ☆21 · Updated 7 months ago
- PyTorch implementation of HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models ☆28 · Updated last year