kyegomez / VisualNexus
A plug-and-play pipeline that uses Segment Anything to segment datasets with rich detail for downstream fine-tuning on vision models like CLIP, ViT, ImageBind, and so on!
☆21Updated 11 months ago
Alternatives and similar repositories for VisualNexus:
Users who are interested in VisualNexus are comparing it to the libraries listed below:
- A public implementation of the ReLoRA pretraining method, built on Lightning-AI's Pytorch Lightning suite.☆33Updated 11 months ago
- ☆29Updated last year
- ☆26Updated 11 months ago
- MetaCLIP module for use with Autodistill.☆21Updated last year
- ☆13Updated last year
- The Benefits of a Concise Chain of Thought on Problem Solving in Large Language Models☆21Updated 2 months ago
- Simple Implementation of TinyGPTV in super simple Zeta lego blocks☆15Updated 3 months ago
- ☆30Updated last year
- ☆20Updated 8 months ago
- QLoRA for Masked Language Modeling☆21Updated last year
- An EXA-Scale repository of Multi-Modality AI resources from papers and models, to foundational libraries!☆42Updated last year
- My personal implementation of the model from "Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities", they haven't rel…☆13Updated last year
- Finetune any model on HF in less than 30 seconds☆58Updated 2 weeks ago
- ☆41Updated 8 months ago
- Use Grounding DINO, Segment Anything, and CLIP to label objects in images.☆26Updated last year
- Visual RAG using less than 300 lines of code.☆25Updated 11 months ago
- ☆62Updated 4 months ago
- Official implementation of "Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models"☆35Updated last year
- PyTorch Implementation of the paper "MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training"☆23Updated this week
- Pixel Parsing. A reproduction of OCR-free end-to-end document understanding models with open data☆21Updated 6 months ago
- Evaluate the performance of computer vision models and prompts for zero-shot models (Grounding DINO, CLIP, BLIP, DINOv2, ImageBind, model…☆35Updated last year
- This library supports evaluating disparities in generated image quality, diversity, and consistency between geographic regions.☆20Updated 8 months ago
- ☆59Updated this week
- Public reports detailing responses to sets of prompts by Large Language Models.☆29Updated last month
- OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation, arXiv 2024☆48Updated 2 weeks ago
- 🍳 AyaMCooking is a Voice-to-Voice Multi-lingual RAG Agent that makes a perfect sous chef for your kitchen, in up to 10 Languages 🤌🧑🍳☆21Updated 3 months ago
- ☆14Updated last year
- Fast approximate inference on a single GPU with sparsity aware offloading☆38Updated last year
- Tools for formatting large language model prompts.☆12Updated last year
- 🤝 Trade any tensors over the network☆30Updated last year