kyegomez / VisualNexus
A plug-and-play pipeline that uses Segment Anything to segment datasets in rich detail for downstream fine-tuning of vision models like CLIP, ViT, ImageBind, and so on!
☆21Updated last year
Alternatives and similar repositories for VisualNexus:
Users who are interested in VisualNexus are comparing it to the libraries listed below:
- ☆15Updated last year
- A public implementation of the ReLoRA pretraining method, built on Lightning-AI's PyTorch Lightning suite.☆33Updated last year
- An EXA-Scale repository of Multi-Modality AI resources from papers and models, to foundational libraries!☆42Updated last year
- Finetune any model on HF in less than 30 seconds☆58Updated last month
- ☆63Updated 7 months ago
- My personal implementation of the model from "Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities", they haven't rel…☆13Updated last year
- A repository of projects and datasets under active development by Alignment Lab AI☆22Updated last year
- ☆20Updated 11 months ago
- The Next Generation Multi-Modality Superintelligence☆71Updated 8 months ago
- Evaluate the performance of computer vision models and prompts for zero-shot models (Grounding DINO, CLIP, BLIP, DINOv2, ImageBind, model…☆35Updated last year
- [WIP] Transformer to embed Danbooru labelsets☆13Updated last year
- ☆29Updated last year
- ☆22Updated last year
- BH hackathon☆14Updated last year
- MetaCLIP module for use with Autodistill.☆21Updated last year
- A simple package for leveraging Falcon 180B and the HF ecosystem's tools, including training/inference scripts, safetensors, integrations…☆13Updated last year
- Using multiple LLMs for ensemble Forecasting☆16Updated last year
- A clone of OpenAI's Tokenizer page for HuggingFace Models☆45Updated last year
- ☆9Updated 2 weeks ago
- ☆14Updated last year
- The open source implementation of "NeVA: NeMo Vision and Language Assistant"☆18Updated last year
- This library supports evaluating disparities in generated image quality, diversity, and consistency between geographic regions.☆20Updated 11 months ago
- QLoRA for Masked Language Modeling☆22Updated last year
- Tools for content datamining and NLP at scale☆43Updated 10 months ago
- The Benefits of a Concise Chain of Thought on Problem Solving in Large Language Models☆22Updated 5 months ago
- Visual RAG using less than 300 lines of code.☆27Updated last year
- DALLE-tools provides useful dataset utilities to improve your workflow with WebDatasets.☆15Updated 3 years ago
- OmegaViT (ΩViT) is a cutting-edge vision transformer architecture that combines multi-query attention, rotary embeddings, state space mod…☆14Updated 2 weeks ago
- Official implementation of "Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models"☆36Updated last year
- ☆13Updated last year