autodistill / autodistill-efficient-yolo-world
EfficientSAM + YOLO World base model for use with Autodistill.
☆10 · Updated last year
Alternatives and similar repositories for autodistill-efficient-yolo-world:
Users who are interested in autodistill-efficient-yolo-world are comparing it to the repositories listed below.
- Official Pytorch Implementation of Self-emerging Token Labeling ☆33 · Updated last year
- Pytorch implementation of HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models ☆28 · Updated last year
- EfficientViT is a new family of vision models for efficient high-resolution vision. ☆24 · Updated last year
- SAM-CLIP module for use with Autodistill. ☆15 · Updated last year
- Stable Diffusion in TensorRT 8.5+ ☆14 · Updated 2 years ago
- ComfyUI YOLO-World Integration ☆42 · Updated 10 months ago
- Evaluate the performance of computer vision models and prompts for zero-shot models (Grounding DINO, CLIP, BLIP, DINOv2, ImageBind, model… ☆35 · Updated last year
- ☆14 · Updated 4 months ago
- Simple Implementation of TinyGPTV in super simple Zeta lego blocks ☆16 · Updated 5 months ago
- ☆32 · Updated 3 months ago
- "Towards Improving Document Understanding: An Exploration on Text-Grounding via MLLMs" 2023 ☆14 · Updated 5 months ago
- Official Training and Inference Code of Amodal Expander, Proposed in Tracking Any Object Amodally ☆17 · Updated 9 months ago
- OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation, arXiv 2024 ☆58 · Updated 2 months ago
- Vision-oriented multimodal AI ☆49 · Updated 10 months ago
- ☆34 · Updated last year
- ☆13 · Updated 2 years ago
- Official Pytorch implementation for "IFORMER: INTEGRATING CONVNET AND TRANSFORMER FOR MOBILE APPLICATION" [ICLR 2025] ☆42 · Updated last month
- Our 2nd-gen LMM ☆33 · Updated 11 months ago
- EdgeSAM model for use with Autodistill. ☆26 · Updated 10 months ago
- Codebase for the Recognize Anything Model (RAM) ☆78 · Updated last year
- [NeurIPS2022] This is the official implementation of the paper "Expediting Large-Scale Vision Transformer for Dense Prediction without Fi… ☆84 · Updated last year
- Pixel Parsing. A reproduction of OCR-free end-to-end document understanding models with open data ☆21 · Updated 9 months ago
- The open source implementation of "AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model" ☆22 · Updated 3 months ago
- Scaling Multi-modal Instruction Fine-tuning with Tens of Thousands Vision Task Types ☆17 · Updated 2 weeks ago
- The open source implementation of the model from "Scaling Vision Transformers to 22 Billion Parameters" ☆28 · Updated last month
- INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model ☆42 · Updated 9 months ago
- minisora-DiT, a DiT reproduction based on XTuner from the open source community MiniSora ☆41 · Updated last year
- A multimodal large-scale model, which performs close to the closed-source Qwen-VL-PLUS on many datasets and significantly surpasses the p… ☆14 · Updated last year
- Official PyTorch implementation of "No Time to Waste: Squeeze Time into Channel for Mobile Video Understanding" ☆32 · Updated 11 months ago
- 💡💡💡 awesome computer vision app in gradio ☆52 · Updated 11 months ago