autodistill / autodistill-efficient-yolo-world
EfficientSAM + YOLO World base model for use with Autodistill.
☆10 · Updated last year
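The snippet below is a minimal sketch of how an Autodistill base model like this one is typically used to auto-label images. It follows the standard Autodistill base-model interface (CaptionOntology, predict, label); the `EfficientYOLOWorld` class name and its import path are assumptions and should be checked against the repository README.

```python
# Minimal sketch of the standard Autodistill base-model workflow.
# The EfficientYOLOWorld class name and import path below are assumptions,
# not confirmed from this repository.
from autodistill.detection import CaptionOntology
from autodistill_efficient_yolo_world import EfficientYOLOWorld  # assumed entry point

# Map natural-language prompts to the class names the labeled dataset should use.
base_model = EfficientYOLOWorld(
    ontology=CaptionOntology({"shipping container": "container"})
)

# Run detection on a single image...
results = base_model.predict("image.jpg")

# ...or auto-label a folder of images to build a dataset for training a target model.
base_model.label(input_folder="./images", output_folder="./dataset")
```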
Alternatives and similar repositories for autodistill-efficient-yolo-world:
Users interested in autodistill-efficient-yolo-world are comparing it to the libraries listed below.
- EfficientViT is a new family of vision models for efficient high-resolution vision. ☆24 · Updated last year
- PyTorch implementation of HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models ☆28 · Updated last year
- Official PyTorch Implementation of Self-emerging Token Labeling ☆32 · Updated last year
- SAM-CLIP module for use with Autodistill. ☆14 · Updated last year
- Official Training and Inference Code of Amodal Expander, Proposed in Tracking Any Object Amodally ☆15 · Updated 8 months ago
- Evaluate the performance of computer vision models and prompts for zero-shot models (Grounding DINO, CLIP, BLIP, DINOv2, ImageBind, model… ☆35 · Updated last year
- EdgeSAM model for use with Autodistill. ☆26 · Updated 9 months ago
- Python scripts performing Open Vocabulary Object Detection using the YOLO-World model in ONNX. ☆49 · Updated 11 months ago
- Stable Diffusion in TensorRT 8.5+ ☆14 · Updated 2 years ago
- arXiv 23 "Towards Improving Document Understanding: An Exploration on Text-Grounding via MLLMs" ☆14 · Updated 4 months ago
- ☆31 · Updated 2 months ago
- Simple Implementation of TinyGPTV in super simple Zeta lego blocks ☆15 · Updated 4 months ago
- ☆43 · Updated 2 months ago
- Python scripts performing optical flow estimation using the NeuFlowV2 model in ONNX. ☆41 · Updated 6 months ago
- Official code repository for paper: "ExPLoRA: Parameter-Efficient Extended Pre-training to Adapt Vision Transformers under Domain Shifts" ☆31 · Updated 5 months ago
- Vision-oriented multimodal AI ☆49 · Updated 9 months ago
- Use Florence 2 to auto-label data for use in training fine-tuned object detection models. ☆62 · Updated 7 months ago
- OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation, arXiv 2024 ☆57 · Updated last month
- Auto Segmentation label generation with SAM (Segment Anything) + Grounding DINO ☆19 · Updated last month
- Official PyTorch implementation for "iFormer: Integrating ConvNet and Transformer for Mobile Application" [ICLR 2025] ☆40 · Updated this week
- [PR 2024] A large Cross-Modal Video Retrieval Dataset with Reading Comprehension ☆25 · Updated last year
- [NeurIPS 2022] This is the official implementation of the paper "Expediting Large-Scale Vision Transformer for Dense Prediction without Fi… ☆83 · Updated last year
- Evaluation and dataset construction code for the CVPR 2025 paper "Vision-Language Models Do Not Understand Negation" ☆18 · Updated 2 weeks ago
- An interactive demo based on Segment-Anything for stroke-based painting which enables human-like painting. ☆34 · Updated last year
- [WACV 2025] Official implementation of "Online-LoRA: Task-free Online Continual Learning via Low Rank Adaptation" by Xiwen Wei, Guihong L… ☆35 · Updated 4 months ago
- Zero-copy multimodal vector DB with CUDA and CLIP/SigLIP ☆50 · Updated 9 months ago
- Codebase for the Recognize Anything Model (RAM) ☆75 · Updated last year
- ☆33 · Updated last year
- Official PyTorch implementation of "No Time to Waste: Squeeze Time into Channel for Mobile Video Understanding" ☆32 · Updated 10 months ago
- This repository is for the first survey on SAM for videos. ☆36 · Updated 2 weeks ago