LHBuilder / SA-Segment-AnythingLinks

Vision-oriented multimodal AI

☆49

Alternatives and similar repositories for SA-Segment-Anything

Users that are interested in SA-Segment-Anything are comparing it to the libraries listed below

Sorting:

TencentARC / ViSFT
☆35Updated last year
NVlabs / STL
Official Pytorch Implementation of Self-emerging Token Labeling
☆35Updated last year
Expedit-LargeScale-Vision-Transformer / Expedit-SAM
[NeurIPS2022] This is the official implementation of the paper "Expediting Large-Scale Vision Transformer for Dense Prediction without Fi…
☆85Updated 2 years ago
roboflow / cvevals
Evaluate the performance of computer vision models and prompts for zero-shot models (Grounding DINO, CLIP, BLIP, DINOv2, ImageBind, model…
☆37Updated 2 years ago
mhyeonsoo / SAM_gDINO_AutoLabeling
Auto Segmentation label generation with SAM (Segment Anything) + Grounding DINO
☆22Updated 8 months ago
harrytea / TGDoc
"Towards Improving Document Understanding: An Exploration on Text-Grounding via MLLMs" 2023
☆15Updated 11 months ago
kirill-vish / Beyond-INet
Code for experiments for "ConvNet vs Transformer, Supervised vs CLIP: Beyond ImageNet Accuracy"
☆101Updated last year
bytedance / DQ-Det
Codes for ICML 2023 Learning Dynamic Query Combinations for Transformer-based Object Detection and Segmentation
☆37Updated 2 years ago
umd-huang-lab / perceptionCLIP
Code for our ICLR 2024 paper "PerceptionCLIP: Visual Classification by Inferring and Conditioning on Contexts"
☆79Updated last year
DCDmllm / HyperLLaVA
Pytorch implementation of HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models
☆27Updated last year
jihaonew / MM-Instruct
MM-Instruct: Generated Visual Instructions for Large Multimodal Model Alignment
☆35Updated last year
autodistill / autodistill-grounded-edgesam
EdgeSAM model for use with Autodistill.
☆29Updated last year
top-yun / SPARK
A benchmark dataset and simple code examples for measuring the perception and reasoning of multi-sensor Vision Language models.
☆19Updated 10 months ago
kyegomez / MC-ViT
Implementation of the model: "(MC-ViT)" from the paper: "Memory Consolidation Enables Long-Context Video Understanding"
☆22Updated 3 weeks ago
artemisp / LAVIS-XInstructBLIP
LAVIS - A One-stop Library for Language-Vision Intelligence
☆48Updated last year
facebookresearch / PartDistillation
Code release for the CVPR'23 paper titled "PartDistillation Learning part from Instance Segmentation"
☆57Updated last year
Yuqifan1117 / Labal-Anything-Pipeline
Baby-DALL3: Annotation anything in visual tasks and Generate anything just all in one-pipeline with GPT-4 (a small baby of DALL·E 3).
☆83Updated 2 years ago
V3Det / Detectron2-V3Det
Detectron2 Toolbox and Benchmark for V3Det
☆18Updated last year
UX-Decoder / FIND
[NeurIPS 2024] Official implementation of the paper "Interfacing Foundation Models' Embeddings"
☆125Updated last year
mightyzau / InfMLLM
☆19Updated last year
callsys / TextVR
[PR 2024] A large Cross-Modal Video Retrieval Dataset with Reading Comprehension
☆28Updated last year
opendatalab / CLIP-Parrot-Bias
ECCV2024_Parrot Captions Teach CLIP to Spot Text
☆65Updated last year
OPPOMKLab / recognize-anything
Codebase for the Recognize Anything Model (RAM)
☆85Updated last year
JiauZhang / tracking-arxiv
微信公众号：机器感知 | Tracking the Latest Arxiv Papers
☆38Updated 4 months ago
showlab / sparseformer
(ICLR 2024, CVPR 2024) SparseFormer
☆75Updated 11 months ago
OliverRensu / D-iGPT
[ICML 2024] This repository includes the official implementation of our paper "Rejuvenating image-GPT as Strong Visual Representation Lea…
☆98Updated last year
FudanNLPLAB / MouSi
☆74Updated last year
jeykigung / HiCLIP
☆30Updated 2 years ago
kaiyuyue / nxtp
PyTorch Implementation of Object Recognition as Next Token Prediction [CVPR'24 Highlight]
☆180Updated 5 months ago
kyegomez / AnyMAL
The open source implementation of "AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model"
☆21Updated 9 months ago