cqels / visionLinks
☆19Updated 3 months ago
Alternatives and similar repositories for vision
Users that are interested in vision are comparing it to the libraries listed below
Sorting:
- SSL Video Representation Learning project☆14Updated 7 months ago
- Official repository for the General Robust Image Task (GRIT) Benchmark☆54Updated 2 years ago
- [NeurIPS 2022] The official implementation of "Learning to Discover and Detect Objects".☆111Updated 2 years ago
- Code for paper Rethinking the Data Annotation Process for Multi-view 3D Pose Estimation with Active Learning and Self-Training☆22Updated 2 years ago
- Evaluate the performance of computer vision models and prompts for zero-shot models (Grounding DINO, CLIP, BLIP, DINOv2, ImageBind, model…☆37Updated 2 years ago
- ☆25Updated 2 years ago
- ☆26Updated 3 years ago
- ☆83Updated 2 years ago
- [CVPR'23 Highlight] Heterogeneous Continual Learning.☆15Updated 2 years ago
- Implementation of MaMMUT, a simple vision-encoder text-decoder architecture for multimodal tasks from Google, in Pytorch☆104Updated 2 years ago
- NeuSyRE: A Neuro-Symbolic Visual Understanding and Reasoning Framework based on Scene Graph Enrichment☆22Updated last year
- Official Code Repository for EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents (COLM 2024)☆40Updated last year
- Tracking through Containers and Occluders in the Wild (CVPR 2023) - Official Implementation☆41Updated last year
- PyTorch Implementation of the paper "MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training"☆26Updated last week
- Repository for the paper Do SSL Models Have Déjà Vu? A Case of Unintended Memorization in Self-supervised Learning☆36Updated 2 years ago
- Code for NeurIPS2023 Paper "Symbol-LLM: Leverage Language Models for Symbolic System in Visual Human Activity Reasoning"☆26Updated 2 years ago
- Implementation of a Hierarchical Mamba as described in the paper: "Hierarchical State Space Models for Continuous Sequence-to-Sequence Mo…☆15Updated last year
- Code for experiments for "ConvNet vs Transformer, Supervised vs CLIP: Beyond ImageNet Accuracy"☆102Updated last year
- PyTorch code for Improving Commonsense in Vision-Language Models via Knowledge Graph Riddles (DANCE)☆23Updated 3 years ago
- [TMLR 2024] Official implementation of "Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness and Ethics"☆20Updated 2 years ago
- [NeurIPS 2023] HASSOD: Hierarchical Adaptive Self-Supervised Object Detection☆58Updated 2 years ago
- ☆73Updated 6 months ago
- In this codebase we establish a benchmark for egocentric user adaptation based on Ego4d.First, we start from a population model which ha…☆15Updated last year
- Multi-model video-to-text by combining embeddings from Flan-T5 + CLIP + Whisper + SceneGraph. The 'backbone LLM' is pre-trained from scra…☆54Updated 2 years ago
- Code for paper "Point and Ask: Incorporating Pointing into Visual Question Answering"☆19Updated 3 years ago
- Description and applications of OpenAI's paper about DALL-E (2021) and implementation of other (CLIP-guided) zero-shot text-to-image gene…☆33Updated 3 years ago
- EdgeSAM model for use with Autodistill.☆29Updated last year
- Implementation of VisionLLaMA from the paper: "VisionLLaMA: A Unified LLaMA Interface for Vision Tasks" in PyTorch and Zeta☆16Updated last year
- ☆43Updated last year
- Repository for ACL2020 paper "Refer360° A Referring Expression Recognition Dataset in 360°Images"☆13Updated 4 years ago