eminorhan / humanlike-vits
ViT models pretrained with up to ~5k hours of human-like video data
☆14 Updated 2 years ago
Alternatives and similar repositories for humanlike-vits
Users who are interested in humanlike-vits are comparing it to the libraries listed below.
- Code base of SynthCLIP: CLIP training with purely synthetic text-image pairs from LLMs and TTIs. ☆101 Updated 9 months ago
- repo for paper titled: Towards Realistic Zero-Shot Classification via Self Structural Semantic Alignment (AAAI'24 Oral) ☆25 Updated last year
- Code for the paper "If at First You Don't Succeed, Try, Try Again: Faithful Diffusion-based Text-to-Image Generation by Selection" ☆27 Updated 2 years ago
- Official repo for the TMLR paper "Discffusion: Discriminative Diffusion Models as Few-shot Vision and Language Learners" ☆30 Updated last year
- [ECCV’24] Official repository for "BEAF: Observing Before-AFter Changes to Evaluate Hallucination in Vision-language Models" ☆21 Updated 9 months ago
- VPEval Codebase from Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023) ☆45 Updated 2 years ago
- https://arxiv.org/abs/2209.15162 ☆53 Updated 2 years ago
- A curated list of papers and resources for text-to-image evaluation. ☆30 Updated 2 years ago
- Original code base for On Pretraining Data Diversity for Self-Supervised Learning ☆14 Updated last year
- [WACV2025 Oral] DeepMIM: Deep Supervision for Masked Image Modeling ☆55 Updated 8 months ago
- Code release for "Understanding Bias in Large-Scale Visual Datasets" ☆22 Updated last year
- ☆38 Updated last year
- Code and data for EMNLP 2023 paper "Grounding Visual Illusions in Language: Do Vision-Language Models Perceive Illusions Like Humans?" ☆15 Updated last year
- Test-Time Training on Video Streams ☆66 Updated 2 years ago
- ☆25 Updated 2 years ago
- Code and models for the paper "The effectiveness of MAE pre-pretraining for billion-scale pretraining" https://arxiv.org/abs/2303.13496 ☆92 Updated 9 months ago
- The official repository of paper "ScaleLong: Towards More Stable Training of Diffusion Model via Scaling Network Long Skip Connection" (N… ☆50 Updated 2 years ago
- DALL-E for Detection: Language-driven Compositional Image Synthesis for Object Detection ☆21 Updated 2 years ago
- official repo for the paper "EXIF as Language: Learning Cross-Modal Associations Between Images and Camera Metadata" ☆51 Updated 2 years ago
- [CVPR 2024 Highlight] ImageNet-D ☆46 Updated last year
- ☆16 Updated 2 years ago
- This repo contains the official PyTorch implementation of vLMIG: Improving Visual Commonsense in Language Models via Multiple Image Gener… ☆17 Updated last year
- Test-Time Distribution Normalization For Contrastively Learned Vision-language Models ☆27 Updated last year
- Training code for CLIP-FlanT5 ☆30 Updated last year
- [CVPR 2023] Zero-shot Generative Model Adaptation via Image-specific Prompt Learning ☆83 Updated 2 years ago
- [ICCV23] Official implementation of eP-ALM: Efficient Perceptual Augmentation of Language Models. ☆27 Updated 2 years ago
- Create generated datasets and train robust classifiers ☆36 Updated 2 years ago
- Official code repo of PIN: Positional Insert Unlocks Object Localisation Abilities in VLMs ☆26 Updated 11 months ago
- [ICCV 2023] Unsupervised Compositional Concepts Discovery with Text-to-Image Generative Models ☆85 Updated 2 years ago
- Official implementation of the paper The Hidden Language of Diffusion Models ☆77 Updated last year