☆62 · Jun 16, 2023 · Updated 2 years ago
Alternatives and similar repositories for FDT
Users that are interested in FDT are comparing it to the libraries listed below
- ☆21 · Mar 12, 2025 · Updated 11 months ago
- ☆32 · Mar 25, 2024 · Updated last year
- ☆72 · Jul 28, 2025 · Updated 7 months ago
- ☆22 · Apr 27, 2024 · Updated last year
- [ICCV 2023] ViLLA: Fine-grained vision-language representation learning from real-world data ☆46 · Oct 15, 2023 · Updated 2 years ago
- ☆11 · May 17, 2024 · Updated last year
- [NeurIPS 2024] Official PyTorch implementation of LoTLIP: Improving Language-Image Pre-training for Long Text Understanding ☆50 · Jan 14, 2025 · Updated last year
- ☆11 · Oct 2, 2024 · Updated last year
- The official implementation of ADDP (ICLR 2024) ☆12 · Mar 27, 2024 · Updated last year
- (NeurIPS 2019) Combinatorial Inference against Label Noise ☆11 · Jun 13, 2024 · Updated last year
- [ICLR 2025] Official code repository for "TULIP: Token-length Upgraded CLIP" ☆33 · Jan 26, 2026 · Updated last month
- ☆120 · Feb 19, 2024 · Updated 2 years ago
- ☆58 · Aug 7, 2023 · Updated 2 years ago
- Code release for "Improved baselines for vision-language pre-training" ☆62 · May 6, 2024 · Updated last year
- [TACL] Do Vision and Language Models Share Concepts? A Vector Space Alignment Study ☆16 · Nov 22, 2024 · Updated last year
- [CVPR 2024] DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model ☆19 · Apr 16, 2024 · Updated last year
- [CVPR'24] MiKASA: Multi-Key-Anchor & Scene-Aware Transformer for 3D Visual Grounding ☆17 · Dec 13, 2024 · Updated last year
- Official Implementation of "Chrono: A Simple Blueprint for Representing Time in MLLMs" ☆92 · Mar 9, 2025 · Updated 11 months ago
- NegCLIP. ☆39 · Feb 6, 2023 · Updated 3 years ago
- ☆65 · Nov 7, 2024 · Updated last year
- This is the implementation of CounterCurate, the data curation pipeline of both physical and semantic counterfactual image-caption pairs. ☆19 · Jun 27, 2024 · Updated last year
- TS-LLaVA: Constructing Visual Tokens through Thumbnail-and-Sampling for Training-Free Video Large Language Models ☆19 · Jan 2, 2025 · Updated last year
- Dataset accompanying the paper "Adaptive Methods for Real-World Domain Generalization" ☆16 · Aug 17, 2023 · Updated 2 years ago
- Chain_of_Thoughts_3D_Visual_Grounding ☆19 · Apr 20, 2024 · Updated last year
- [EMNLP 2024] Official code for "Beyond Embeddings: The Promise of Visual Table in Multi-Modal Models" ☆20 · Oct 17, 2024 · Updated last year
- ☆17 · Dec 13, 2023 · Updated 2 years ago
- Official implementation for ECCV paper "Towards Open Set Video Anomaly Detection" ☆16 · Feb 11, 2023 · Updated 3 years ago
- [ICCV 2023] ALIP: Adaptive Language-Image Pre-training with Synthetic Caption ☆105 · Sep 18, 2023 · Updated 2 years ago
- [CVPR 2024] Official implementation of the paper "DePT: Decoupled Prompt Tuning" ☆109 · Nov 24, 2025 · Updated 3 months ago
- VPEval Codebase from Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023) ☆45 · Nov 29, 2023 · Updated 2 years ago
- ☆17 · Oct 1, 2024 · Updated last year
- ☆18 · Mar 1, 2024 · Updated 2 years ago
- CLAIR: A (surprisingly) simple semantic text metric with large language models. ☆22 · Jan 28, 2024 · Updated 2 years ago
- WeGeFT: Weight‑Generative Fine‑Tuning for Multi‑Faceted Efficient Adaptation of Large Models ☆22 · Jul 10, 2025 · Updated 7 months ago
- ☆45 · Aug 14, 2023 · Updated 2 years ago
- ☆22 · Apr 22, 2025 · Updated 10 months ago
- A benchmark dataset and simple code examples for measuring the perception and reasoning of multi-sensor Vision Language models. ☆19 · Dec 27, 2024 · Updated last year
- Visual and Embodied Concepts evaluation benchmark ☆21 · Oct 10, 2023 · Updated 2 years ago
- AlignCLIP: Improving Cross-Modal Alignment in CLIP (ICLR 2025) ☆60 · Mar 1, 2025 · Updated last year