yuxiaochen1103 / FDT
☆62 · Jun 16, 2023 · Updated 2 years ago
Alternatives and similar repositories for FDT
Users who are interested in FDT are comparing it to the libraries listed below.
- ☆22 · Mar 12, 2025 · Updated 11 months ago
- ☆32 · Mar 25, 2024 · Updated last year
- ☆72 · Jul 28, 2025 · Updated 6 months ago
- ☆22 · Apr 27, 2024 · Updated last year
- [ICCV 2023] ViLLA: Fine-grained vision-language representation learning from real-world data ☆46 · Oct 15, 2023 · Updated 2 years ago
- ☆11 · May 17, 2024 · Updated last year
- [NeurIPS 2024] Official PyTorch implementation of LoTLIP: Improving Language-Image Pre-training for Long Text Understanding ☆48 · Jan 14, 2025 · Updated last year
- ☆11 · Oct 2, 2024 · Updated last year
- The official implementation of ADDP (ICLR 2024) ☆12 · Mar 27, 2024 · Updated last year
- (NeurIPS 2019) Combinatorial Inference against Label Noise ☆11 · Jun 13, 2024 · Updated last year
- Repository for the paper "Dense and Aligned Captions (DAC) Promote Compositional Reasoning in VL Models" ☆27 · Nov 29, 2023 · Updated 2 years ago
- [ICLR 2025] Official code repository for "TULIP: Token-length Upgraded CLIP" ☆33 · Jan 26, 2026 · Updated 2 weeks ago
- ☆120 · Feb 19, 2024 · Updated last year
- Code release for "Improved baselines for vision-language pre-training" ☆62 · May 6, 2024 · Updated last year
- ☆58 · Aug 7, 2023 · Updated 2 years ago
- [CVPR 2024] DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model ☆18 · Apr 16, 2024 · Updated last year
- [CVPR'24] MiKASA: Multi-Key-Anchor & Scene-Aware Transformer for 3D Visual Grounding ☆18 · Dec 13, 2024 · Updated last year
- Animals3D: Learning Articulated Shape with Keypoint Pseudo-labels from Web Images (CVPR 2023) ☆14 · May 20, 2024 · Updated last year
- [TACL] Do Vision and Language Models Share Concepts? A Vector Space Alignment Study ☆16 · Nov 22, 2024 · Updated last year
- Official Implementation of "Chrono: A Simple Blueprint for Representing Time in MLLMs" ☆92 · Mar 9, 2025 · Updated 11 months ago
- NegCLIP. ☆38 · Feb 6, 2023 · Updated 3 years ago
- Repository for hosting the code for the CVPR 2020 paper Differentiable Adaptive Computation Time for Visual Reasoning. ☆14 · Aug 26, 2020 · Updated 5 years ago
- ☆17 · Dec 13, 2023 · Updated 2 years ago
- [EMNLP 2024] Official code for "Beyond Embeddings: The Promise of Visual Table in Multi-Modal Models" ☆20 · Oct 17, 2024 · Updated last year
- Official implementation for ECCV paper "Towards Open Set Video Anomaly Detection" ☆16 · Feb 11, 2023 · Updated 3 years ago
- CLAIR: A (surprisingly) simple semantic text metric with large language models. ☆21 · Jan 28, 2024 · Updated 2 years ago
- Chain_of_Thoughts_3D_Visual_Grounding ☆19 · Apr 20, 2024 · Updated last year
- TS-LLaVA: Constructing Visual Tokens through Thumbnail-and-Sampling for Training-Free Video Large Language Models ☆19 · Jan 2, 2025 · Updated last year
- [ICCV 2023] ALIP: Adaptive Language-Image Pre-training with Synthetic Caption ☆105 · Sep 18, 2023 · Updated 2 years ago
- [CVPR 2024] Official implementation of the paper "DePT: Decoupled Prompt Tuning" ☆109 · Nov 24, 2025 · Updated 2 months ago
- ☆45 · May 20, 2025 · Updated 8 months ago
- VPEval Codebase from Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023) ☆45 · Nov 29, 2023 · Updated 2 years ago
- ☆17 · Oct 1, 2024 · Updated last year
- ☆18 · Mar 1, 2024 · Updated last year
- [WACV 2024] Official Implementation of TIAM - A Metric for Evaluating Alignment in Text-to-Image Generation ☆19 · Feb 3, 2025 · Updated last year
- WeGeFT: Weight‑Generative Fine‑Tuning for Multi‑Faceted Efficient Adaptation of Large Models ☆22 · Jul 10, 2025 · Updated 7 months ago
- ☆45 · Aug 14, 2023 · Updated 2 years ago
- A benchmark dataset and simple code examples for measuring the perception and reasoning of multi-sensor Vision Language models. ☆19 · Dec 27, 2024 · Updated last year
- ☆22 · Apr 22, 2025 · Updated 9 months ago