obvious-research / phenaki-cvivit
Reproduction of the first step in the text-to-video model Phenaki. Code and model weights for the Transformer-based autoencoder for videos called CViViT.
☆29Aug 4, 2023Updated 2 years ago
Alternatives and similar repositories for phenaki-cvivit
Users interested in phenaki-cvivit are comparing it to the libraries listed below:
- SFT+RL boosts multimodal reasoning☆46Jun 27, 2025Updated 7 months ago
- Implementation of MagViT2 Tokenizer in Pytorch☆661Jan 12, 2025Updated last year
- Adapter package for torch_musa to act exactly like PyTorch CUDA☆19Feb 10, 2026Updated last week
- Unofficial implementation of "Pix2seq: A Language Modeling Framework for Object Detection" on mmdetection☆33Apr 18, 2022Updated 3 years ago
- Official implementation of MARS: Mixture of Auto-Regressive Models for Fine-grained Text-to-image Synthesis☆86Jul 16, 2024Updated last year
- SynthRL: Scaling Visual Reasoning with Verifiable Data Synthesis☆68Jul 24, 2025Updated 6 months ago
- ☆88Jan 4, 2024Updated 2 years ago
- ☆41Sep 21, 2023Updated 2 years ago
- Large Language-and-Vision Assistant for BioMedicine, built towards multimodal GPT-4 level capabilities.☆10Nov 29, 2023Updated 2 years ago
- ☆10May 4, 2023Updated 2 years ago
- ☆22Jan 12, 2026Updated last month
- CBench, Benchmarking System for Question Answering Over Knowledge Graphs Systems.☆12Sep 16, 2022Updated 3 years ago
- Official implementation for “Unsupervised Part Discovery via Dual Representation Alignment” - TPAMI 2024☆11Nov 6, 2024Updated last year
- ☆25Jul 28, 2025Updated 6 months ago
- A PyTorch re-implementation of Weakly Supervised Facial Action Unit Recognition through Adversarial Training☆10Apr 23, 2019Updated 6 years ago
- This is an OCR program designed for travel document. It can now support 23 types of documents with pre-defined template. You can add what…☆10Nov 22, 2022Updated 3 years ago
- ☆10Jan 20, 2021Updated 5 years ago
- A simple Python script to easily grab tags of an image on Danbooru☆10Mar 17, 2023Updated 2 years ago
- ☆11Nov 10, 2023Updated 2 years ago
- A python wrapper for ScriptHookV☆11Jun 26, 2018Updated 7 years ago
- Code Guided Neural Style Transfer for Shape Stylization.☆11Jan 12, 2026Updated last month
- RAST 1.0: Restorable Arbitrary Style Transfer via Multi-restoration☆13Jun 18, 2024Updated last year
- 凌BUG-2021 RoboMaster national competition vision team code (unofficial open-source release)☆11Jan 18, 2022Updated 4 years ago
- [IJCV 2024] TransDETR: End-to-end Video Text Spotting with Transformer☆106Mar 28, 2024Updated last year
- PyTorch Implementation of MobileDet (https://arxiv.org/abs/2004.14525v3) backbones.☆11Feb 12, 2024Updated 2 years ago
- Repository for the paper CenterPoly: real-time instance segmentation using bounding polygons☆51Aug 24, 2021Updated 4 years ago
- ☆14Aug 9, 2024Updated last year
- awesome unsupervised learning paper list☆12Jan 4, 2018Updated 8 years ago
- ☆13Mar 6, 2023Updated 2 years ago
- Style Transfer by Deep Learning, overview and TensorFlow implementations (UNDER CONSTRUCTION)☆14Jul 25, 2017Updated 8 years ago
- ☆14Dec 14, 2024Updated last year
- Automatic image background removal service built with SpringBoot3 + JDK17 + AI☆15Mar 22, 2024Updated last year
- running LayoutLMv2☆11Apr 27, 2022Updated 3 years ago
- [NAACL 2024] Part-based, explainable and editable fine-grained image classifier that allows users to define a species in text☆14Sep 19, 2025Updated 4 months ago
- Multi-temporal Scene dataset for Scene Change Detection.☆15Apr 14, 2021Updated 4 years ago
- Framework to achieve context distillation in LLMs☆15Nov 24, 2023Updated 2 years ago
- ☆12Oct 12, 2020Updated 5 years ago
- Cleaned test data list of DukeMTMC-reID, ICCV2021☆15Aug 26, 2021Updated 4 years ago
- pre-trained vision and language model summary☆12Apr 20, 2021Updated 4 years ago