anxiangsir/V-SWIFT

anxiangsir / V-SWIFT

V-SWIFT: Training a Small VideoMAE Model on a Single Machine in a Day

☆30

Alternatives and similar repositories for V-SWIFT

Users who are interested in V-SWIFT are comparing it to the repositories listed below.

anxiangsir / Video_Benchmark_Suite
Video Benchmark Suite: Rapid Evaluation of Video Foundation Models
☆17 · Jan 10, 2025 · Updated last year
deepglint / RealSyn
[ACM MM2025] The official repository for the RealSyn dataset
☆39 · Dec 14, 2025 · Updated 7 months ago
xiaoxing2001 / DeGLA
[ACM MM25] Official Pytorch implementation of [Decoupled Global-Local Alignment for Improving Compositional Understanding]
☆16 · Jul 15, 2025 · Updated last year
GaryGuTC / UniME-v2
[AAAI 2026 Oral] The official code of "UniME-V2: MLLM-as-a-Judge for Universal Multimodal Embedding Learning"
☆74 · Dec 8, 2025 · Updated 7 months ago
deepglint / MLCD-Seg
MLCD-Seg is a zero-shot segmentation model from DeepGlint.
☆18 · Jul 4, 2025 · Updated last year
deepglint / MVT
Margin-based Vision Transformer
☆70 · Apr 7, 2026 · Updated 3 months ago
Multimodal-Representation-Learning-MRL / GA-DMS
[EMNLP25 Main] The official code of "Gradient-Attention Guided Dual-Masking Synergetic Framework for Robust Text-based Person Retrieval"
☆25 · Mar 30, 2026 · Updated 3 months ago
deepglint / UniME
[ACM MM 2025] The official code of "Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs"
☆105 · Dec 8, 2025 · Updated 7 months ago
deepglint / UniDoc-RL
UniDoc-RL: Unified Document Understanding with Reinforcement Learning
☆16 · May 21, 2026 · Updated 2 months ago
wntg / LLaMA-Omni
Reproduction of the LLaMA-Omni training code
☆72 · Jan 23, 2025 · Updated last year
nttstar / inswapper-512-live
☆14 · Jan 26, 2025 · Updated last year
EvolvingLMMs-Lab / OneVision-Encoder
Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence
☆385 · Jun 20, 2026 · Updated last month
daixiangzi / VAR-CLIP
Implements VAR+CLIP for text-to-image (T2I) generation
☆147 · Jan 23, 2025 · Updated last year
XiaoBuL / OmniCLIP
[ECAI-2024] OmniCLIP: Adapting CLIP for Video Recognition with Spatial-Temporal Omni-Scale Feature Learning
☆16 · Jan 7, 2025 · Updated last year
deepglint / ALIP
[ICCV 2023] ALIP: Adaptive Language-Image Pre-training with Synthetic Caption
☆106 · Sep 18, 2023 · Updated 2 years ago
GaryGuTC / LaPA_model
[CVPRW 2024] LaPA: Latent Prompt Assist Model For Medical Visual Question Answering
☆27 · Apr 24, 2025 · Updated last year
AIS-Bonn / synpick
SynPick dataset generator
☆13 · Jul 8, 2021 · Updated 5 years ago
OpenGVLab / Docopilot
[CVPR 2025] Docopilot: Improving Multimodal Models for Document-Level Understanding
☆37 · Jul 22, 2025 · Updated last year
zuoqing1988 / train-ssd
train ssd
☆10 · Apr 30, 2019 · Updated 7 years ago
dochouyi / SUCC
☆11 · May 9, 2024 · Updated 2 years ago
song-siqi / retarget2humanoid
Retarget from Human Mesh Descriptions (SMPL, SMPL-X, etc) to Humanoid Poses
☆21 · Apr 11, 2025 · Updated last year
sebbyjp / ros2_transformers
Robotics transformers inference servers in ROS2. RT-1, RT-X, Octo.
☆17 · Oct 14, 2024 · Updated last year
minghanz / trafcam_3d
Repository for the paper "Monocular 3D Vehicle Detection Using Uncalibrated Traffic Cameras through Homography"
☆66 · Jan 7, 2024 · Updated 2 years ago
alvin-yang68 / Marching-Cubes
Implementation of the Marching Cubes algorithm in Python.
☆11 · Dec 10, 2020 · Updated 5 years ago
mwoedlinger / ecsic
Official code of our WACV paper "ECSIC: Epipolar Cross Attention for Stereo Image Compression"
☆15 · Dec 27, 2023 · Updated 2 years ago
forlovess / SCNN-pytorch
Spatial CNN model in PyTorch using the Cityscapes dataset
☆39 · Dec 17, 2018 · Updated 7 years ago
EvolvingLMMs-Lab / ParaVT
ParaVT: Taming the Tool Prior Paradox for Parallel Tool Use in Agentic Video Reinforcement Learning
☆54 · Jun 2, 2026 · Updated last month
Mia-YatingYu / STDD
[AAAI'25]: Building a Multi-modal Spatiotemporal Expert for Zero-shot Action Recognition with CLIP
☆23 · Aug 5, 2025 · Updated 11 months ago
zuoqing1988 / ZQ_FastFaceDetector
fast face detector
☆18 · Dec 19, 2018 · Updated 7 years ago
Niujunbo2002 / NativeRes-LLaVA
Official code repo for our work "Native Visual Understanding: Resolving Resolution Dilemmas in Vision-Language Models"
☆54 · Jun 17, 2025 · Updated last year
daixiangzi / PRCV2019
☆10 · Jan 6, 2020 · Updated 6 years ago
gmberton / LLM-table
☆17 · Apr 23, 2026 · Updated 2 months ago
Angtian / NeuralVS
The Official Implementation of Neural View Synthesis and Matching for Semi-Supervised Few-Shot Learning of 3D Pose [NIPS 2021](https://ar…
☆20 · Dec 7, 2021 · Updated 4 years ago
zuoqing1988 / ZQ_SmokeSimulation
☆12 · Jun 5, 2018 · Updated 8 years ago
NVlabs / AL-SSL
☆18 · Mar 19, 2023 · Updated 3 years ago
SenseTime-FVG / InteractiveOmni
☆24 · Dec 3, 2025 · Updated 7 months ago
NNDam / deepstream-face-recognition
Face detection -> alignment -> feature extraction with deepstream
☆12 · Mar 28, 2023 · Updated 3 years ago
Fishsoup0 / Autonomous-Driving-Perception
A Comprehensive Review of 3D Object Detection in Autonomous Driving: Technological Advances and Future Directions
☆36 · Jul 6, 2026 · Updated 2 weeks ago
Oneflow-Inc / oneflow_face
☆12 · Aug 10, 2022 · Updated 3 years ago