☆42Jun 6, 2025Updated 9 months ago
Alternatives and similar repositories for Aligning-Latent-Spaces-with-Flow-Priors
Users that are interested in Aligning-Latent-Spaces-with-Flow-Priors are comparing it to the libraries listed below.
- This repo contains evaluation code for the paper "AV-Odyssey: Can Your Multimodal LLMs Really Understand Audio-Visual Information?"☆31Dec 23, 2024Updated last year
- [ICLR 2024] "3D Feature Prediction for Masked-AutoEncoder-Based Point Cloud Pretraining"☆12Aug 25, 2024Updated last year
- Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning?☆90Jul 13, 2025Updated 8 months ago
- [ICCV2025] VEGGIE: Instructional Editing and Reasoning Video Concepts with Grounded Generation☆33Aug 18, 2025Updated 7 months ago
- ☆27Apr 11, 2025Updated 11 months ago
- Code for "Skill-based Chain-of-Thoughts for Domain-Adaptive Video Reasoning [EMNLP 2025 Finding]"☆16Aug 27, 2025Updated 7 months ago
- ☆51Jun 4, 2025Updated 9 months ago
- TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation☆236Aug 18, 2025Updated 7 months ago
- [ICCV 2025] Official repo for "GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation"☆202Jan 7, 2026Updated 2 months ago
- Active Learning Helps Pretrained Models Learn the Intended Task (https://arxiv.org/abs/2204.08491) by Alex Tamkin, Dat Nguyen, Salil Desh…☆11Nov 22, 2022Updated 3 years ago
- Onset-and-Offset-Aware Sound Event Detection☆21Feb 10, 2025Updated last year
- Benchmarking Multi-Image Understanding in Vision and Language Models☆12Jul 29, 2024Updated last year
- Triton kernel fusion for Qwen3-TTS 1.7B inference acceleration — RMSNorm, SwiGLU, M-RoPE, Norm+Residual☆55Mar 22, 2026Updated last week
- Aligntune : A Modular Toolkit for Post Training Alignment of LLMs☆37Mar 23, 2026Updated last week
- Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment☆64Jul 22, 2025Updated 8 months ago
- Weird autoencoder experiments☆24Jan 26, 2026Updated 2 months ago
- ☆34May 14, 2025Updated 10 months ago
- This is the official repository for MAGIC: Meta-Ability Guided Interactive Chain-of-Distillation Learning towards Efficient Vision-and-La…☆14Jun 6, 2024Updated last year
- Training-free Guidance in Text-to-Video Generation via Multimodal Planning and Structured Noise Initialization☆25Apr 14, 2025Updated 11 months ago
- Interface Design for Self-Supervised Speech Models, Accepted to Interspeech2024☆16Nov 19, 2024Updated last year
- [Arxiv 2025] ByteMorph: Benchmarking Instruction-Guided Image Editing with Non-Rigid Motions☆45Jun 11, 2025Updated 9 months ago
- [NeurIPS 23] Characterizing OOD Error via Optimal Transport☆13Nov 19, 2023Updated 2 years ago
- Official PyTorch implementation of The Linear Attention Resurrection in Vision Transformer☆16Sep 7, 2024Updated last year
- KUDA: Keypoints to Unify Dynamics Learning and Visual Prompting for Open-Vocabulary Robotic Manipulation☆22Apr 23, 2025Updated 11 months ago
- Official implementation of Layout-aware Dreamer for Embodied Referring Expression Grounding [AAAI 23].☆16Apr 13, 2023Updated 2 years ago
- Explore how to get a VQ-VAE models efficiently!☆67Jul 24, 2025Updated 8 months ago
- Implementation of our ICCV 2023 paper DREAMWALKER: Mental Planning for Continuous Vision-Language Navigation☆20Jul 24, 2023Updated 2 years ago
- Pybind11 bindings for Kaldi☆15Feb 1, 2026Updated last month
- Code release for the paper "Progress-Aware Video Frame Captioning" (CVPR 2025)☆21Jul 16, 2025Updated 8 months ago
- C++ neural network library☆13Jul 2, 2016Updated 9 years ago
- code for the paper Imitation Learning from Observation with Automatic Discount Scheduling☆13Mar 27, 2024Updated 2 years ago
- ☆310May 29, 2025Updated 10 months ago
- Linear Attention for Efficient Bidirectional Sequence Modeling☆16May 13, 2025Updated 10 months ago
- Official Implementation of Learning Navigational Visual Representations with Semantic Map Supervision (ICCV2023)☆27Jul 30, 2023Updated 2 years ago
- ☆22Oct 19, 2024Updated last year
- Official Implementation for paper "Pretraining A Large Language Model using Distributed GPUs: A Memory-Efficient Decentralized Paradigm"☆21Mar 18, 2026Updated last week
- [CVPR 2023] Official code release of Cafi-Net: Self-Supervised Learning of Pose-Canonicalized Neural Fields☆15Jul 14, 2023Updated 2 years ago
- Code for A Dual Semantic-Aware Recurrent Global-Adaptive Network For Vision-and-Language Navigation☆17Apr 25, 2024Updated last year
- DACVAE☆204Dec 22, 2025Updated 3 months ago