GiantAILab / DeepSound-V1
Official code for DeepSound-V1
☆12Updated 5 months ago
Alternatives and similar repositories for DeepSound-V1
Users who are interested in DeepSound-V1 compare it to the repositories listed below.
- DeepDubber-V1: Towards High Quality and Dialogue, Narration, Monologue Adaptive Movie Dubbing Via Multi-Modal Chain-of-Thoughts Reasoning…☆25Updated last month
- ☆22Updated 2 months ago
- (NIPS 2025) OpenOmni: Official implementation of Advancing Open-Source Omnimodal Large Language Models with Progressive Multimodal Align…☆107Updated last month
- This repository contains the code for our ICML 2025 paper, LENSLLM: Unveiling Fine-Tuning Dynamics for LLM Selection🎉☆24Updated 5 months ago
- [ICCV 2025] FonTS: Text Rendering with Typography and Style Controls☆32Updated this week
- [ICML 2025] This is the official PyTorch implementation of "🎵 HarmoniCa: Harmonizing Training and Inference for Better Feature Caching i…☆43Updated 3 months ago
- Smoothed Preference Optimization via ReNoise Inversion for Aligning Diffusion Models with Varied Human Preferences (ICML 2025)☆25Updated 4 months ago
- [CVPR 2025] Noise-Consistent Siamese-Diffusion for Medical Image Synthesis and Segmentation☆69Updated last month
- An official implementation of "SIM-CoT: Supervised Implicit Chain-of-Thought"☆95Updated last month
- [ICML2025] Official Code of From Local Details to Global Context: Advancing Vision-Language Models with Attention-Based Selection☆24Updated 4 months ago
- ☆20Updated last month
- This is for ACL 2025 Findings Paper: From Specific-MLLMs to Omni-MLLMs: A Survey on MLLMs Aligned with Multi-modalities☆60Updated last month
- ☆24Updated this week
- [IJCV 2025] Smaller But Better: Unifying Layout Generation with Smaller Large Language Models☆147Updated 2 months ago
- MokA: Multimodal Low-Rank Adaptation for MLLMs☆39Updated 4 months ago
- ☆58Updated 5 months ago
- WorldSense: Evaluating Real-world Omnimodal Understanding for Multimodal LLMs☆31Updated last month
- [CVPRW 2025] UniToken is an auto-regressive generation model that combines discrete and continuous representations to process visual inpu…☆95Updated 6 months ago
- [CVPR 2025] VASparse: Towards Efficient Visual Hallucination Mitigation via Visual-Aware Token Sparsification☆39Updated 7 months ago
- TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation☆227Updated 2 months ago
- [ICML 2025 Oral] An official implementation of VideoRoPE & VideoRoPE++☆200Updated 3 months ago
- ☆35Updated 2 months ago
- UniGenBench++: A Unified Semantic Evaluation Benchmark for Text-to-Image Generation☆107Updated last week
- [CVPR] MergeVQ: A Unified Framework for Visual Generation and Representation with Token Merging and Quantization☆44Updated 3 months ago
- Doodling our way to AGI ✏️ 🖼️ 🧠☆109Updated 5 months ago
- 🚀 Video Compression Commander: Plug-and-Play Inference Acceleration for Video Large Language Models☆36Updated last week
- [IEEE TPAMI 2025] Privacy-Preserving Biometric Verification With Handwritten Random Digit String☆64Updated 2 months ago
- 🚀 Global Compression Commander: Plug-and-Play Inference Acceleration for High-Resolution Large Vision-Language Models☆33Updated 3 months ago
- 📚 Collection of token-level model compression resources.☆173Updated last month
- Official repository of 'ScaleCap: Inference-Time Scalable Image Captioning via Dual-Modality Debiasing'☆57Updated 4 months ago