[NeurIPS 2025 Spotlight] FUDOKI: Discrete Flow-based Unified Understanding and Generation via Kinetic-Optimal Velocities
☆69Dec 21, 2025Updated 2 months ago
Alternatives and similar repositories for FUDOKI
Users who are interested in FUDOKI are comparing it to the libraries listed below
- [NeurIPS 2025] Encoder-Decoder Diffusion Language Models for Efficient Training and Inference☆36Oct 29, 2025Updated 4 months ago
- [NeurIPS 2025] ViewPoint: Panoramic Video Generation with Pretrained Diffusion Models☆31Jul 1, 2025Updated 8 months ago
- CrossLMM: Decoupling Long Video Sequences from LMMs via Dual Cross-Attention Mechanisms☆25Dec 21, 2025Updated 2 months ago
- [ICLR2025] MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models☆94Sep 14, 2024Updated last year
- This repository includes the official implementation of our paper "Beyond Next-Token: Next-X Prediction for Autoregressive Visual Generat…☆245Oct 12, 2025Updated 4 months ago
- A simple, easy-to-understand library for diffusion models using Flax and Jax. Includes detailed notebooks on DDPM, DDIM, and EDM with sim…☆41May 6, 2025Updated 10 months ago
- Hyperparameter analysis for Image Captioning using LSTMs and Transformers☆26Oct 3, 2023Updated 2 years ago
- [TCSVT] state-of-the-art open vocabulary detector on COCO/LVIS/V3Det☆32Jun 3, 2025Updated 9 months ago
- [ICLR 2026] Official Implementation of Muddit [Meissonic II]: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusio…☆100Feb 4, 2026Updated last month
- [ICLR2026] AliTok: Towards Sequence Modeling Alignment between Tokenizer and Autoregressive Model☆53Oct 12, 2025Updated 4 months ago
- arxiv daily for speech translation, legal. Ref: Vincentqyw/cv-arxiv-daily☆15Jan 6, 2025Updated last year
- A python algorithm to change the pitch of the voice in real time☆13Dec 13, 2020Updated 5 years ago
- [Preprint] GMem: A Modular Approach for Ultra-Efficient Generative Models☆43Mar 11, 2025Updated 11 months ago
- MTalk-Bench: Evaluating Speech-to-Speech Models in Multi-Turn Dialogues via Arena-style and Rubrics Protocols☆17Nov 19, 2025Updated 3 months ago
- Visualizing 230 years of US Census data☆12Feb 23, 2020Updated 6 years ago
- WavBench: Benchmarking Reasoning, Colloquialism, and Paralinguistics for End-to-End Spoken Dialogue Models☆27Feb 13, 2026Updated 3 weeks ago
- ORION: Option-Regularized Deep Reinforcement Learning for Cooperative Multi-Agent Online Navigation☆21Feb 17, 2026Updated 2 weeks ago
- Event comment analysis for planners and marketers - feat. the Inflearn New Year's resolution event☆11Apr 22, 2020Updated 5 years ago
- Learning an Interpretable End-to-End Network for Real-Time Acoustic Beamforming☆15Aug 20, 2024Updated last year
- >>PhysWikiQuiz<< - a Physics Question Generation and Interrogation System☆11Feb 25, 2023Updated 3 years ago
- [DACON] 3rd place in the gas supply demand forecasting model development competition☆11Apr 12, 2022Updated 3 years ago
- ☆10Mar 10, 2023Updated 2 years ago
- ☆12Oct 7, 2024Updated last year
- The code of 'The devil is in the labels: Semantic segmentation from sentences'.☆13Nov 13, 2022Updated 3 years ago
- Test-time Fourier Style Calibration for Domain Generalization - IJCAI 2022☆16Jul 21, 2022Updated 3 years ago
- ☆11Dec 23, 2022Updated 3 years ago
- ☆11Nov 7, 2024Updated last year
- A curated list of awesome exploration policy papers.☆13Jan 3, 2026Updated 2 months ago
- [2022.05.16 ~ 2022.06.10] 🌤️Clear photos without fine dust📷 - boostcamp AI Tech 3rd cohort final project☆14Jun 11, 2022Updated 3 years ago
- Collects domestic and overseas data on COVID-19 confirmed, recovered, and death cases.☆10Jun 6, 2021Updated 4 years ago
- ☆12Feb 9, 2022Updated 4 years ago
- Unofficial implementation of Chain of Hindsight (https://arxiv.org/abs/2302.02676) using pytorch and huggingface Trainers.☆11Apr 5, 2023Updated 2 years ago
- Neural Lexicon Reader: Reduce Pronunciation Errors in End-to-end TTS by Leveraging External Textual Knowledge☆21Jul 25, 2022Updated 3 years ago
- ☆10Apr 13, 2021Updated 4 years ago
- This is the official repository of Emotion-Driven Melody Harmonization via Melodic Variation and Functional Representation.☆12Sep 25, 2024Updated last year
- Flutter Application starter using get ecosystem☆10Jan 19, 2021Updated 5 years ago
- A holistic framework for advancing LLMs as data science agents☆33Feb 3, 2026Updated last month
- semantic tokenizer for speech and music☆21Jul 6, 2025Updated 8 months ago
- Open, royalty free, lyrics2song / song generation data collection / cleaning pipeline.☆17May 9, 2025Updated 9 months ago