PKUTAN / SAWT
Official Python implementation for ICML 2024: "Learning Solution-Aware Transformers for Efficiently Solving Quadratic Assignment Problem"
☆15 · Updated last year
Alternatives and similar repositories for SAWT
Users who are interested in SAWT are comparing it to the libraries listed below.
- ☆24 · Updated 2 years ago
- Give us minutes, we give back a faster Mamba. The official implementation of "Faster Vision Mamba is Rebuilt in Minutes via Merged Token … ☆40 · Updated last year
- [NeurIPS'24 spotlight] MECD: Unlocking Multi-Event Causal Discovery in Video Reasoning. [TPAMI'25] MECD+ ☆43 · Updated last month
- Official Implementation of Diffusion Step Annealing (DiSA) in Autoregressive Image Generation ☆144 · Updated 6 months ago
- ☆11 · Updated 4 months ago
- [AAAI 2023 (Oral)] Self-supervised Action Representation Learning from Partial Spatio-Temporal Skeleton Sequences ☆27 · Updated last year
- [ICML 2025] The code and data of the paper: Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation ☆143 · Updated last year
- [CVPR 2025 (Oral)] Mitigating Hallucinations in Large Vision-Language Models via DPO: On-Policy Data Hold the Key ☆93 · Updated 2 weeks ago
- Empowering Unified MLLM with Multi-granular Visual Generation ☆130 · Updated 11 months ago
- [ECCV 2024🔥] Official implementation of the paper "ST-LLM: Large Language Models Are Effective Temporal Learners" ☆151 · Updated last year
- Unofficial implementation of "SODA: Bottleneck Diffusion Models for Representation Learning" ☆96 · Updated last year
- A unified reinforcement learning toolbox for joint RL on language models and diffusion models ☆35 · Updated last month
- [ICLR 2024 (Spotlight)] "Frozen Transformers in Language Models are Effective Visual Encoder Layers" ☆246 · Updated last year
- [CVPR 2025] 🔥 Official impl. of "TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation". ☆412 · Updated 4 months ago
- [NeurIPS 2025] The official PyTorch implementation of the "Vision Function Layer in MLLM". ☆21 · Updated this week
- Official Implementation of GENIUS: A Generative Framework for Universal Multimodal Search, CVPR 2025 ☆42 · Updated 4 months ago
- Selftok: Discrete Visual Tokens of Autoregression, by Diffusion, and for Reasoning ☆234 · Updated 6 months ago
- Official implementation of the CVPR'24 paper [Adaptive Slot Attention: Object Discovery with Dynamic Slot Number] ☆62 · Updated 10 months ago
- This is a collection of awesome papers I have read (carefully or roughly) in the fields of computer vision, machine learning, pattern rec… ☆14 · Updated last year
- [ICLR'25] Reconstructive Visual Instruction Tuning ☆132 · Updated 8 months ago
- Official PyTorch implementation (source code) for LLM4SGG: Large Language Models for Weakly Supervised Scene Graph Generation, accepted at … ☆113 · Updated last year
- MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning ☆136 · Updated 2 months ago
- NarrLV: Towards a Comprehensive Narrative-Centric Evaluation for Long Video Generation Models ☆109 · Updated 4 months ago
- [CVPR 2025] DivPrune: Diversity-based Visual Token Pruning for Large Multimodal Models ☆61 · Updated 3 weeks ago
- Official PyTorch implementation for LARP: Tokenizing Videos with a Learned Autoregressive Generative Prior (ICLR 2025 Oral). ☆96 · Updated 10 months ago
- TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation ☆235 · Updated 4 months ago
- ☆106 · Updated last year
- [ECCV 2024] Official repository of ECCV 2024 paper: Object-Conditioned Energy-Based Attention Map Alignment in Text-to-Image Diffusion M… ☆15 · Updated 6 months ago
- The official implementation for "MonoFormer: One Transformer for Both Diffusion and Autoregression" ☆89 · Updated last year
- [ICLR 2024 Poster] SCHEMA: State CHangEs MAtter for Procedure Planning in Instructional Videos ☆20 · Updated 4 months ago