☆3,010Mar 17, 2026Updated 3 weeks ago
Alternatives and similar repositories for Attention-Residuals
Users that are interested in Attention-Residuals are comparing it to the libraries listed below.
- Look, Compare, Decide: Alleviating Hallucination in Large Vision-Language Models via Multi-View Multi-Path Reasoning☆24Sep 9, 2024Updated last year
- 4RC: 4D Reconstruction via Conditional Querying Anytime and Anywhere☆107Feb 11, 2026Updated last month
- (ICLR'26 + Netflix) Rank-GRPO: Training LLM-based Conversational Recommender Systems with Reinforcement Learning☆40Nov 17, 2025Updated 4 months ago
- [CVPR 2025 Highlight] FIMA-Q: Post-Training Quantization for Vision Transformers by Fisher Information Matrix Approximation☆26Jun 16, 2025Updated 9 months ago
- My attempt to improve the speed of the Newton-Schulz algorithm, starting from the Dion implementation.☆34Dec 5, 2025Updated 4 months ago
- ☆23Nov 29, 2024Updated last year
- [ICLR26] Official implementation of Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling☆162Jan 26, 2026Updated 2 months ago
- ☆1,358Nov 17, 2025Updated 4 months ago
- THEORY OF SPACE: a benchmark for evaluating whether foundation models can actively explore under partial observability efficiently to bui…☆70Feb 27, 2026Updated last month
- MoBA: Mixture of Block Attention for Long-Context LLMs☆2,086Apr 3, 2025Updated last year
- Simple MoE - Day 17 of 365 Days of Repos☆18Jan 17, 2025Updated last year
- [KernelGYM & Dr. Kernel] A distributed GPU environment and a collection of RL training methods to support RL for Kernel Generations☆155Mar 29, 2026Updated last week
- [NeurIPS 2024] DN-4DGS: Denoised Deformable Network with Temporal-Spatial Aggregation for Dynamic Scene Rendering☆12Oct 22, 2024Updated last year
- ☆17Jul 24, 2025Updated 8 months ago
- Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models☆4,222Jan 14, 2026Updated 2 months ago
- code for the paper titled "Adaptive Cross-Layer Attention for Image Restoration"☆14Nov 6, 2025Updated 5 months ago
- Checkpoint-engine is a simple middleware to update model weights in LLM inference engines☆932Feb 28, 2026Updated last month
- Accelerating MoE with IO and Tile-aware Optimizations☆621Apr 1, 2026Updated last week
- A sparse attention kernel supporting mix sparse patterns☆495Jan 18, 2026Updated 2 months ago
- [ICLR 2026 Oral] Generative Universal Verifier as Multimodal Meta-Reasoner☆56Nov 14, 2025Updated 4 months ago
- ☆43Jan 30, 2026Updated 2 months ago
- [NeurIPS'25 Spotlight] Adaptive Attention Sparsity with Hierarchical Top-p Pruning☆91Nov 29, 2025Updated 4 months ago
- ☆14Mar 7, 2025Updated last year
- 🐳 Efficient Triton implementations for "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention"☆981Feb 5, 2026Updated 2 months ago
- [CVPR 2026] "E-RayZer: Self-supervised 3D Reconstruction as Spatial Visual Pre-training" official implementation.☆275Feb 24, 2026Updated last month
- Fast and memory-efficient exact attention☆23,185Updated this week
- [ICLR 2026] OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling☆454Mar 25, 2026Updated 2 weeks ago
- Ring attention implementation with flash attention☆1,003Sep 10, 2025Updated 6 months ago
- A standalone CXL-enabled system simulator.☆21Jan 10, 2026Updated 3 months ago
- A WebUI for Side-by-Side Comparison of Media (Images/Videos) Across Multiple Folders☆26Feb 21, 2025Updated last year
- Helpful kernel tutorials and examples for tile-based GPU programming☆692Updated this week
- Official Code for Epona: Autoregressive Diffusion World Model for Autonomous Driving (ICCV 2025)☆329Jul 22, 2025Updated 8 months ago
- ☆82Nov 4, 2025Updated 5 months ago
- A CLI for managing AI skill packages☆27Jan 18, 2026Updated 2 months ago
- Moatless Testbeds allows you to create isolated testbed environments in a Kubernetes cluster where you can apply code changes through git…☆14Apr 9, 2025Updated last year
- Source code for "Gradient Based Memory Editing for Task-Free Continual Learning", 4th Lifelong ML Workshop@ICML 2020☆17Dec 8, 2022Updated 3 years ago
- Single-stage End-to-End Training for Tokenization and Generation☆81Mar 24, 2026Updated 2 weeks ago
- [CVPR 2026] InfiniDepth: Arbitrary-Resolution and Fine-Grained Depth Estimation with Neural Implicit Fields☆892Apr 3, 2026Updated last week
- [ICLR 2024] Efficient Streaming Language Models with Attention Sinks☆7,208Jul 11, 2024Updated last year