☆3,266 · Mar 17, 2026 · Updated 2 months ago
Alternatives and similar repositories for Attention-Residuals
Users that are interested in Attention-Residuals are comparing it to the libraries listed below.
- [ICLR 2026 Oral] Generative Universal Verifier as Multimodal Meta-Reasoner · ☆58 · Nov 14, 2025 · Updated 6 months ago
- Look, Compare, Decide: Alleviating Hallucination in Large Vision-Language Models via Multi-View Multi-Path Reasoning · ☆24 · Sep 9, 2024 · Updated last year
- 4RC: 4D Reconstruction via Conditional Querying Anytime and Anywhere · ☆142 · Apr 16, 2026 · Updated last month
- My attempt to improve the speed of the newton schulz algorithm, starting from the dion implementation. · ☆38 · Apr 30, 2026 · Updated 2 weeks ago
- ☆1,387 · Nov 17, 2025 · Updated 6 months ago
- [ICLR26] Official implementation of Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling · ☆182 · Jan 26, 2026 · Updated 3 months ago
- Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models · ☆4,404 · Jan 14, 2026 · Updated 4 months ago
- MoBA: Mixture of Block Attention for Long-Context LLMs · ☆2,117 · Apr 3, 2025 · Updated last year
- Simple MoE - Day 17 of 365 Days of Repos · ☆19 · Apr 21, 2026 · Updated 3 weeks ago
- Checkpoint-engine is a simple middleware to update model weights in LLM inference engines · ☆953 · Feb 28, 2026 · Updated 2 months ago
- Official implementation of "From Forecasting to Planning: Policy World Model for Collaborative State-Action Prediction" · ☆68 · Nov 25, 2025 · Updated 5 months ago
- [NeurIPS 2024] DN-4DGS: Denoised Deformable Network with Temporal-Spatial Aggregation for Dynamic Scene Rendering · ☆12 · Oct 22, 2024 · Updated last year
- ☆17 · Jul 24, 2025 · Updated 9 months ago
- [KernelGYM & Dr. Kernel] A distributed GPU environment and a collection of RL training methods to support RL for Kernel Generations [ICML… · ☆172 · Mar 29, 2026 · Updated last month
- THEORY OF SPACE: a benchmark for evaluating whether foundation models can actively explore under partial observability efficiently to bui… · ☆76 · Feb 27, 2026 · Updated 2 months ago
- A sparse attention kernel supporting mix sparse patterns · ☆513 · Jan 18, 2026 · Updated 4 months ago
- Fast and memory-efficient exact attention · ☆23,836 · Updated this week
- Accelerating MoE with IO and Tile-aware Optimizations · ☆684 · Updated this week
- ☆14 · Mar 7, 2025 · Updated last year
- [NeurIPS'25 Spotlight] Adaptive Attention Sparsity with Hierarchical Top-p Pruning · ☆97 · Apr 20, 2026 · Updated last month
- Data and Code for COLM 2025 Paper "MSRS: Evaluating Multi-Source Retrieval-Augmented Generation" · ☆32 · Aug 29, 2025 · Updated 8 months ago
- 🐳 Efficient Triton implementations for "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention" · ☆1,000 · Feb 5, 2026 · Updated 3 months ago
- mHC-lite: You Don’t Need 20 Sinkhorn-Knopp Iterations · ☆86 · Jan 12, 2026 · Updated 4 months ago
- [CVPR 2026] "E-RayZer: Self-supervised 3D Reconstruction as Spatial Visual Pre-training" official implementation. · ☆286 · Feb 24, 2026 · Updated 2 months ago
- [ICLR 2026] OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling · ☆467 · Apr 16, 2026 · Updated last month
- Ring attention implementation with flash attention · ☆1,020 · Sep 10, 2025 · Updated 8 months ago
- ☆44 · Jan 30, 2026 · Updated 3 months ago
- A standalone CXL-enabled system simulator. · ☆21 · Apr 19, 2026 · Updated last month
- Local AI runtime for training & running small LLMs directly on Apple Neural Engine (ANE). No CoreML. No Metal. Offline, on-device fine-tu… · ☆90 · Mar 6, 2026 · Updated 2 months ago
- 🚀 Efficient implementations for emerging model architectures · ☆5,116 · Updated this week
- [ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference · ☆384 · Jul 10, 2025 · Updated 10 months ago
- LLM fighting game mod of YOMI Hustle · ☆62 · Apr 19, 2026 · Updated last month
- The official codes for Fast Monte Carlo Rendering via Multi-Resolution Sampling · ☆15 · Dec 2, 2021 · Updated 4 years ago
- ☆85 · Nov 4, 2025 · Updated 6 months ago
- Official Code for Epona: Autoregressive Diffusion World Model for Autonomous Driving (ICCV 2025) · ☆343 · Jul 22, 2025 · Updated 9 months ago
- Helpful kernel tutorials, examples and SKILLs for tile-based GPU programming · ☆727 · Updated this week
- A CLI for managing AI skill packages · ☆27 · Jan 18, 2026 · Updated 4 months ago
- Moatless Testbeds allows you to create isolated testbed environments in a Kubernetes cluster where you can apply code changes through git… · ☆14 · Apr 9, 2025 · Updated last year
- Understanding R1-Zero-Like Training: A Critical Perspective · ☆1,253 · Aug 27, 2025 · Updated 8 months ago