goombalab/Gather-and-Aggregate

Experiments Notebook of "Understanding the Skill Gap in Recurrent Language Models: The Role of the Gather-and-Aggregate Mechanism"

☆16

Alternatives and similar repositories for Gather-and-Aggregate

Users who are interested in Gather-and-Aggregate are comparing it to the repositories listed below.

goombalab / phi-mamba
Official implementation of Phi-Mamba. A MOHAWK-distilled model (Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Mode…
☆125 • Sep 13, 2024 • Updated last year
VITA-Group / SSM-Bottleneck
[ICLR'25] "Understanding Bottlenecks of State Space Models through the Lens of Recency and Over-smoothing" by Peihao Wang, Ruisi Cai, Yue…
☆18 • Mar 21, 2025 • Updated last year
dayal-kalra / low-memory-adam
☆14 • Mar 2, 2025 • Updated last year
goombalab / raven
☆78 • May 29, 2026 • Updated 2 months ago
watcl-lab / positional_attention
Source code for the paper "Positional Attention: Expressivity and Learnability of Algorithmic Computation"
☆14 • May 26, 2025 • Updated last year
idoatad / TensorLens
Official PyTorch implementation for "TensorLens: End-to-End Transformer Analysis via High-Order Attention Tensors" [ACL 2026]
☆47 • Apr 14, 2026 • Updated 3 months ago
HanGuo97 / log-linear-attention
☆284 • Jun 6, 2025 • Updated last year
main-horse / hnet-old
H-Net Dynamic Hierarchical Architecture
☆81 • Sep 11, 2025 • Updated 10 months ago
ZihaoHuang-notabot / Ultra-Sparse-Memory-Network
☆48 • Jul 3, 2026 • Updated 3 weeks ago
sail-sg / SkyLadder
The official repository for SkyLadder: Better and Faster Pretraining via Context Window Scheduling
☆43 • Dec 29, 2025 • Updated 7 months ago
stephenkyang / mean-reversion-pairs-trading
Manipulating cointegrated pairs to achieve a market-neutral strategy that outperforms indices
☆11 • Jan 12, 2021 • Updated 5 years ago
violetxi / ExpRL
☆22 • Jun 16, 2026 • Updated last month
Huster-Hq / DADA
[MICCAI 2025 Early Accept] Targeted False Positive Synthesis via Detector-guided Adversarial Diffusion Attacker for Robust Polyp Detectio…
☆14 • Dec 5, 2025 • Updated 7 months ago
OliverSieberling / dynamic-conv1d
Triton kernels for dynamic causal short convolutions.
☆24 • Jun 4, 2026 • Updated last month
jacobfa / Attractor
☆27 • May 20, 2026 • Updated 2 months ago
epfml / schedules-and-scaling
Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations"
☆93 • Oct 30, 2024 • Updated last year
ml-jku / LRAM
A Large Recurrent Action Model: xLSTM enables Fast Inference for Robotics Tasks
☆37 • Oct 31, 2024 • Updated last year
yuzhenmao / IceCache
Implementation for IceCache: Memory-Efficient KV-cache Management for Long-Sequence LLMs (ICLR 2026).
☆20 • Jun 9, 2026 • Updated last month
TransluceAI / introspective-interp
Repository for "Training Language Models To Explain Their Own Computations"
☆23 • Jul 7, 2026 • Updated 3 weeks ago
sjelassi / transformers_ssm_copy
☆40 • Feb 26, 2024 • Updated 2 years ago
BAI-LAB / MoE-CL
[WWW 2026 Oral] MoE-CL: Self-Evolving LLMs via Continual Instruction Tuning
☆21 • Dec 1, 2025 • Updated 7 months ago
deep-spin / adasplash
AdaSplash: Adaptive Sparse Flash Attention (aka Flash Entmax Attention)
☆46 • May 20, 2026 • Updated 2 months ago
lyan62 / vlm-info-loss
☆22 • Sep 16, 2025 • Updated 10 months ago
lucidrains / villa-X
Implementation of ViLLA-X, Enhancing Latent Action Modeling in Vision-Language-Action Models
☆23 • Aug 27, 2025 • Updated 11 months ago
allenai / fluid-benchmarking
Fluid Language Model Benchmarking
☆29 • Sep 16, 2025 • Updated 10 months ago
allenai / signal-and-noise
Measuring the Signal to Noise Ratio in Language Model Evaluation
☆31 • Aug 19, 2025 • Updated 11 months ago
ml-jku / plstm_experiments
☆16 • Oct 21, 2025 • Updated 9 months ago
swairshah / Intensify
Coloring terminal text with intensities (used for plotting probability, entropy with tokens)
☆12 • Oct 11, 2024 • Updated last year
lucidrains / multiscreen
Implementation of Multiscreen proposed by Ken Nakanishi for "Screening is Enough"
☆18 • May 13, 2026 • Updated 2 months ago
TianjinYellow / SPAM-Optimizer
☆36 • Mar 12, 2025 • Updated last year
mdering / CoreMLZoo
A few models converted from Caffe to Core ML format.
☆15 • Jun 6, 2017 • Updated 9 years ago
martin-marek / batch-size
📄 Small Batch Size Training for Language Models
☆82 • Mar 18, 2026 • Updated 4 months ago
apple / ml-pararnn
☆193 • Oct 31, 2025 • Updated 8 months ago
coderaashir / Crypto-Pairs-Trading
A Statistical Arbitrage Strategy to trade Cryptocurrency Pairs
☆14 • Nov 6, 2020 • Updated 5 years ago
goddoe / RLYX
A hackable, simple, and research-friendly GRPO Training Framework with high-speed weight synchronization in a multinode environment.
☆38 • Aug 27, 2025 • Updated 11 months ago
ckkissane / sae-transfer
Code to reproduce key results accompanying "SAEs (usually) Transfer Between Base and Chat Models"
☆13 • Jul 18, 2024 • Updated 2 years ago
flowersteam / WorldLLM
LLM as World Models using Bayesian inference
☆21 • May 27, 2025 • Updated last year
U-C4N / Deepseek-CoT
Deepseek-CoT
☆10 • Oct 6, 2024 • Updated last year
RAIVNLab / SuperposedDecoding
Code for NeurIPS 2024 Paper - Superposed Decoding: Multiple Generations from a Single Autoregressive Inference Pass
☆21 • Aug 22, 2024 • Updated last year