Experiments Notebook of "Understanding the Skill Gap in Recurrent Language Models: The Role of the Gather-and-Aggregate Mechanism"
☆15 · Apr 30, 2025 · Updated 11 months ago
Alternatives and similar repositories for Gather-and-Aggregate
Users who are interested in Gather-and-Aggregate are comparing it to the libraries listed below.
- Official implementation of Phi-Mamba. A MOHAWK-distilled model (Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Mode… ☆123 · Sep 13, 2024 · Updated last year
- ☆15 · Mar 2, 2025 · Updated last year
- The official repository for SkyLadder: Better and Faster Pretraining via Context Window Scheduling ☆42 · Dec 29, 2025 · Updated 3 months ago
- ☆22 · Sep 16, 2025 · Updated 7 months ago
- Source code for the paper "Positional Attention: Expressivity and Learnability of Algorithmic Computation" ☆14 · May 26, 2025 · Updated 10 months ago
- H-Net Dynamic Hierarchical Architecture ☆81 · Sep 11, 2025 · Updated 7 months ago
- LLM as World Models using Bayesian inference ☆17 · May 27, 2025 · Updated 10 months ago
- Voice agent using LiveKit (orchestration), Cartesia (TTS), OpenAI (LLM), and Deepgram (STT) ☆20 · Oct 28, 2025 · Updated 5 months ago
- A Large Recurrent Action Model: xLSTM enables Fast Inference for Robotics Tasks ☆36 · Oct 31, 2024 · Updated last year
- Reasoning-based Evaluation and Ranking of Translations. ☆20 · Jul 18, 2025 · Updated 9 months ago
- The GraphBench package. ☆30 · Mar 3, 2026 · Updated last month
- [NeurIPS 2024] Official implementation of the paper "MambaLRP: Explaining Selective State Space Sequence Models" 🐍 ☆46 · Nov 6, 2024 · Updated last year
- manipulating cointegrated pairs to achieve a market-neutral strategy that outperforms indices ☆10 · Jan 12, 2021 · Updated 5 years ago
- ☆36 · Feb 26, 2024 · Updated 2 years ago
- Building LLMs from scratch following the book from S. Raschka ☆34 · Mar 27, 2025 · Updated last year
- Minimal Transformer base in JAX. A single backbone for language modelling, diffusion, classification, etc... ☆15 · May 28, 2025 · Updated 10 months ago
- Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations" ☆92 · Oct 30, 2024 · Updated last year
- Reinforcement Learning based on Stock Trading with multiple backends. ☆11 · Mar 2, 2024 · Updated 2 years ago
- coloring terminal text with intensities (used for plotting probability, entropy with tokens) ☆12 · Oct 11, 2024 · Updated last year
- A few models converted from caffe to CoreMLs format. ☆15 · Jun 6, 2017 · Updated 8 years ago
- A Fast, Simplified Model for Molecular Generation with Improved Physical Quality ☆28 · Oct 1, 2025 · Updated 6 months ago
- Scratchpad/Chain-of-Thought Prompts ☆12 · Jun 6, 2022 · Updated 3 years ago
- Code for NeurIPS 2024 Paper - Superposed Decoding: Multiple Generations from a Single Autoregressive Inference Pass ☆21 · Aug 22, 2024 · Updated last year
- TritonBench: Benchmarking Large Language Model Capabilities for Generating Triton Operators ☆122 · Jun 14, 2025 · Updated 10 months ago
- Combining SOAP and MUON ☆20 · Feb 11, 2025 · Updated last year
- ☆35 · Mar 12, 2025 · Updated last year
- [NeurIPS 2025] Official Pytorch Implementation of "The Curse of Depth in Large Language Models" by Wenfang Sun, Xinyuan Song, Pengxiang L… ☆70 · Mar 3, 2026 · Updated last month
- Code to reproduce key results accompanying "SAEs (usually) Transfer Between Base and Chat Models" ☆13 · Jul 18, 2024 · Updated last year
- ☆15 · Apr 26, 2025 · Updated 11 months ago
- This repository is the official implementation of "DG-Mamba: Robust and Efficient Dynamic Graph Structure Learning with Selective State S… ☆22 · Apr 17, 2025 · Updated last year
- Implementation of Agent Attention in Pytorch ☆93 · Jul 10, 2024 · Updated last year
- POPGym Library in JAX ☆13 · Apr 15, 2024 · Updated 2 years ago
- 📄 Small Batch Size Training for Language Models ☆81 · Mar 18, 2026 · Updated last month
- ☆14 · Jul 13, 2025 · Updated 9 months ago
- Integrates Imbue's Cost Aware pareto-Region Bayesian Search (CARBS) with Weights and Biases (WanDB) ☆12 · Mar 17, 2025 · Updated last year
- Metric Learning (npair loss & angular loss) on mnist and Visualizing by t_SNE ☆35 · Feb 15, 2023 · Updated 3 years ago
- Make reasoning models scalable ☆49 · May 31, 2025 · Updated 10 months ago
- Official repo for paper "HiMoE-VLA: Hierarchical Mixture-of-Experts for Generalist Vision-Language-Action Policies" ☆28 · Dec 12, 2025 · Updated 4 months ago
- ☆35 · Apr 12, 2024 · Updated 2 years ago