Experiments Notebook of "Understanding the Skill Gap in Recurrent Language Models: The Role of the Gather-and-Aggregate Mechanism"
☆15 · Apr 30, 2025 · Updated 10 months ago
Alternatives and similar repositories for Gather-and-Aggregate
Users that are interested in Gather-and-Aggregate are comparing it to the libraries listed below.
- Official implementation of Phi-Mamba. A MOHAWK-distilled model (Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Mode… ☆122 · Sep 13, 2024 · Updated last year
- ☆15 · Mar 2, 2025 · Updated last year
- The official repository for SkyLadder: Better and Faster Pretraining via Context Window Scheduling ☆42 · Dec 29, 2025 · Updated 3 months ago
- ☆22 · Sep 16, 2025 · Updated 6 months ago
- Source code for the paper "Positional Attention: Expressivity and Learnability of Algorithmic Computation" ☆14 · May 26, 2025 · Updated 10 months ago
- H-Net Dynamic Hierarchical Architecture ☆81 · Sep 11, 2025 · Updated 6 months ago
- LLM as World Models using Bayesian inference ☆17 · May 27, 2025 · Updated 10 months ago
- Voice agent using LiveKit (orchestration), Cartesia (TTS), OpenAI (LLM), and Deepgram (STT) ☆20 · Oct 28, 2025 · Updated 5 months ago
- A Large Recurrent Action Model: xLSTM enables Fast Inference for Robotics Tasks ☆36 · Oct 31, 2024 · Updated last year
- Reasoning-based Evaluation and Ranking of Translations. ☆20 · Jul 18, 2025 · Updated 8 months ago
- The GraphBench package. ☆28 · Mar 3, 2026 · Updated 3 weeks ago
- [NeurIPS 2024] Official implementation of the paper "MambaLRP: Explaining Selective State Space Sequence Models" 🐍 ☆45 · Nov 6, 2024 · Updated last year
- manipulating cointegrated pairs to achieve a market-neutral strategy that outperforms indices ☆12 · Jan 12, 2021 · Updated 5 years ago
- ☆36 · Feb 26, 2024 · Updated 2 years ago
- Building LLMs from scratch following the book from S. Raschka ☆34 · Mar 27, 2025 · Updated last year
- Minimal Transformer base in JAX. A single backbone for language modelling, diffusion, classification, etc... ☆14 · May 28, 2025 · Updated 10 months ago
- Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations" ☆92 · Oct 30, 2024 · Updated last year
- Reinforcement Learning based on Stock Trading with multiple backends. ☆11 · Mar 2, 2024 · Updated 2 years ago
- coloring terminal text with intensities (used for plotting probability, entropy with tokens) ☆12 · Oct 11, 2024 · Updated last year
- A few models converted from Caffe to CoreML format. ☆15 · Jun 6, 2017 · Updated 8 years ago
- A Fast, Simplified Model for Molecular Generation with Improved Physical Quality ☆27 · Oct 1, 2025 · Updated 5 months ago
- This repository is the official implementation of "DG-Mamba: Robust and Efficient Dynamic Graph Structure Learning with Selective State S… ☆22 · Apr 17, 2025 · Updated 11 months ago
- Scratchpad/Chain-of-Thought Prompts ☆12 · Jun 6, 2022 · Updated 3 years ago
- Code for NeurIPS 2024 Paper - Superposed Decoding: Multiple Generations from a Single Autoregressive Inference Pass ☆21 · Aug 22, 2024 · Updated last year
- TritonBench: Benchmarking Large Language Model Capabilities for Generating Triton Operators ☆120 · Jun 14, 2025 · Updated 9 months ago
- Combining SOAP and MUON ☆19 · Feb 11, 2025 · Updated last year
- ☆35 · Mar 12, 2025 · Updated last year
- [NeurIPS 2025] Official PyTorch Implementation of "The Curse of Depth in Large Language Models" by Wenfang Sun, Xinyuan Song, Pengxiang L… ☆70 · Mar 3, 2026 · Updated 3 weeks ago
- Code to reproduce key results accompanying "SAEs (usually) Transfer Between Base and Chat Models" ☆13 · Jul 18, 2024 · Updated last year
- ☆15 · Apr 26, 2025 · Updated 11 months ago
- POPGym Library in JAX ☆12 · Apr 15, 2024 · Updated last year
- 📄Small Batch Size Training for Language Models ☆81 · Mar 18, 2026 · Updated last week
- ☆15 · Jul 13, 2025 · Updated 8 months ago
- Integrates Imbue's Cost-Aware Pareto-Region Bayesian Search (CARBS) with Weights and Biases (WandB) ☆12 · Mar 17, 2025 · Updated last year
- Metric Learning (npair loss & angular loss) on MNIST and Visualizing by t-SNE ☆35 · Feb 15, 2023 · Updated 3 years ago
- Make reasoning models scalable ☆48 · May 31, 2025 · Updated 9 months ago
- Official repo for paper "HiMoE-VLA: Hierarchical Mixture-of-Experts for Generalist Vision-Language-Action Policies" ☆28 · Dec 12, 2025 · Updated 3 months ago
- A project for implementing ML and NLP papers ☆13 · May 22, 2020 · Updated 5 years ago
- ☆35 · Apr 12, 2024 · Updated last year