jwzhanggy / tinyBIG
tinybig for deep function learning
☆56 Updated last week
Alternatives and similar repositories for tinyBIG:
Users who are interested in tinyBIG are comparing it to the libraries listed below.
- ☆134 Updated 3 months ago
- Awesome list of papers that extend Mamba to various applications. ☆128 Updated 2 months ago
- ☆182 Updated last year
- Implementation of Switch Transformers from the paper: "Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficien… ☆62 Updated last month
- State Space Models ☆64 Updated 7 months ago
- ☆41 Updated 2 months ago
- Implementation of Griffin from the paper: "Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models" ☆51 Updated last month
- ☆152 Updated this week
- The official repository for the Scientific Paper Idea Proposer (SciPIP) ☆45 Updated last week
- A repository for DenseSSMs ☆87 Updated 8 months ago
- ☆186 Updated this week
- 🕹️ The toy examples of Kolmogorov-Arnold Network (Get Started Quickly) ☆74 Updated 7 months ago
- My implementation of the original transformer model (Vaswani et al.). I've additionally included the playground.py file for visualizing o… ☆42 Updated this week
- ☆158 Updated last week
- ☆121 Updated 7 months ago
- PyTorch Implementation of Jamba: "Jamba: A Hybrid Transformer-Mamba Language Model" ☆149 Updated last month
- The official implementation for ICLR23 spotlight paper "DIFFormer: Scalable (Graph) Transformers Induced by Energy Constrained Diffusion" ☆303 Updated 4 months ago
- Integrating Mamba/SSMs with Transformer for Enhanced Long Context and High-Quality Sequence Modeling ☆174 Updated last month
- ☆50 Updated 2 months ago
- OpenReview Submission Visualization (ICLR 2024/2025) ☆145 Updated 2 months ago
- Multi-Agent System for Science of Science ☆64 Updated last week
- The official GitHub page for the survey paper "A Survey on Mixture of Experts". ☆162 Updated last week
- Benchmark for efficiency in memory and time of different KAN implementations. ☆113 Updated 3 months ago
- Official JAX implementation of Learning to (Learn at Test Time): RNNs with Expressive Hidden States ☆375 Updated 4 months ago
- A generalized framework for subspace tuning methods in parameter efficient fine-tuning. ☆113 Updated 3 months ago
- C++ and CUDA ops for fused FourierKAN ☆73 Updated 7 months ago
- MNIST example using Kolmogorov-Arnold Networks ☆27 Updated 7 months ago
- AI Alignment: A Comprehensive Survey ☆131 Updated last year
- PyTorch implementation of the Differential-Transformer architecture for sequence modeling, specifically tailored as a decoder-only model … ☆48 Updated last month
- Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models ☆189 Updated 7 months ago