corl-team / rebased
Official implementation of the paper "Linear Transformers with Learnable Kernel Functions are Better In-Context Models"
☆159Updated 2 weeks ago
Alternatives and similar repositories for rebased:
Users who are interested in rebased are comparing it to the libraries listed below
- ☆31Updated 4 months ago
- σ-GPT: A New Approach to Autoregressive Models☆61Updated 5 months ago
- Effective LLM Alignment Toolkit☆107Updated 3 weeks ago
- ☆20Updated 6 months ago
- Normalized Transformer (nGPT)☆146Updated 2 months ago
- ☆78Updated last week
- ☆121Updated this week
- 2D Positional Embeddings for Webpage Structural Understanding 🦙👀☆92Updated 4 months ago
- Focused on fast experimentation and simplicity☆65Updated last month
- The AdEMAMix Optimizer: Better, Faster, Older.☆178Updated 4 months ago
- The code and models for the paper: Switti: Designing Scale-Wise Transformers for Text-to-Image Synthesis☆161Updated last month
- An easy-to-understand framework for LLM samplers that rewind and revise generated tokens☆120Updated this week
- PyTorch implementation of models from the Zamba2 series.☆173Updated last week
- ☆36Updated last month
- ☆49Updated 10 months ago
- ☆34Updated this week
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters☆113Updated last month
- MERA (Multimodal Evaluation for Russian-language Architectures) is a new open benchmark for the Russian language for evaluating fundament…☆61Updated 3 months ago
- Supporting PyTorch FSDP for optimizers☆75Updated last month
- Modified Arena-Hard-Auto LLM evaluation toolkit with an emphasis on Russian language☆35Updated last month
- ☆60Updated 9 months ago
- PyTorch implementation of the PEER block from the paper "Mixture of A Million Experts" by Xu Owen He at DeepMind☆115Updated 5 months ago
- ☆21Updated last year
- The simplest, fastest repository for training/finetuning medium-sized xLSTMs.☆38Updated 8 months ago
- Embed arbitrary modalities (images, audio, documents, etc) into large language models.☆177Updated 10 months ago
- ☆48Updated 2 months ago
- Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients.☆190Updated 6 months ago
- Efficient optimizers☆154Updated this week
- An efficient implementation of the method proposed in "The Era of 1-bit LLMs"☆154Updated 3 months ago
- Framework for processing and filtering datasets☆27Updated 5 months ago