Implementation of MEGABYTE, Predicting Million-byte Sequences with Multiscale Transformers, in Pytorch
☆655 · Dec 27, 2024 · Updated last year
Alternatives and similar repositories for MEGABYTE-pytorch
Users that are interested in MEGABYTE-pytorch are comparing it to the libraries listed below.
- Experiments around a simple idea for inducing multiple hierarchical predictive model within a GPT ☆227 · Mar 25, 2026 · Updated last month
- Implementation of Recurrent Memory Transformer, Neurips 2022 paper, in Pytorch ☆423 · Jan 6, 2025 · Updated last year
- My own attempt at a long context genomics model, leveraging recent advances in long context attention modeling (Flash Attention + other h… ☆54 · Jul 2, 2023 · Updated 2 years ago
- Implementation of the conditionally routed attention in the CoLT5 architecture, in Pytorch ☆230 · Sep 6, 2024 · Updated last year
- Implementation of MambaByte in "MambaByte: Token-free Selective State Space Model" in Pytorch and Zeta ☆126 · May 11, 2026 · Updated last week
- RWKV (pronounced RwaKuv) is an RNN with great LLM performance, which can also be directly trained like a GPT transformer (parallelizable)… ☆14,531 · May 8, 2026 · Updated last week
- Implementation of GateLoop Transformer in Pytorch and Jax ☆92 · Jun 18, 2024 · Updated last year
- Convolutions for Sequence Modeling ☆912 · Jun 13, 2024 · Updated last year
- My attempts at applying Soundstream design on learned tokenization of text and then applying hierarchical attention to text generation ☆90 · Oct 11, 2024 · Updated last year
- A concise but complete full-attention transformer with a set of promising experimental features from various papers ☆5,859 · May 13, 2026 · Updated last week
- [NeurIPS 2023] MeZO: Fine-Tuning Language Models with Just Forward Passes. https://arxiv.org/abs/2305.17333 ☆1,166 · Jan 11, 2024 · Updated 2 years ago
- The official implementation of “Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training” ☆1,001 · Jan 30, 2024 · Updated 2 years ago
- Some personal experiments around routing tokens to different autoregressive attention, akin to mixture-of-experts ☆122 · Oct 17, 2024 · Updated last year
- Public repo for the NeurIPS 2023 paper "Unlimiformer: Long-Range Transformers with Unlimited Length Input" ☆1,064 · Mar 7, 2024 · Updated 2 years ago
- Fast and memory-efficient exact attention ☆23,836 · Updated this week
- Implementation of SoundStorm, Efficient Parallel Audio Generation from Google Deepmind, in Pytorch ☆1,544 · Apr 24, 2025 · Updated last year
- [ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding ☆1,335 · Mar 6, 2025 · Updated last year
- Official implementation of TransNormerLLM: A Faster and Better LLM ☆254 · Jan 23, 2024 · Updated 2 years ago
- Official Implementation of ACL2023: Don't Parse, Choose Spans! Continuous and Discontinuous Constituency Parsing via Autoregressive Span … ☆14 · Aug 25, 2023 · Updated 2 years ago
- QLoRA: Efficient Finetuning of Quantized LLMs ☆10,908 · Jun 10, 2024 · Updated last year
- Implementation of MaMMUT, a simple vision-encoder text-decoder architecture for multimodal tasks from Google, in Pytorch ☆104 · Oct 10, 2023 · Updated 2 years ago
- 🦁 Lion, new optimizer discovered by Google Brain using genetic algorithms that is purportedly better than Adam(w), in Pytorch ☆2,186 · Nov 27, 2024 · Updated last year
- Beyond Language Models: Byte Models are Digital World Simulators ☆334 · Jun 6, 2024 · Updated last year
- Implementation of AudioLM, a SOTA Language Modeling Approach to Audio Generation out of Google Research, in Pytorch ☆2,620 · Jan 12, 2025 · Updated last year
- My Implementation of Q-Sparse: All Large Language Models can be Fully Sparsely-Activated ☆37 · Aug 14, 2024 · Updated last year
- Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff" ☆253 · Jun 6, 2025 · Updated 11 months ago
- An Open-source Streaming High-fidelity Neural Audio Codec ☆504 · Mar 4, 2025 · Updated last year
- ImageBind One Embedding Space to Bind Them All ☆9,026 · Nov 21, 2025 · Updated 5 months ago
- Foundation Architecture for (M)LLMs ☆3,131 · Apr 11, 2024 · Updated 2 years ago
- The repository for the code of the UltraFastBERT paper ☆518 · Mar 24, 2024 · Updated 2 years ago
- [ICLR 2024] Efficient Streaming Language Models with Attention Sinks ☆7,229 · Jul 11, 2024 · Updated last year
- Sequence modeling with Mega. ☆303 · Jan 28, 2023 · Updated 3 years ago
- Efficient Transformers with Dynamic Token Pooling ☆68 · May 20, 2023 · Updated 3 years ago
- The RedPajama-Data repository contains code for preparing large datasets for training large language models. ☆4,939 · Dec 7, 2024 · Updated last year
- Implementation of the training framework proposed in Self-Rewarding Language Model, from MetaAI ☆1,409 · Apr 11, 2024 · Updated 2 years ago
- Mamba SSM architecture ☆18,237 · May 10, 2026 · Updated last week
- Implementation of Mega, the Single-head Attention with Multi-headed EMA architecture that currently holds SOTA on Long Range Arena ☆207 · Aug 26, 2023 · Updated 2 years ago
- Repo for "Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture" ☆563 · Dec 28, 2024 · Updated last year
- Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models ☆344 · Feb 23, 2025 · Updated last year