erogol / BlaGPT
Experimental playground for benchmarking language model (LM) architectures, layers, and tricks on smaller datasets. Designed for flexible experimentation and exploration.
☆25 Updated 3 weeks ago
Alternatives and similar repositories for BlaGPT:
Users who are interested in BlaGPT are comparing it to the libraries listed below.
- Repository for "TESS-2: A Large-Scale, Generalist Diffusion Language Model" ☆34 Updated 2 months ago
- Implementation of Google's USM speech model in Pytorch ☆31 Updated 2 weeks ago
- ☆29 Updated last week
- Implementation of the proposed MaskBit from Bytedance AI ☆75 Updated 5 months ago
- [ICLR 2025] Official PyTorch implementation of "Forgetting Transformer: Softmax Attention with a Forget Gate" ☆95 Updated 2 weeks ago
- Official Code Implementation for 'A Simple Early Exiting Framework for Accelerated Sampling in Diffusion Models' ☆18 Updated 9 months ago
- Implementation of a Light Recurrent Unit in Pytorch ☆47 Updated 6 months ago
- small audio language model for reasoning ☆58 Updated last week
- Code for NeurIPS 2023 paper "DASpeech: Directed Acyclic Transformer for Fast and High-quality Speech-to-Speech Translation". ☆61 Updated 9 months ago
- Legible, Scalable, Reproducible Foundation Models with Named Tensors and Jax ☆14 Updated 10 months ago
- A big_vision inspired repo that implements a generic Auto-Encoder class capable of representation learning and generative modeling. ☆34 Updated 10 months ago
- SpeechCLIP: Integrating Speech with Pre-Trained Vision and Language Model, Accepted to IEEE SLT 2022 ☆113 Updated 2 years ago
- Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention" ☆97 Updated 6 months ago
- My attempts at applying Soundstream design on learned tokenization of text and then applying hierarchical attention to text generation ☆86 Updated 6 months ago
- Some personal experiments around routing tokens to different autoregressive attention, akin to mixture-of-experts ☆118 Updated 6 months ago
- Official PyTorch Implementation for Paper "No More Adam: Learning Rate Scaling at Initialization is All You Need" ☆51 Updated 2 months ago
- Official PyTorch implementation of LoRI: Reducing Cross-Task Interference in Multi-Task Low-Rank Adaptation. ☆13 Updated 2 weeks ago
- Tiny re-implementation of MDM in style of LLaDA and nano-gpt speedrun ☆48 Updated last month
- Code for ACL 2024 main conference paper "Can We Achieve High-quality Direct Speech-to-Speech Translation Without Parallel Speech Data?". ☆24 Updated 9 months ago
- Official code implementation for the work Preference Alignment with Flow Matching (NeurIPS 2024) ☆49 Updated 5 months ago
- ☆59 Updated last year
- Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024) ☆152 Updated last week
- Interface Design for Self-Supervised Speech Models, Accepted to Interspeech 2024 ☆16 Updated 5 months ago
- Explorations into adversarial losses on top of autoregressive loss for language modeling ☆35 Updated 2 months ago
- LiteASR: Efficient Automatic Speech Recognition with Low-Rank Approximation ☆98 Updated last week
- LLaST: Improved End-to-end Speech Translation System Leveraged by Large Language Models ☆25 Updated 8 months ago
- ☆35 Updated last year
- RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best… ☆44 Updated last month
- Collection of scripts from mHuBERT-147. ☆24 Updated 5 months ago
- ☆62 Updated 9 months ago