aykutcayir34 / DifferentialTransformer
☆8 · Updated 5 months ago
Alternatives and similar repositories for DifferentialTransformer:
Users who are interested in DifferentialTransformer are comparing it to the libraries listed below.
- Implementation of the proposed Adam-atan2 from Google DeepMind in PyTorch ☆103 · Updated 4 months ago
- Implementation of a Light Recurrent Unit in PyTorch ☆47 · Updated 5 months ago
- PyTorch (Lightning) implementation of the Mamba model ☆25 · Updated 11 months ago
- Implementation of the Kalman Filtering Attention proposed in "Kalman Filtering Attention for User Behavior Modeling in CTR Prediction" ☆57 · Updated last year
- Contains materials for my talk "You don't know TensorFlow". ☆9 · Updated 2 years ago
- Implementation of Agent Attention in PyTorch ☆90 · Updated 8 months ago
- Implementation of CaiT models in TensorFlow and ImageNet-1k checkpoints. Includes code for inference and fine-tuning. ☆12 · Updated last year
- Attempt to make multiple residual streams from ByteDance's Hyper-Connections paper accessible to the public ☆80 · Updated last month
- Refactored version of https://github.com/ming024/FastSpeech2 ☆13 · Updated 3 years ago
- An implementation of the paper "Retentive Network: A Successor to Transformer for Large Language Models" (https://arxiv.org/pdf/2307.08621.pdf) ☆12 · Updated last year
- Official implementation of "GPT or BERT: why not both?" ☆50 · Updated 2 weeks ago
- Implementation of Griffin from the paper "Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models" ☆52 · Updated 2 months ago
- A practical implementation of GradNorm, Gradient Normalization for Adaptive Loss Balancing, in PyTorch ☆88 · Updated last year
- Local Attention - Flax module for JAX ☆20 · Updated 3 years ago
- Enable RNNLM lattice rescoring with PyTorch [Kaldi] ☆12 · Updated 4 years ago
- This repository contains the implementation of the paper "Span Classification with Structured Information for Disfluency Detection in Sp… ☆12 · Updated last year
- ☆47 · Updated 7 months ago
- This is a reproduction of the paper 'Beyond Fully-Connected Layers with Quaternions: Parameterization of Hypercomplex Multiplications wit… ☆12 · Updated 3 years ago
- Cyclemoid implementation for PyTorch ☆87 · Updated 3 years ago
- Final training script from the Hugging Face Whisper fine-tuning event, used to get the best results on the fine-tuned model. ☆12 · Updated 2 years ago
- ☆73 · Updated 2 years ago
- Transcribing audio files using Hugging Face's implementation of Wav2Vec2 + "chain-linking" NLP tasks to combine speech-to-text with downs… ☆31 · Updated 4 years ago
- Tiled Flash Linear Attention library for fast and efficient mLSTM kernels. ☆50 · Updated last week
- Implementation of Perceiver AR, DeepMind's new long-context attention network based on the Perceiver architecture, in PyTorch ☆86 · Updated last year
- An implementation of FAdam (Fisher Adam) in PyTorch ☆43 · Updated 10 months ago
- Repository for fine-tuning 🤗 Transformers-based seq2seq speech models in JAX/Flax. ☆35 · Updated 2 years ago
- Explorations into the proposal from the paper "Grokfast, Accelerated Grokking by Amplifying Slow Gradients" ☆98 · Updated 3 months ago
- Implementation of BitNet-1.58 instruct tuning ☆21 · Updated 11 months ago
- Skyformer: Remodel Self-Attention with Gaussian Kernel and Nyström Method (NeurIPS 2021) ☆60 · Updated 2 years ago
- Several types of attention modules written in PyTorch for learning purposes ☆48 · Updated 6 months ago