demegire / Parameterization-of-Hypercomplex-Multiplications
A reproduction, by Ege Demir and Mehmet Barutçu, of the paper 'Beyond Fully-Connected Layers with Quaternions: Parameterization of Hypercomplex Multiplications with 1/n Parameters'.
☆12 · Updated 3 years ago
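The core idea the repository reproduces, the parameterized hypercomplex multiplication (PHM) layer, replaces a dense weight matrix with a sum of Kronecker products, which parameterizes the layer with roughly 1/n of the usual weight count. A minimal NumPy sketch under that reading of the paper (function name and shapes are illustrative, not the repo's actual API):

```python
import numpy as np

def phm_linear(x, A, S):
    """PHM layer: y = W @ x, where W = sum_i kron(A[i], S[i]).

    A: (n, n, n) learned "algebra" matrices (generalizing the
       fixed quaternion multiplication rules).
    S: (n, d_out // n, d_in // n) learned weight blocks.
    The full W is (d_out, d_in) but is described by only
    n**3 + d_out * d_in / n parameters instead of d_out * d_in.
    """
    W = sum(np.kron(A[i], S[i]) for i in range(A.shape[0]))
    return W @ x

n, d_in, d_out = 4, 16, 16
rng = np.random.default_rng(0)
A = rng.standard_normal((n, n, n))
S = rng.standard_normal((n, d_out // n, d_in // n))
x = rng.standard_normal(d_in)

y = phm_linear(x, A, S)          # shape (16,)
dense_params = d_out * d_in      # 256 for a plain linear layer
phm_params = A.size + S.size     # 128 here; the saving approaches
                                 # 1/n as d_out * d_in grows past n**3
```

With n = 4 the layer specializes to quaternion-style multiplication; larger layers see savings closer to the paper's 1/n figure, since the n³ term for A becomes negligible.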
Alternatives and similar repositories for Parameterization-of-Hypercomplex-Multiplications:
Users interested in Parameterization-of-Hypercomplex-Multiplications are comparing it to the libraries listed below.
- Calculating expected time for training LLMs ☆38 · Updated last year
- ResiDual: Transformer with Dual Residual Connections (https://arxiv.org/abs/2304.14802) ☆93 · Updated last year
- Implementation of TableFormer, Robust Transformer Modeling for Table-Text Encoding, in PyTorch ☆37 · Updated 3 years ago
- [NeurIPS 2022] Your Transformer May Not Be as Powerful as You Expect (official implementation) ☆34 · Updated last year
- ☆51 · Updated 2 years ago
- Implementation of Token Shift GPT, an autoregressive model that relies solely on shifting the sequence space for mixing ☆48 · Updated 3 years ago
- Implementation of N-Grammer, augmenting Transformers with latent n-grams, in PyTorch ☆73 · Updated 2 years ago
- Explorations into editing the knowledge and memories of an attention network ☆34 · Updated 2 years ago
- A PyTorch implementation of Luna: Linear Unified Nested Attention ☆41 · Updated 3 years ago
- ☆64 · Updated 7 months ago
- Implementation of Gated State Spaces, from the paper "Long Range Language Modeling via Gated State Spaces", in PyTorch ☆99 · Updated 2 years ago
- Implementation of a Light Recurrent Unit in PyTorch ☆47 · Updated 5 months ago
- Relative Positional Encoding for Transformers with Linear Complexity ☆62 · Updated 3 years ago
- Unofficial PyTorch implementation of pNLP-Mixer: an Efficient all-MLP Architecture for Language (https://arxiv.org/abs/2202.04350) ☆63 · Updated 3 years ago
- A simple Torch implementation of high-performance Multi-Query Attention ☆16 · Updated last year
- Implementation of a Transformer using ReLA (Rectified Linear Attention) from https://arxiv.org/abs/2104.07012 ☆49 · Updated 2 years ago
- Implementation of the Kalman Filtering Attention proposed in "Kalman Filtering Attention for User Behavior Modeling in CTR Prediction" ☆57 · Updated last year
- ☆20 · Updated last year
- Randomized Positional Encodings Boost Length Generalization of Transformers ☆80 · Updated last year
- Prompt-based few-shot learning: text generation with prompting ☆13 · Updated last year
- Implementation of Memory-Compressed Attention, from the paper "Generating Wikipedia by Summarizing Long Sequences" ☆70 · Updated last year
- ☆21 · Updated last year
- Toy genetic algorithm in PyTorch ☆34 · Updated last week
- Image-text multimodal model using Polyglot ☆11 · Updated last year
- Implementation of Agent Attention in PyTorch ☆90 · Updated 8 months ago
- An annotated implementation of the Hyena Hierarchy paper ☆32 · Updated last year
- ☆13 · Updated 2 years ago
- [ACL 2023] Gradient Ascent Post-training Enhances Language Model Generalization ☆29 · Updated 6 months ago
- Implementation of 🌻 Mirasol, SOTA multimodal autoregressive model out of Google DeepMind, in PyTorch ☆88 · Updated last year
- 🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX ☆82 · Updated 3 years ago