paulilioaica / Differential-Transformer
☆16Updated 7 months ago
Alternatives and similar repositories for Differential-Transformer
Users who are interested in Differential-Transformer are comparing it to the libraries listed below.
- Official Code for ICLR 2024 Paper: Non-negative Contrastive Learning☆46Updated last year
- Towards Understanding the Mixture-of-Experts Layer in Deep Learning☆30Updated last year
- ☆33Updated 10 months ago
- MultiModN – Multimodal, Multi-Task, Interpretable Modular Networks (NeurIPS 2023)☆33Updated last year
- MambaFormer in-context learning experiments and implementation for https://arxiv.org/abs/2402.04248☆54Updated 11 months ago
- PyTorch implementation of Pseudo-Riemannian Graph Convolutional Networks (NeurIPS'22)☆16Updated 11 months ago
- C-Mixup for NeurIPS 2022☆70Updated last year
- [NeurIPS 2023] Factorized Contrastive Learning: Going Beyond Multi-view Redundancy☆67Updated last year
- [ICML 2025] Official implementation of "AdaPTS: Adapting Univariate Foundation Models to Probabilistic Multivariate Time Series Forecasti…☆26Updated last week
- [NeurIPS 2024] Official implementation of the paper "MambaLRP: Explaining Selective State Space Sequence Models".☆38Updated 7 months ago
- Curse-of-memory phenomenon of RNNs in sequence modelling☆19Updated last month
- Official implementation of ICLR 2024 paper "Contrastive Learning Is Spectral Clustering On Similarity Graph" (https://arxiv.org/abs/2303.…☆20Updated 8 months ago
- [NeurIPS '24] Code repo for the paper entitled "Learning Structured Representations with Hyperbolic Embeddings" at NeurIPS 2024☆12Updated 4 months ago
- [ICML'24] Official PyTorch Implementation of TimeX++☆26Updated 7 months ago
- An official implementation of EHRDiff [TMLR]☆25Updated 11 months ago
- Official implementation of ICML 2024 paper "Matrix Information Theory for Self-supervised Learning" (https://arxiv.org/abs/2305.17326)☆28Updated 8 months ago
- ☆31Updated 7 months ago
- ☆9Updated 2 years ago
- Decoupled Kullback-Leibler Divergence Loss (DKL), NeurIPS 2024 / Generalized Kullback-Leibler Divergence Loss (GKL)☆44Updated last week
- Official code for ICML 2024 paper "An Unsupervised Approach for Periodic Source Detection in Time Series"☆9Updated 3 months ago
- Spatial Mixture-of-Experts☆20Updated 2 years ago
- User-friendly implementation of the Mixture-of-Sparse-Attention (MoSA). MoSA selects distinct tokens for each head with expert choice rou…☆17Updated last month
- Implementation of Griffin from the paper: "Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models"☆55Updated 2 months ago
- Official code for ICLR 2023 paper "ContraNorm: A Contrastive Learning Perspective on Oversmoothing and Beyond"☆35Updated 2 years ago
- The official Pytorch implementation of the paper "Fourier Transformer: Fast Long Range Modeling by Removing Sequence Redundancy with FFT …☆38Updated last year
- ☆11Updated last year
- I2M2: Jointly Modeling Inter- & Intra-Modality Dependencies for Multi-modal Learning (NeurIPS 2024)☆19Updated 7 months ago
- BackTime: Backdoor Attacks on Multivariate Time Series Forecasting☆21Updated last month
- ☆10Updated 2 years ago
- [NeurIPS 2024 Oral] Repository of the CMuST paper: "Get Rid of Isolation: A Continuous Multi-task Spatio-Temporal Learning Framework"☆12Updated 2 months ago