pkuzengqi / Skyformer
Skyformer: Remodel Self-Attention with Gaussian Kernel and Nyström Method (NeurIPS 2021)
☆63 · Apr 19, 2022 · Updated 3 years ago
Alternatives and similar repositories for Skyformer
Users who are interested in Skyformer are comparing it to the repositories listed below.
- Minimal code to train ELMo models in recent versions of TensorFlow ☆14 · Apr 30, 2023 · Updated 2 years ago
- ☆16 · May 6, 2021 · Updated 4 years ago
- The accompanying code for "Simplifying and Understanding State Space Models with Diagonal Linear RNNs" (Ankit Gupta, Harsh Mehta, Jonatha… ☆23 · Dec 30, 2022 · Updated 3 years ago
- ☆10 · Jun 14, 2023 · Updated 2 years ago
- This repository reproduces the results in the paper "How expressive are transformers in spectral domain for graphs?" (published in TMLR) ☆12 · Jul 10, 2022 · Updated 3 years ago
- ☆10 · Sep 13, 2022 · Updated 3 years ago
- Literature review and summary of methods for extraction of causal relations from text ☆10 · Oct 6, 2021 · Updated 4 years ago
- Our paper is titled "NUS-IDS at FinCausal 2021: Dependency Tree in Graph Neural Networks for better Cause-Effect Span Detection". ☆13 · Feb 11, 2022 · Updated 4 years ago
- MaXM is a suite of test-only benchmarks for multilingual visual question answering in 7 languages: English (en), French (fr), Hindi (hi),… ☆13 · Jan 16, 2024 · Updated 2 years ago
- Spectral Graph Attention Network with Fast Eigen-approximation ☆12 · Dec 24, 2021 · Updated 4 years ago
- This is the public GitHub repository for our paper "Transformer with a Mixture of Gaussian Keys" ☆28 · Aug 13, 2022 · Updated 3 years ago
- Code for the ACL-2022 paper "StableMoE: Stable Routing Strategy for Mixture of Experts" ☆51 · Jul 17, 2022 · Updated 3 years ago
- 🔍 Multilingual Evaluation of English-Centric LLMs via Cross-Lingual Alignment ☆11 · Apr 6, 2025 · Updated 10 months ago
- Implementation of Nonparametric Hamiltonian Monte Carlo ☆13 · Feb 13, 2023 · Updated 3 years ago
- ☆21 · Dec 5, 2022 · Updated 3 years ago
- IsoBN: Fine-Tuning BERT with Isotropic Batch Normalization ☆12 · Nov 23, 2021 · Updated 4 years ago
- Staged Training for Transformer Language Models ☆33 · Mar 31, 2022 · Updated 3 years ago
- A State-Space Model with Rational Transfer Function Representation. ☆83 · May 17, 2024 · Updated last year
- Code for "Theoretical Foundations of Deep Selective State-Space Models" (NeurIPS 2024) ☆15 · Jan 7, 2025 · Updated last year
- "Style Transfer as Data Augmentation: A Case Study on Named Entity Recognition" (EMNLP 2022) ☆16 · Feb 2, 2023 · Updated 3 years ago
- ☆12 · Nov 3, 2021 · Updated 4 years ago
- Code and data for Distributional Correlation–Aware Knowledge Distillation for Stock Trading Volume Prediction (ECML-PKDD 22) ☆15 · Sep 6, 2022 · Updated 3 years ago
- Repo for: When to Make Exceptions: Exploring Language Models as Accounts of Human Moral Judgment ☆38 · Jun 5, 2023 · Updated 2 years ago
- [NeurIPS 2022] Your Transformer May Not be as Powerful as You Expect (official implementation) ☆34 · Aug 6, 2023 · Updated 2 years ago
- Resources related to EMNLP 2021 paper "FAME: Feature-Based Adversarial Meta-Embeddings for Robust Input Representations" ☆13 · Dec 14, 2021 · Updated 4 years ago
- Getting interpretable dimensions in word embedding spaces. ☆15 · Jul 6, 2023 · Updated 2 years ago
- [EMNLP 2022] Language Model Pre-Training with Sparse Latent Typing ☆14 · Feb 10, 2023 · Updated 3 years ago
- Reproduces experiments from "Grounding inductive biases in natural images: invariance stems from variations in data" ☆17 · Sep 25, 2024 · Updated last year
- Implementation and experiments for Partially Supervised NER via Expected Entity Ratio in TACL 2022 ☆14 · Nov 7, 2022 · Updated 3 years ago
- Implementation of Spectral State Space Models ☆16 · Feb 23, 2024 · Updated last year
- [ICLR 2022] Official implementation of cosformer-attention in cosFormer: Rethinking Softmax in Attention ☆198 · Dec 2, 2022 · Updated 3 years ago
- Long Range Arena for Benchmarking Efficient Transformers ☆777 · Dec 16, 2023 · Updated 2 years ago
- Curse-of-memory phenomenon of RNNs in sequence modelling ☆19 · May 8, 2025 · Updated 9 months ago
- Stabilizing Gradients for Deep Neural Networks via Efficient SVD Parameterization ☆16 · Jun 5, 2018 · Updated 7 years ago
- Using FlexAttention to compute attention with different masking patterns ☆47 · Sep 22, 2024 · Updated last year
- Blog post ☆17 · Feb 16, 2024 · Updated 2 years ago
- NLP Examples using the 🤗 libraries ☆40 · Feb 21, 2021 · Updated 4 years ago
- Self-Supervised Document-to-Document Similarity Ranking via Contextualized Language Models and Hierarchical Inference ☆45 · Nov 28, 2022 · Updated 3 years ago
- Group-conditional DRO to alleviate spurious correlations ☆15 · Jul 15, 2021 · Updated 4 years ago