google-research / causallm_icl

☆10

Related projects:

jenni-ai / T2FW
Fine-Tuning Pre-trained Transformers into Decaying Fast Weights
☆19 Updated last year
RandallBalestriero / SplineLLM
☆12 Updated 9 months ago
EleutherAI / mdl
Minimum Description Length probing for neural network representations
☆15 Updated 11 months ago
ictnlp / PCFG-NAT
Code for NeurIPS 2023 paper "Non-autoregressive Machine Translation with Probabilistic Context-free Grammar".
☆10 Updated 8 months ago
ethanlshen / HierNet
Code for "Are “Hierarchical” Visual Representations Hierarchical?" in NeurIPS Workshop for Symmetry and Geometry in Neural Representation…
☆18 Updated 10 months ago
RobertCsordas / linear_layer_as_attention
The official repository for our paper "The Dual Form of Neural Networks Revisited: Connecting Test Time Predictions to Training Patterns …
☆15 Updated 10 months ago
google-research / interpretability-theory
☆24 Updated last year
shoaibahmed / metadata_archaeology
Official code for the paper: "Metadata Archaeology"
☆18 Updated last year
rtaori / data_feedback
Code for the paper "Data Feedback Loops: Model-driven Amplification of Dataset Biases"
☆15 Updated 2 years ago
sustcsonglin / gated_linear_attention_layer
☆30 Updated 8 months ago
OpenNLPLab / ETSC-Exact-Toeplitz-to-SSM-Conversion
[EMNLP 2023] Official implementation of the algorithm ETSC: Exact Toeplitz-to-SSM Conversion, from our EMNLP 2023 paper - Accelerating Toeplitz…
☆14 Updated 11 months ago
allenai / sso
Repository for Skill Set Optimization
☆12 Updated last month
wzq016 / PINE
Official repo of the paper "Eliminating Position Bias of Language Models: A Mechanistic Approach"
☆10 Updated 3 weeks ago
RobertCsordas / moe
Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers"
☆34 Updated 10 months ago
RAIVNLab / MatFormer-OLMo
Code repository for the public reproduction of the language modelling experiments on "MatFormer: Nested Transformer for Elastic Inference…
☆17 Updated 10 months ago
IDSIA / fpainter
Official repository for the paper "Images as Weight Matrices: Sequential Image Generation Through Synaptic Learning Rules" (ICLR 2023)
☆12 Updated last year
prateeky2806 / ComPEFT
☆25 Updated 9 months ago
WangFei-2019 / SNARE
Project for SNARE benchmark
☆10 Updated 3 months ago
smonsays / hypernetwork-attention
Official code for the paper "Attention as a Hypernetwork"
☆20 Updated 2 months ago
AndyShih12 / LongHorizonTemperatureScaling
PyTorch implementation for "Long Horizon Temperature Scaling", ICML 2023
☆18 Updated last year
Yuanhy1997 / HyPe
HyPe: Better Pre-trained Language Model Fine-tuning with Hidden Representation Perturbation [ACL 2023]
☆13 Updated last year
abhishekpanigrahi1996 / transformer_in_transformer
☆44 Updated 11 months ago
Doraemonzzz / hgru2-pytorch
☆19 Updated last month
Raincleared-Song / ConPET
Source code for a LoRA-based continual relation extraction method.
☆10 Updated 11 months ago
lucidrains / coordinate-descent-hierarchical-memory
☆14 Updated this week
salesforce / simplification
☆19 Updated last year
AlirezaMorsali / MLP-Attention
☆12 Updated 8 months ago
renll / SeqBoat
[NeurIPS 2023] Sparse Modular Activation for Efficient Sequence Modeling
☆34 Updated 9 months ago
dangxingyu / rnn-icrag
Official repository of paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval"
☆24 Updated 5 months ago
radarFudan / Curse-of-memory
Curse-of-memory phenomenon of RNNs in sequence modelling
☆17 Updated this week