shizhediao / awesome-transformers
A curated list of resources dedicated to Transformers.
☆8 · Updated 4 years ago
Alternatives and similar repositories for awesome-transformers
Users interested in awesome-transformers are comparing it to the repositories listed below.
- Experiments and code to generate the GINC small-scale in-context learning dataset from "An Explanation for In-context Learning as Implici…" · ☆106 · Updated last year
- ☆232 · Updated last year
- ☆83 · Updated last year
- ☆166 · Updated last year
- ☆19 · Updated 2 months ago
- nanoGPT-like codebase for LLM training · ☆98 · Updated last month
- Code for the paper "The Impact of Positional Encoding on Length Generalization in Transformers" (NeurIPS 2023) · ☆136 · Updated last year
- Omnigrok: Grokking Beyond Algorithmic Data · ☆58 · Updated 2 years ago
- Code to reproduce "Transformers Can Do Arithmetic with the Right Embeddings", McLeish et al. (NeurIPS 2024) · ☆190 · Updated last year
- Skill-It! A Data-Driven Skills Framework for Understanding and Training Language Models · ☆46 · Updated last year
- ☆53 · Updated last year
- [NeurIPS 2023] Learning Transformer Programs · ☆161 · Updated last year
- Neural Networks and the Chomsky Hierarchy · ☆205 · Updated last year
- ☆67 · Updated 2 years ago
- ☆121 · Updated last year
- A library to create and manage configuration files, especially for machine learning projects. · ☆78 · Updated 3 years ago
- ☆35 · Updated 6 months ago
- EMNLP 2020: On the Ability and Limitations of Transformers to Recognize Formal Languages · ☆24 · Updated 4 years ago
- A Kernel-Based View of Language Model Fine-Tuning (https://arxiv.org/abs/2210.05643) · ☆75 · Updated last year
- Evaluate interpretability methods on localizing and disentangling concepts in LLMs. · ☆48 · Updated 8 months ago
- ☆180 · Updated last year
- Framework code with wandb, checkpointing, logging, configs, and experimental protocols. Useful for fine-tuning models or training from scratc… · ☆150 · Updated 2 years ago
- Sequence modeling with Mega. · ☆296 · Updated 2 years ago
- ☆93 · Updated 11 months ago
- Influence Functions with (Eigenvalue-corrected) Kronecker-Factored Approximate Curvature · ☆156 · Updated this week
- NeurIPS Large Language Model Efficiency Challenge: 1 LLM + 1 GPU + 1 Day · ☆257 · Updated last year
- ☆60 · Updated 3 years ago
- Code for the paper "Pretraining task diversity and the emergence of non-Bayesian in-context learning for regression" · ☆21 · Updated 2 years ago
- Revisiting Efficient Training Algorithms for Transformer-based Language Models (NeurIPS 2023) · ☆80 · Updated last year
- Language models scale reliably with over-training and on downstream tasks · ☆97 · Updated last year