sunyt32/torchscale

Transformers at any scale

☆42

Alternatives and similar repositories for torchscale

Users who are interested in torchscale are comparing it to the repositories listed below.

shyyhs / CourseraParallelCorpusMining
Coursera Corpus Mining and Multistage Fine-Tuning for Improving Lectures Translation
☆15 • Aug 27, 2024 • Updated last year
jackbandy / bookcorpus-datasheet
Documentation effort for the BookCorpus dataset
☆34 • Jun 2, 2021 • Updated 5 years ago
jenni-ai / T2FW
Fine-Tuning Pre-trained Transformers into Decaying Fast Weights
☆20 • Oct 9, 2022 • Updated 3 years ago
chijames / KERPLE
☆20 • Oct 25, 2022 • Updated 3 years ago
mynlp / niilc-qa
NIILC QA data
☆18 • Nov 20, 2015 • Updated 10 years ago
JohnTailor / BertSenClu
Topic Model based on Pretrained Sentence Embeddings (with BERT)
☆13 • Feb 8, 2023 • Updated 3 years ago
McGill-NLP / length-generalization
Code for the paper "The Impact of Positional Encoding on Length Generalization in Transformers", NeurIPS 2023
☆139 • Apr 30, 2024 • Updated 2 years ago
Refugee-Law-Lab / scc_bulk_data
Bulk access to Supreme Court of Canada Decisions
☆10 • Aug 4, 2025 • Updated 11 months ago
AndyShih12 / LongHorizonTemperatureScaling
PyTorch implementation for "Long Horizon Temperature Scaling", ICML 2023
☆21 • May 31, 2023 • Updated 3 years ago
Noahs-ARK / rational-recurrences
Implementation for "Rational Recurrences", Peng et al., EMNLP 2018.
☆28 • Jun 21, 2022 • Updated 4 years ago
stevenxcao / subnetwork-probing
☆14 • Apr 8, 2021 • Updated 5 years ago
BryceZhuo / HybridNorm
The official implementation of HybridNorm: Towards Stable and Efficient Transformer Training via Hybrid Normalization
☆19 • Mar 7, 2025 • Updated last year
aws-samples / rag-based-translation-with-dynamodb-and-bedrock
☆15 • Dec 10, 2025 • Updated 7 months ago
lucidrains / gated-state-spaces-pytorch
Implementation of Gated State Spaces, from the paper "Long Range Language Modeling via Gated State Spaces", in Pytorch
☆101 • Feb 25, 2023 • Updated 3 years ago
bdusell / stack-attention
Code for the paper "Stack Attention: Improving the Ability of Transformers to Model Hierarchical Patterns"
☆18 • Mar 15, 2024 • Updated 2 years ago
jermp / tongrams_estimation
A C++ library implementing fast language models estimation using the 1-Sort algorithm.
☆16 • May 18, 2023 • Updated 3 years ago
ictnlp / PCFG-NAT
Code for NeurIPS 2023 paper "Non-autoregressive Machine Translation with Probabilistic Context-free Grammar".
☆12 • Jan 4, 2024 • Updated 2 years ago
allenai / HyBayes
Bayesian Assessment of Hypotheses
☆26 • Jul 6, 2023 • Updated 3 years ago
ofirpress / attention_with_linear_biases
Code for the ALiBi method for transformer language models (ICLR 2022)
☆558 • Oct 30, 2023 • Updated 2 years ago
Ankur3107 / dpr-tf
Dense Passage Retrieval using tensorflow-keras on TPU
☆17 • Jun 27, 2021 • Updated 5 years ago
speechpro / mixup
☆24 • Mar 13, 2020 • Updated 6 years ago
RobGrimm / HierarchicalSoftmax
Hierarchical Softmax Layer
☆18 • Oct 7, 2015 • Updated 10 years ago
RobertCsordas / moe_layer
sigma-MoE layer
☆21 • Jan 5, 2024 • Updated 2 years ago
tmu-nlp / TwitterCorpus
Tokyo Metropolitan University Japanese Twitter corpus
☆21 • Mar 14, 2016 • Updated 10 years ago
Takeuchi-Lab-LM / python_asa
Python version of the Japanese semantic role labeling system (ASA)
☆22 • Nov 11, 2022 • Updated 3 years ago
zhisbug / ray-scalable-ml-design
Some microbenchmarks and design docs before commencement
☆11 • Feb 1, 2021 • Updated 5 years ago
CarperAI / Algorithm-Distillation-RLHF
☆35 • Jan 29, 2023 • Updated 3 years ago
BlinkDL / SmallInitEmb
LayerNorm(SmallInit(Embedding)) in a Transformer to improve convergence
☆61 • Feb 21, 2022 • Updated 4 years ago
RobertCsordas / moe_attention
Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention"
☆101 • Sep 30, 2024 • Updated last year
amirzandieh / HyperAttention
Triton Implementation of HyperAttention Algorithm
☆48 • Dec 11, 2023 • Updated 2 years ago
LeiWang1999 / AutoGPTQ.tvm
GPTQ inference TVM kernel
☆41 • Apr 25, 2024 • Updated 2 years ago
kaushal0494 / ZmBART
☆11 • Mar 19, 2023 • Updated 3 years ago
ku-nlp / JMRD
Japanese Movie Recommendation Dialogue dataset
☆29 • Jul 19, 2022 • Updated 4 years ago
idiap / icassp-oov-recognition
Data and code related to the ICASSP submission "A comparison of methods for OOV-word recognition"
☆17 • Nov 28, 2021 • Updated 4 years ago
DachengLi1 / AMP
(NeurIPS 2022) Automatically finding good model-parallel strategies, especially for complex models and clusters.
☆44 • Nov 4, 2022 • Updated 3 years ago
HazyResearch / based
Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff"
☆256 • Jun 6, 2025 • Updated last year
microsoft / BANG
BANG is a new pretraining model to Bridge the gap between Autoregressive (AR) and Non-autoregressive (NAR) Generation. AR and NAR generat…
☆28 • Feb 6, 2022 • Updated 4 years ago
subho406 / agalite
AGaLiTe: Approximate Gated Linear Transformers for Online Reinforcement Learning (Published in TMLR)
☆24 • Oct 15, 2024 • Updated last year
ghaddarAbs / WiNER
☆32 • Aug 4, 2021 • Updated 4 years ago