arcee-ai/trinity-large-tech-report

☆126

Alternatives and similar repositories for trinity-large-tech-report

Users who are interested in trinity-large-tech-report are comparing it to the repositories listed below.

microsoft / ArchScale
Simple & Scalable Pretraining for Neural Architecture Research
☆340 • Mar 31, 2026 • Updated 3 months ago
PythonNut / superbpe
Official code release for "SuperBPE: Space Travel for Language Models"
☆97 • May 28, 2026 • Updated last month
samsja / muon_fsdp_2
Muon fsdp 2
☆64 • Aug 8, 2025 • Updated 11 months ago
sail-sg / SkyLadder
The official repository for SkyLadder: Better and Faster Pretraining via Context Window Scheduling
☆43 • Dec 29, 2025 • Updated 6 months ago
NX-AI / xlstm_scaling_laws
Code and data to explore neural scaling laws of xLSTM and Transformer models.
☆23 • Apr 8, 2026 • Updated 3 months ago
tilde-research / nsa-release
An efficient implementation of the NSA (Native Sparse Attention) kernel
☆133 • Jun 24, 2025 • Updated last year
dame-cell / Triformer
Transformers components but in Triton
☆34 • May 9, 2025 • Updated last year
TiledTensor / TiledBench
Benchmark tests supporting the TiledCUDA library.
☆19 • Nov 19, 2024 • Updated last year
zhehangdu / Newton-Muon
The Newton-Muon optimizer
☆30 • Jun 5, 2026 • Updated last month
datologyai / DatBench
☆30 • Apr 28, 2026 • Updated 2 months ago
arcee-ai / pybubble
☆81 • Feb 18, 2026 • Updated 5 months ago
menhguin / minp_paper
Code Implementation, Evaluations, Documentation, Links and Resources for Min P paper
☆51 • Aug 13, 2025 • Updated 11 months ago
GindaChen / FlexFlashAttention3
FlexAttention w/ FlashAttention3 Support
☆27 • Oct 5, 2024 • Updated last year
shawntan / stickbreaking-attention
Stick-breaking attention
☆63 • Jul 1, 2025 • Updated last year
Triang-jyed-driung / i8muon
Muon in Int8 Precision Made Possible
☆20 • Jun 18, 2026 • Updated last month
Yifei-Zuo / Parallax
Official repository for Parallax (Parameterized Local Linear Attention)
☆65 • Jul 7, 2026 • Updated 2 weeks ago
Farseer-Scaling-Law / Farseer
☆21 • Jun 12, 2025 • Updated last year
xjdr-alt / llmri
look how they massacred my boy
☆63 • Oct 16, 2024 • Updated last year
sanderland / script_tok
Code for the paper "BPE stays on SCRIPT", "Which Pieces Does Unigram Tokenization Really Need?" and MinGram
☆18 • Jun 26, 2026 • Updated last month
Noumena-Network / nmoe
MoE training for Me and You and maybe other people
☆394 • Mar 15, 2026 • Updated 4 months ago
Doraemonzzz / nanoTransNormer
☆11 • Oct 11, 2023 • Updated 2 years ago
emorynlp / seq2seq-corenlp
☆13 • Feb 7, 2023 • Updated 3 years ago
cherichy / tilecute
☆32 • Jul 2, 2025 • Updated last year
Ryu1845 / hyena-jax
Implementation of Hyena Hierarchy in JAX
☆10 • Apr 30, 2023 • Updated 3 years ago
secemp9 / rubrics
a bunch of rubrics I made in different format and structure for llm judge and other use cases
☆16 • Sep 22, 2025 • Updated 10 months ago
Noumena-Network / NSA-Test
NSA Triton Kernels written with GPT5 and Opus 4.1
☆70 • Aug 12, 2025 • Updated 11 months ago
automl / DeltaProduct
DeltaProduct is a new linear recurrent neural network architecture that uses products of generalized Householder matrices as state-transi…
☆15 • Oct 13, 2025 • Updated 9 months ago
yaof20 / DenseMixer
Official implementation for DenseMixer: Improving MoE Post-Training with Precise Router Gradient
☆68 • Aug 3, 2025 • Updated 11 months ago
LZhengisme / CODA
Implementation of Cascaded Head-colliding Attention (ACL'2021)
☆11 • Sep 16, 2021 • Updated 4 years ago
PrimeIntellect-ai / prime-rl
Agentic RL Training at Scale
☆1,724 • Updated this week
renll / SeqBoat
[NeurIPS 2023] Sparse Modular Activation for Efficient Sequence Modeling
☆40 • Dec 2, 2023 • Updated 2 years ago
samsja / pydantic_config
Manage ML configuration with pydantic
☆16 • Mar 18, 2026 • Updated 4 months ago
nil0x9 / flash-muon
Flash-Muon: An Efficient Implementation of Muon Optimizer
☆258 • Jun 15, 2025 • Updated last year
microsoft / AttentionEngine
☆123 • May 19, 2025 • Updated last year
thunlp / BlockFFN
Source codes for paper "BlockFFN: Towards End-Side Acceleration-Friendly Mixture-of-Experts with Chunk-Level Activation Sparsity".
☆19 • Jan 10, 2026 • Updated 6 months ago
aHapBean / xHC
[Tech Report] Expanded Hyper-Connections
☆49 • Updated this week
tilde-research / aurora-release
Aurora optimizer release
☆150 • Jul 18, 2026 • Updated last week
iwiwi / epochraft
Checkpointable dataset utilities for foundation model training
☆32 • Jan 29, 2024 • Updated 2 years ago
srush / tangent
Source-to-Source Debuggable Derivatives in Pure Python
☆15 • Jan 23, 2024 • Updated 2 years ago