goombalab / phi-mamba

Official implementation of Phi-Mamba. A MOHAWK-distilled model (Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Models)
61Updated this week

Related projects: