facebookresearch / pal

PAL: Predictive Analysis & Laws of Large Language Models

☆36

Alternatives and similar repositories for pal

Users who are interested in pal are comparing it to the libraries listed below.

facebookresearch / ModelRatatouille
Recycling diverse models
☆45Updated 2 years ago
mistralai / mistral-evals
☆75Updated 3 months ago
uclaml / MoE
Towards Understanding the Mixture-of-Experts Layer in Deep Learning
☆31Updated last year
ShadeAlsha / ICon
ICLR 2025 - official implementation for "I-Con: A Unifying Framework for Representation Learning"
☆109Updated last month
fkodom / soft-mixture-of-experts
PyTorch implementation of Soft MoE by Google Brain in "From Sparse to Soft Mixtures of Experts" (https://arxiv.org/pdf/2308.00951.pdf)
☆76Updated last year
lucidrains / AMIE-pytorch
Implementation of the general framework for AMIE, from the paper "Towards Conversational Diagnostic AI", out of Google Deepmind
☆66Updated 10 months ago
lucidrains / mirasol-pytorch
Implementation of 🌻 Mirasol, SOTA Multimodal Autoregressive model out of Google Deepmind, in Pytorch
☆89Updated last year
NielsRogge / awesome-huggingface
Repository containing awesome resources regarding Hugging Face tooling.
☆48Updated last year
apple / ml-mofi
☆59Updated last year
tum-ai / number-token-loss
A regression-alike loss to improve numerical reasoning in language models
☆24Updated 3 weeks ago
kyegomez / SimpleMamba
Implementation of a modular, high-performance, and simplistic mamba for high-speed applications
☆36Updated 9 months ago
lucidrains / mind-evolution
Implementation of Mind Evolution, Evolving Deeper LLM Thinking, from Deepmind
☆56Updated 2 months ago
GenRobo / MatMamba
Code and pretrained models for the paper: "MatMamba: A Matryoshka State Space Model"
☆60Updated 8 months ago
hyperevolnet / Terminator
The official repository for HyperZ⋅Z⋅W Operator Connects Slow-Fast Networks for Full Context Interaction.
☆38Updated 4 months ago
lucidrains / infini-transformer-pytorch
Implementation of Infini-Transformer in Pytorch
☆110Updated 7 months ago
luyug / magix
Supercharge huggingface transformers with model parallelism.
☆77Updated 2 weeks ago
stas00 / ml-ways
ML/DL Math and Method notes
☆63Updated last year
google-research / optformer
☆226Updated this week
vdlad / Remarkable-Robustness-of-LLMs
Codebase for the paper "The Remarkable Robustness of LLMs: Stages of Inference?"
☆18Updated 2 months ago
kyegomez / MambaFormer
Implementation of MambaFormer in Pytorch ++ Zeta from the paper: "Can Mamba Learn How to Learn? A Comparative Study on In-Context Learnin…
☆21Updated 2 weeks ago
foundation-model-stack / bamba
Train, tune, and infer Bamba model
☆131Updated 2 months ago
microsoft / CoML
Interactive coding assistant for data scientists and machine learning developers, empowered by large language models.
☆95Updated 10 months ago
LLM360 / k2-train
☆50Updated last year
lucidrains / taylor-series-linear-attention
Explorations into the recently proposed Taylor Series Linear Attention
☆100Updated 11 months ago
SamsungSAILMontreal / nino
Code for "Accelerating Training with Neuron Interaction and Nowcasting Networks" [to appear at ICLR 2025]
☆19Updated 2 months ago
r-three / phatgoose
Code for PHATGOOSE introduced in "Learning to Route Among Specialized Experts for Zero-Shot Generalization"
☆86Updated last year
naver-ai / model-stock
Model Stock: All we need is just a few fine-tuned models
☆121Updated 10 months ago
microsoft / Industrial-Foundation-Models
Dedicated to building industrial foundation models for universal data intelligence across industries.
☆57Updated 11 months ago
eth-easl / fmengine
Utilities for Training Very Large Models
☆58Updated 10 months ago
lucidrains / mixture-of-attention
Some personal experiments around routing tokens to different autoregressive attention, akin to mixture-of-experts
☆120Updated 9 months ago