gordicaleksa / OpenGemini
Effort to open-source 10.5 trillion parameter Gemini model.
☆17 Updated 11 months ago
Related projects
Alternatives and complementary repositories for OpenGemini
- Training hybrid models for dummies. ☆15 Updated 3 weeks ago
- Toy genetic algorithm in Pytorch ☆29 Updated 8 months ago
- ☆31 Updated 2 months ago
- Exploring an idea where one forgets about efficiency and carries out attention across each edge of the nodes (tokens) ☆43 Updated last month
- An open-source replication of the strawberry method that leverages Monte Carlo Search with PPO and/or DPO ☆23 Updated last week
- GoldFinch and other hybrid transformer components ☆40 Updated 4 months ago
- Implementation of the proposed Spline-Based Transformer from Disney Research ☆77 Updated 2 weeks ago
- Utilities for PyTorch distributed ☆23 Updated last year
- Explorations into the proposal from the paper "Grokfast, Accelerated Grokking by Amplifying Slow Gradients" ☆85 Updated 2 months ago
- ☆39 Updated 10 months ago
- Some personal experiments around routing tokens to different autoregressive attention, akin to mixture-of-experts ☆109 Updated last month
- Implementation of a Light Recurrent Unit in Pytorch ☆46 Updated last month
- Generate interleaved text and image content in a structured format you can directly pass to downstream APIs. ☆25 Updated last month
- Implementation of the proposed Adam-atan2 from Google Deepmind in Pytorch ☆94 Updated this week
- Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8. ☆35 Updated 4 months ago
- [ICML 24 NGSM workshop] Associative Recurrent Memory Transformer implementation and scripts for training and evaluating ☆31 Updated this week
- ☆77 Updated 7 months ago
- ☆17 Updated last month
- Pytorch implementation of a simple way to enable (Stochastic) Frame Averaging for any network ☆48 Updated 4 months ago
- Implementation of a holodeck, written in Pytorch ☆17 Updated last year
- Load any clip model with a standardized interface ☆21 Updated 7 months ago
- Collection of autoregressive model implementations ☆67 Updated this week
- supporting pytorch FSDP for optimizers ☆35 Updated this week
- Using JAX to generate piano music as MIDI ☆38 Updated 11 months ago
- ☆18 Updated last month
- Normalized Transformer (nGPT) ☆94 Updated last week
- implementation of https://arxiv.org/pdf/2312.09299 ☆19 Updated 4 months ago
- Transformer with Mu-Parameterization, implemented in Jax/Flax. Supports FSDP on TPU pods. ☆29 Updated 3 weeks ago
- A scalable implementation of diffusion and flow-matching with XGBoost models, applied to calorimeter data. ☆17 Updated 3 weeks ago
- Implementation of a modular, high-performance, and simplistic mamba for high-speed applications ☆33 Updated 2 weeks ago