johnma2006 / candle
Deep learning library implemented from scratch in numpy. Mixtral, Mamba, LLaMA, GPT, ResNet, and other experiments.
⭐52 · Updated last year
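Since candle is described as "implemented from scratch in numpy", layers carry their own forward and backward passes over plain numpy arrays. The sketch below is a minimal illustration of that idea; the `Linear` class and its method names are assumptions for this example, not candle's actual API:

```python
import numpy as np

class Linear:
    """Minimal fully-connected layer with manual backprop (illustrative only)."""
    def __init__(self, n_in, n_out):
        # Small random weights and zero bias
        self.W = np.random.randn(n_in, n_out) * 0.02
        self.b = np.zeros(n_out)

    def forward(self, x):
        self.x = x                    # cache the input for the backward pass
        return x @ self.W + self.b

    def backward(self, dout):
        # Gradients w.r.t. parameters, then w.r.t. the input
        self.dW = self.x.T @ dout
        self.db = dout.sum(axis=0)
        return dout @ self.W.T

# Tiny usage example
layer = Linear(4, 3)
x = np.random.randn(2, 4)
y = layer.forward(x)                  # shape (2, 3)
dx = layer.backward(np.ones_like(y))  # shape (2, 4)
```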
Alternatives and similar repositories for candle
Users interested in candle are comparing it to the repositories listed below.
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ⭐168 · Updated 4 months ago
- A MAD laboratory to improve AI architecture designs 🧪 ⭐132 · Updated 10 months ago
- Evaluating the Mamba architecture on the Othello game ⭐48 · Updated last year
- Understand and test language model architectures on synthetic tasks. ⭐234 · Updated last month
- Implementation of the Llama architecture with RLHF + Q-learning ⭐167 · Updated 9 months ago
- Griffin MQA + Hawk Linear RNN Hybrid ⭐89 · Updated last year
- ⭐102 · Updated 3 months ago
- ⭐91 · Updated last year
- PyTorch implementation of the PEER block from the paper "Mixture of A Million Experts" by Xu Owen He at DeepMind ⭐129 · Updated last year
- Landing repository for the paper "Softpick: No Attention Sink, No Massive Activations with Rectified Softmax" ⭐85 · Updated last month
- Collection of autoregressive model implementations ⭐86 · Updated 6 months ago
- Yet another random morning idea to be quickly tried and architecture shared if it works; to allow the transformer to pause for any amount… ⭐52 · Updated 2 years ago
- Token Omission Via Attention ⭐127 · Updated last year
- Normalized Transformer (nGPT) ⭐192 · Updated 11 months ago
- Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff" ⭐241 · Updated 4 months ago
- Implementation of GateLoop Transformer in PyTorch and JAX ⭐90 · Updated last year
- ⭐200 · Updated last week
- ⭐81 · Updated last year
- A byte-level decoder architecture that matches the performance of tokenized Transformers. ⭐66 · Updated last year
- nanoGPT-like codebase for LLM training ⭐109 · Updated 5 months ago
- Supporting PyTorch FSDP for optimizers ⭐83 · Updated 10 months ago
- Fast bare-bones BPE for modern tokenizer training ⭐167 · Updated 4 months ago
- Custom Triton kernels for training Karpathy's nanoGPT. ⭐19 · Updated last year
- Explorations into the recently proposed Taylor Series Linear Attention ⭐99 · Updated last year
- Large-scale 4D parallelism pre-training for 🤗 transformers in Mixture of Experts *(still work in progress)* ⭐87 · Updated last year
- ⭐166 · Updated 2 years ago
- Training small GPT-2 style models using Kolmogorov-Arnold networks. ⭐121 · Updated last year
- A single repo with all scripts and utils to train / fine-tune the Mamba model with or without FIM ⭐59 · Updated last year
- ⭐58 · Updated last year
- Accelerated First Order Parallel Associative Scan ⭐189 · Updated last year