davidar / eigenGPT
Minimal C++ implementation of GPT2
☆40 · Updated last year
Related projects
Alternatives and complementary repositories for eigenGPT
- MACTA: A Multi-agent Reinforcement Learning Approach for Cache Timing Attacks and Detection ☆45 · Updated last year
- ☆32 · Updated 5 months ago
- Throwaway GPT inference ☆139 · Updated 5 months ago
- FlexAttention w/ FlashAttention3 Support ☆26 · Updated last month
- Experimental scripts for researching data-adaptive learning rate scheduling. ☆23 · Updated last year
- CUDA and Triton implementations of Flash Attention with SoftmaxN. ☆66 · Updated 5 months ago
- Make Triton easier ☆41 · Updated 4 months ago
- Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers" ☆36 · Updated 11 months ago
- ☆82 · Updated 8 months ago
- A thin, highly portable toolkit for efficiently compiling dense loop-based computation. ☆147 · Updated last year
- Code implementing "Efficient Parallelization of a Ubiquitous Sequential Computation" (Heinsen, 2023) ☆85 · Updated 10 months ago
- This code accompanies the paper "Leveraging Skills from Unlabeled Prior Data for Efficient Online Exploration." ☆22 · Updated 2 weeks ago
- Code for the paper "Learning Temporal Distances: Contrastive Successor Features Can Provide a Metric Structure for Decision-Making" ☆21 · Updated 3 months ago
- A micro JAX-like function transformation engine (microjax) ☆26 · Updated 2 weeks ago
- LLM training in simple, raw C/CUDA ☆17 · Updated 6 months ago
- LLM training in simple, raw C/CUDA ☆86 · Updated 6 months ago
- The code for the paper "A Bayesian Approach to Online Planning" published in ICML 2024. ☆10 · Updated 4 months ago
- Fast, Multi-threaded Matrix Multiplication in C ☆178 · Updated 3 weeks ago
- ☆49 · Updated 7 months ago
- Official Implementation of NeurIPS'23 Paper "Cross-Episodic Curriculum for Transformer Agents" ☆31 · Updated last year
- Better bindings for Python ☆17 · Updated last year
- Implementations of Curious Replay for model-based adaptation. ☆36 · Updated last year
- An environment based on XLA for deep learning compiler optimization research. ☆23 · Updated last year
- Standalone command-line tool for compiling Triton kernels ☆15 · Updated last month
- Efficiently send large arrays across machines ☆15 · Updated 3 months ago
- C++ raytracer that supports custom models. Runs the calculations on the CPU using C++11 threads or on the GPU via CUDA. ☆74 · Updated last year
- GPT implementation in Flax ☆18 · Updated 2 years ago
- GPU benchmark ☆43 · Updated last month