AlirezaAzadbakht / kernel-sharing
Drastically Reducing the Number of Trainable Parameters in Deep CNNs by Inter-layer Kernel-sharing
☆13 · Updated 2 years ago
Alternatives and similar repositories for kernel-sharing
Users who are interested in kernel-sharing are comparing it to the libraries listed below.
- JAX Scalify: end-to-end scaled arithmetics ☆17 · Updated last year
- Implementation of Hyena Hierarchy in JAX ☆10 · Updated 2 years ago
- Official Code Repository for the paper "Key-value memory in the brain" ☆31 · Updated 10 months ago
- High-performance tokenized language data-loader for Python C++ extension ☆14 · Updated last year
- Repository for CPU Kernel Generation for LLM Inference ☆27 · Updated 2 years ago
- Code for the note "NF4 Isn't Information Theoretically Optimal (and that's Good)" ☆21 · Updated 2 years ago
- ☆52 · Updated last year
- ☆12 · Updated last year
- Implementation of a holodeck, written in Pytorch ☆18 · Updated 2 years ago
- Codes of the paper Deformable Butterfly: A Highly Structured and Sparse Linear Transform. ☆13 · Updated 4 years ago
- GoldFinch and other hybrid transformer components ☆12 · Updated last month
- Implementation of a Light Recurrent Unit in Pytorch ☆49 · Updated last year
- ☆24 · Updated last year
- Engineering the state of RNN language models (Mamba, RWKV, etc.) ☆32 · Updated last year
- Visualising Losses in Deep Neural Networks ☆16 · Updated last year
- Implementation of the LDP module block in PyTorch and Zeta from the paper: "MobileVLM: A Fast, Strong and Open Vision Language Assistant … ☆15 · Updated last year
- Flax Image Models - State-of-the-art pre-trained vision backbones for Flax. ☆23 · Updated 7 months ago
- ☆13 · Updated 3 weeks ago
- RWKV6 in native pytorch and triton:) ☆11 · Updated last year
- Benchmarking PyTorch 2.0 different models ☆20 · Updated 2 years ago
- Exploring an idea where one forgets about efficiency and carries out attention across each edge of the nodes (tokens) ☆55 · Updated 9 months ago
- Reference implementation of "Softmax Attention with Constant Cost per Token" (Heinsen, 2024) ☆24 · Updated last year
- Implementation of 2-simplicial attention proposed by Clift et al. (2019) and the recent attempt to make practical in Fast and Simplex, Ro… ☆46 · Updated 4 months ago
- GoldFinch and other hybrid transformer components ☆45 · Updated last year
- Rust bindings for CTranslate2 ☆14 · Updated 2 years ago
- Effort to open-source 10.5 trillion parameter Gemini model. ☆17 · Updated 2 years ago
- Codes accompanying the paper "LaProp: a Better Way to Combine Momentum with Adaptive Gradient" ☆29 · Updated 5 years ago
- Training hybrid models for dummies. ☆29 · Updated 2 months ago
- Interpretability analysis of language model outlier and attempts to distill the model ☆13 · Updated 2 years ago
- PyTorch reimplementation of the paper "HyperMixer: An MLP-based Green AI Alternative to Transformers" [arXiv 2022]. ☆18 · Updated 3 years ago