nihil21 / parallel_nn
C++ implementation of a neural network using OpenMP and CUDA for parallelization.
☆9 · Updated 3 years ago
Alternatives and similar repositories for parallel_nn:
Users who are interested in parallel_nn are comparing it to the repositories listed below.
- Minimal pretraining script for language modeling in PyTorch. Supports torch compilation and DDP. It includes a model implementation and… ☆12 · Updated last week
- Code for a workshop hosted at the MLOps World Summit '22 ☆17 · Updated 2 years ago
- Intel® End-to-End AI Optimization Kit ☆31 · Updated 8 months ago
- "Moshpit SGD: Communication-Efficient Decentralized Training on Heterogeneous Unreliable Devices", official implementation ☆29 · Updated last month
- Parallel implementations of the Decision Tree Classifier algorithm in CUDA, OpenMP & MPI ☆5 · Updated 7 years ago
- (NeurIPS 2019 MicroNet Challenge, 3rd-place winner) Open-source code for "SIPA: A simple framework for efficient networks" ☆18 · Updated 2 years ago
- Real-Time Object Detection using OpenCV and Deep Learning ☆10 · Updated 2 months ago
- The CUDA code is mainly for NVIDIA hardware. This repo shows how to run CUDA C or CUDA C++ code on the Google Colab platform f… ☆24 · Updated last year
- Parameterizing Mixing Links in Sparse Factors Works Better than Dot-Product Self-Attention (CVPR 2022) ☆20 · Updated 2 years ago
- MLPNAS code for the Paperspace series on Neural Architecture Search ☆22 · Updated last year
- Implementation for the MLSys 2023 paper "Cuttlefish: Low-rank Model Training without All The Tuning" ☆44 · Updated last year
- FL_PyTorch: Optimization Research Simulator for Federated Learning ☆35 · Updated last year
- CUDA and Triton implementations of Flash Attention with SoftmaxN. ☆68 · Updated 10 months ago
- Includes additional materials for the following keras.io blog post. ☆12 · Updated 3 years ago
- Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8. ☆45 · Updated 8 months ago
- PyTorch implementation of HashedNets ☆36 · Updated last year
- Residual Quantization Autoencoder, used for interpreting LLMs ☆11 · Updated 3 months ago
- Arch-Net: Model Distillation for Architecture Agnostic Model Deployment ☆22 · Updated 3 years ago
- Official code for our NeurIPS 2024 paper "einspace: Searching for Neural Architectures from Fundamental Operations" ☆27 · Updated 5 months ago
- Tiny ImageNet Classification Exercise with PyTorch ☆16 · Updated 3 years ago
- Short article showing how to load PyTorch models with linear memory consumption ☆34 · Updated 2 years ago
- Personal solutions to the Triton Puzzles ☆18 · Updated 8 months ago
- Dynamic Neural Architecture Search Toolkit ☆29 · Updated 3 months ago
- A simple implementation of [Mamba: Linear-Time Sequence Modeling with Selective State Spaces](https://arxiv.org/abs/2312.00752) ☆21 · Updated last year
- Contains my experiments with the `big_vision` repo to train ViTs on ImageNet-1k. ☆22 · Updated 2 years ago
- Large dataset storage format for PyTorch ☆45 · Updated 3 years ago