idoheinemann / Assembly-Neural-Network
A multi-layer feed-forward neural network implemented in 32-bit x86 assembly
☆17 · Updated 5 years ago
Alternatives and similar repositories for Assembly-Neural-Network
Users interested in Assembly-Neural-Network are comparing it to the libraries listed below
- Code implementation from my blog post: https://fkodom.substack.com/p/transformers-from-scratch-in-pytorch ☆94 · Updated last year
- Large-scale 4D parallelism pre-training for 🤗 transformers in Mixture of Experts *(still a work in progress)* ☆85 · Updated last year
- Presents comprehensive benchmarks of XLA-compatible pre-trained models in Keras. ☆37 · Updated last year
- Visualising Losses in Deep Neural Networks ☆16 · Updated last year
- ☆69 · Updated last year
- Evaluation code repository for the paper "ModuLoRA: Finetuning 3-Bit LLMs on Consumer GPUs by Integrating with Modular Quantizers". (2023… ☆13 · Updated last year
- Deep learning library implemented from scratch in NumPy. Mixtral, Mamba, LLaMA, GPT, ResNet, and other experiments. ☆50 · Updated last year
- 11-785 Introduction to Deep Learning (IDeeL) website with logistics and select course materials ☆49 · Updated this week
- A NumPy implementation of the Transformer model in "Attention Is All You Need" ☆56 · Updated 11 months ago
- ML/DL math and method notes ☆61 · Updated last year
- BCQ tutorial for transformers ☆17 · Updated 2 years ago
- Several types of attention modules written in PyTorch for learning purposes ☆54 · Updated 9 months ago
- 🔮 LLM GPU Calculator ☆21 · Updated last year
- Context manager to profile the forward and backward times of PyTorch's nn.Module ☆83 · Updated last year
- [ACL 2025] Outlier-Safe Pre-Training for Robust 4-Bit Quantization of Large Language Models ☆27 · Updated 3 weeks ago
- Official repository of Sparse ISO-FLOP Transformations for Maximizing Training Efficiency ☆25 · Updated 11 months ago
- Explorations with Geoffrey Hinton's Forward-Forward algorithm ☆33 · Updated last year
- Mixed-precision training from scratch with Tensors and CUDA ☆24 · Updated last year
- Inference Llama 2 in one file of pure C++ ☆83 · Updated last year
- Trying to find the minimal model that can achieve 99% accuracy on the MNIST dataset ☆25 · Updated 6 years ago
- ☆48 · Updated 8 months ago
- ctypes wrappers for HIP, CUDA, and OpenCL ☆130 · Updated last year
- Explore training for quantized models ☆20 · Updated this week
- The code behind our practical dive into using Mamba for information extraction ☆53 · Updated last year
- Various transformers for FSDP research ☆37 · Updated 2 years ago
- Fast training of unitary deep network layers from low-rank updates ☆28 · Updated 2 years ago
- ☆26 · Updated last year
- OSLO: Open Source for Large-scale Optimization ☆175 · Updated last year
- An implementation of the transformer architecture as an Nvidia CUDA kernel ☆188 · Updated last year
- Video descriptions and minimalist Python implementations of algorithms and data structures. ☆68 · Updated last year