idoheinemann / Assembly-Neural-Network
A multi-layer feed-forward neural network implemented in 32-bit x86 assembly
☆17 · Updated 5 years ago
Alternatives and similar repositories for Assembly-Neural-Network
Users interested in Assembly-Neural-Network are comparing it to the libraries listed below
- Code implementation from my blog post: https://fkodom.substack.com/p/transformers-from-scratch-in-pytorch ☆94 · Updated last year
- Large-scale 4D parallelism pre-training for 🤗 transformers in Mixture of Experts *(still a work in progress)* ☆85 · Updated last year
- Presents comprehensive benchmarks of XLA-compatible pre-trained models in Keras. ☆37 · Updated last year
- Visualising Losses in Deep Neural Networks ☆16 · Updated last year
- ☆69 · Updated last year
- Evaluation code repository for the paper "ModuLoRA: Finetuning 3-Bit LLMs on Consumer GPUs by Integrating with Modular Quantizers". (2023… ☆13 · Updated last year
- Deep learning library implemented from scratch in NumPy. Mixtral, Mamba, LLaMA, GPT, ResNet, and other experiments. ☆50 · Updated last year
- 11-785 Introduction to Deep Learning (IDeeL) website with logistics and select course materials ☆49 · Updated this week
- A NumPy implementation of the Transformer model in "Attention Is All You Need" ☆56 · Updated 11 months ago
- ML/DL math and method notes ☆61 · Updated last year
- BCQ tutorial for transformers ☆17 · Updated 2 years ago
- Several types of attention modules written in PyTorch for learning purposes ☆54 · Updated 9 months ago
- 🔮 LLM GPU Calculator ☆21 · Updated last year
- Context manager to profile the forward and backward times of PyTorch's nn.Module ☆83 · Updated last year
- [ACL 2025] Outlier-Safe Pre-Training for Robust 4-Bit Quantization of Large Language Models ☆27 · Updated 3 weeks ago
- Official repository of Sparse ISO-FLOP Transformations for Maximizing Training Efficiency ☆25 · Updated 11 months ago
- Explorations with Geoffrey Hinton's Forward-Forward algorithm ☆33 · Updated last year
- Mixed-precision training from scratch with Tensors and CUDA ☆24 · Updated last year
- Inference Llama 2 in one file of pure C++ ☆83 · Updated last year
- Trying to find the minimal model that can achieve 99% accuracy on the MNIST dataset ☆25 · Updated 6 years ago
- ☆48 · Updated 8 months ago
- ctypes wrappers for HIP, CUDA, and OpenCL ☆130 · Updated last year
- Explore training for quantized models ☆20 · Updated this week
- The code behind our practical dive into using Mamba for information extraction ☆53 · Updated last year
- Various transformers for FSDP research ☆37 · Updated 2 years ago
- Fast training of unitary deep network layers from low-rank updates ☆28 · Updated 2 years ago
- ☆26 · Updated last year
- OSLO: Open Source for Large-scale Optimization ☆175 · Updated last year
- An implementation of the transformer architecture as an Nvidia CUDA kernel ☆188 · Updated last year
- Video descriptions and minimalist Python implementations of algorithms and data structures. ☆68 · Updated last year