clabrugere / scratch-llm
Implements an LLM similar to Meta's Llama 2 from the ground up in PyTorch, for educational purposes.
☆37Updated 8 months ago
Alternatives and similar repositories for scratch-llm
Users who are interested in scratch-llm compare it to the repositories listed below.
- Gemma2(9B), Llama3-8B-Finetune-and-RAG, sample code base, implemented on the Kaggle platform☆22Updated 8 months ago
- Manages vllm-nccl dependency☆17Updated last year
- minimal scripts for 24GB VRAM GPUs. training, inference, whatever☆42Updated 2 weeks ago
- a curated list of the role of small models in the LLM era☆105Updated last year
- Tutorial for LLM developers about engine design, service deployment, evaluation/benchmark, etc. Provide a C/S style optimized LLM inferen…☆19Updated 2 years ago
- Benchmarking different models with PyTorch 2.0☆20Updated 2 years ago
- Fast and memory-efficient exact attention ported to ROCm☆11Updated last year
- ☆17Updated last year
- Playground for Transformers☆53Updated last year
- several types of attention modules written in PyTorch for learning purposes☆52Updated last year
- Intel Gaudi's Megatron DeepSpeed Large Language Models for training☆13Updated 9 months ago
- Implementation of the paper: "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention" from Google in pyTO…☆56Updated 2 weeks ago
- Various test models in WNNX format. They can be viewed with `pip install wnetron && wnetron`☆12Updated 3 years ago
- TensorRT LLM Benchmark Configuration☆13Updated last year
- Enable everyone to develop, optimize and deploy AI models natively on everyone's devices.☆12Updated last year
- KVTuner: Sensitivity-Aware Layer-wise Mixed Precision KV Cache Quantization for Efficient and Nearly Lossless LLM Inference☆22Updated 4 months ago
- Open deep learning compiler stack for cpu, gpu and specialized accelerators☆19Updated 2 weeks ago
- Inference Llama 2 in C++☆43Updated last year
- Experiments with BitNet inference on CPU☆54Updated last year
- Tiny C++ LLM inference implementation from scratch☆66Updated last month
- Context Manager to profile the forward and backward times of PyTorch's nn.Module☆82Updated 2 years ago
- minimal GRPO implementation from scratch☆98Updated 6 months ago
- Advanced Ultra-Low Bitrate Compression Techniques for the LLaMA Family of LLMs☆110Updated last year
- ☆57Updated last year
- Create a source of truth for ML model results and browse it on Papers with Code☆33Updated 4 years ago
- Multi-Layer Key-Value sharing experiments on Pythia models☆34Updated last year
- A Jupyter widget to visualize tensor data in notebooks.☆59Updated last year
- ☆78Updated 10 months ago
- Visual similarity search engine demo with use of PyTorch Metric Learning and Qdrant☆12Updated 2 years ago
- Benchmark tests supporting the TiledCUDA library.☆17Updated 10 months ago