viai957 / llama-inference
A simple implementation of Llama 1 and 2. The Llama architecture is built from scratch in PyTorch, including GQA (Grouped-Query Attention), RoPE (Rotary Positional Embeddings), RMSNorm, the FeedForward block, SwiGLU (activation function), and the Encoder block (inference only, so no training components).
☆10, updated 4 months ago
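Of the components listed above, RMSNorm is the simplest to show in full. The repo itself is written in PyTorch; the following is a minimal NumPy sketch of the math RMSNorm computes (the function name and shapes here are illustrative, not the repo's actual code):

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    # Normalize by the root-mean-square of the activations over the last
    # dimension (no mean subtraction, unlike LayerNorm), then apply a
    # learned per-dimension scale.
    rms = np.sqrt(np.mean(x ** 2, axis=-1, keepdims=True) + eps)
    return (x / rms) * weight

x = np.array([[1.0, 2.0, 3.0, 4.0]])
w = np.ones(4)
out = rms_norm(x, w)  # → roughly [0.365, 0.730, 1.095, 1.461]
```

Note the absence of a bias term and of mean centering; that is what makes RMSNorm cheaper than LayerNorm while keeping the unit-RMS property of the output.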
Related projects:
- Step-by-step explanation/tutorial of llama2.c (☆207, updated 11 months ago)
- Easy and Efficient Quantization for Transformers (☆172, updated 2 months ago)
- Inference of Llama/Llama 2 models in NumPy (☆18, updated 9 months ago)
- llama3.cuda: a pure C/CUDA implementation of the Llama 3 model (☆287, updated 3 months ago)
- Showing various ways to serve Keras-based Stable Diffusion (☆109, updated last year)
- A collection of all available inference solutions for LLMs (☆65, updated 2 weeks ago)
- Mixed-precision training from scratch with Tensors and CUDA (☆18, updated 4 months ago)
- Newsletter bot for 🤗 Daily Papers (☆101, updated this week)
- Serving a TF-based image classification model as a web service with TFServing, Docker, and Kubernetes (GKE) (☆120, updated 2 years ago)
- The Universe of Evaluation: all about evaluation for LLMs (☆205, updated 2 months ago)
- Manage histories of LLM-powered applications (☆86, updated 10 months ago)
- 1-Click is all you need. (☆58, updated 4 months ago)
- Self-host LLMs with vLLM and BentoML (☆62, updated this week)
- Machine Learning Pipeline for Semantic Segmentation with TensorFlow Extended (TFX) and various GCP products (☆93, updated last year)
- End-to-End LLM Guide (☆91, updated 2 months ago)
- Generate synthetic data for LLM fine-tuning in arbitrary situations, in a systematic way (☆21, updated 6 months ago)
- Hands-on practice with Triton (Korean title: "삼각형의 실전! Triton") (☆14, updated 7 months ago)
- An LLMOps pipeline that fine-tunes a small LLM to prepare for an outage of the service LLM (☆281, updated 3 weeks ago)
- Sakura-SOLAR-DPO: Merge, SFT, and DPO (☆114, updated 8 months ago)
- LLaMA 3 is one of the most promising open-source models after Mistral; this project recreates its architecture in a simpler manner (☆82, updated 3 weeks ago)
- Efficient fine-tuning for ko-llm models (☆184, updated 6 months ago)
- An open-source implementation for fine-tuning Phi3-Vision and Phi3.5-Vision by Microsoft (☆44, updated this week)
- A high-throughput and memory-efficient inference and serving engine for LLMs (☆250, updated this week)
- 🕹️ Performance comparison of MLOps engines, frameworks, and languages on mainstream AI models (☆129, updated last month)
- MLOps and LLMOps with AWS SageMaker (☆32, updated last year)
- Notes on quantization in neural networks (☆54, updated 9 months ago)
- GPT-2 fine-tuning pipeline with KerasNLP, TensorFlow, and TensorFlow Extended (☆33, updated last year)
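Several of the projects above (this repo, the NumPy port, llama3.cuda) reimplement RoPE, the rotary positional embedding used by Llama. A minimal NumPy sketch of the rotation, assuming the interleaved-pair convention (a common layout, not necessarily the exact one each repo uses):

```python
import numpy as np

def rope(x, pos, theta=10000.0):
    # x: (seq, dim) with even dim. Each pair (x[2i], x[2i+1]) is rotated
    # by the angle pos * theta^(-2i/dim), so absolute position is encoded
    # as a rotation and relative position falls out of dot products.
    d = x.shape[-1]
    freqs = theta ** (-np.arange(0, d, 2) / d)   # (d/2,) per-pair frequencies
    ang = pos[:, None] * freqs[None, :]          # (seq, d/2) rotation angles
    cos, sin = np.cos(ang), np.sin(ang)
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

x = np.ones((3, 4))
y = rope(x, np.arange(3))
# Position 0 is left unrotated, and rotations preserve vector norms.
```

Because each pair is only rotated, RoPE adds no parameters and preserves the norm of every query/key vector, which is why it can be applied inside attention at inference time with negligible cost.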