Building LLaMA 4 MoE from Scratch
☆73 · Apr 15, 2025 · Updated last year
Alternatives and similar repositories for train-llama4
Users that are interested in train-llama4 are comparing it to the libraries listed below.
- A Straightforward, Step-by-Step Implementation of a Video Diffusion Model ☆82 · Aug 18, 2025 · Updated 7 months ago
- Train a 29M parameter GPT from Scratch ☆35 · Mar 4, 2025 · Updated last year
- to study xilinx fpga using Zybo Z7-20 board ☆14 · Mar 13, 2024 · Updated 2 years ago
- ☆11 · Feb 3, 2025 · Updated last year
- eIDAS Italian node ☆11 · May 24, 2022 · Updated 3 years ago
- MICRO 2024 Evaluation Artifact for FuseMax ☆17 · Aug 26, 2024 · Updated last year
- A distilled DeepSeek-R1 variant built on Qwen2.5-32B, fine-tuned with curated data for enhanced performance and efficiency. <metadata> gp… ☆16 · Mar 11, 2025 · Updated last year
- PyTorch implementation of GRPO. ☆15 · Apr 21, 2025 · Updated 11 months ago
- ☆16 · Mar 18, 2025 · Updated last year
- Synthetic Data Generator for Machine Learning Pipelines ☆33 · Sep 2, 2025 · Updated 7 months ago
- ☆13 · Dec 6, 2024 · Updated last year
- Geographical Graph Attention Networks: Spatial Deep Learning Models for Spatial Prediction and Exploratory Spatial Data Analysis ☆18 · Jul 28, 2025 · Updated 8 months ago
- ☆14 · Jun 16, 2020 · Updated 5 years ago
- Custom triton kernels for training Karpathy's nanoGPT. ☆19 · Oct 21, 2024 · Updated last year
- RuCLIP-SB (Russian Contrastive Language–Image Pretraining SWIN-BERT) is a multimodal model for obtaining images and text similarities and… ☆14 · Jan 25, 2022 · Updated 4 years ago
- API for toxic text classification, utilized pre-trained Distilbert and trained on Kaggle datasets. It helps identify and handle toxic con… ☆14 · Apr 30, 2024 · Updated last year
- Vietnamese Large Language Model (LLM) fine-tuned for the task of Question Answering within the medical and healthcare domain ☆26 · Mar 1, 2024 · Updated 2 years ago
- 23 Components of the Claude Code Architecture ☆55 · Apr 5, 2026 · Updated last week
- The Multimodal Model for Vietnamese Visual Question Answering (ViVQA) ☆21 · Jul 29, 2024 · Updated last year
- Notes and commented code for RLHF (PPO) ☆127 · Feb 27, 2024 · Updated 2 years ago
- Benchmark tests supporting the TiledCUDA library. ☆18 · Nov 19, 2024 · Updated last year
- ☆12 · Dec 14, 2024 · Updated last year
- Image captioning using CNN and RNN ☆11 · Mar 24, 2025 · Updated last year
- ☆12 · Jun 2, 2024 · Updated last year
- ☆18 · Apr 9, 2025 · Updated last year
- Created a simple neural network using C++17 standard and the Eigen library that supports both forward and backward propagation. ☆11 · Jul 27, 2024 · Updated last year
- Code from Chris Valasek @nudehaberdasher and Charlie Miller @0xcharlie car hack: http://blog.ioactive.com/2013/08/car-hacking-content.ht… ☆15 · Oct 1, 2020 · Updated 5 years ago
- SpeechPlus: Small LLM-Based Text-to-Speech Library 🚀 ☆20 · May 20, 2025 · Updated 10 months ago
- A parser combinator in Ruby, with a pretty DSL ☆11 · Jun 25, 2017 · Updated 8 years ago
- PyTorch implementation of Chinese multi-turn dialogue (Cdial) based on GPT + NEZHA ☆11 · Oct 22, 2022 · Updated 3 years ago
- A Step-by-Step Implementation of Google Veo 3 Architecture from Scratch ☆83 · Jun 16, 2025 · Updated 9 months ago
- Database kernel notes ☆13 · Aug 18, 2022 · Updated 3 years ago
- Implementation for the PHM paper at ICLR'21 ☆13 · Mar 1, 2023 · Updated 3 years ago
- Langchain_CrewAI_Gemini - A Gemini AI-powered AI Agent (Multi-Agent) Project. ☆14 · Mar 24, 2024 · Updated 2 years ago
- A straightforward method to reduce your LLM inference API costs and token usage. ☆22 · May 18, 2025 · Updated 10 months ago
- Artificial Intelligence Professional Program by Stanford School of Engineering ☆19 · May 9, 2023 · Updated 2 years ago
- ☆27 · Jun 12, 2025 · Updated 10 months ago
- Load and run Llama from safetensors files in C ☆15 · Oct 24, 2024 · Updated last year
- Complete Reinforcement Learning Toolkit for Large Language Models! ☆21 · Aug 2, 2025 · Updated 8 months ago