☆53Nov 3, 2024Updated last year
Alternatives and similar repositories for ml-hypercloning
Users that are interested in ml-hypercloning are comparing it to the libraries listed below
- JAX Scalify: end-to-end scaled arithmetics☆18Oct 30, 2024Updated last year
- The simplest, fastest repository for training/finetuning medium-sized GPTs.☆187Jan 19, 2026Updated last month
- ☆71Oct 16, 2024Updated last year
- ☆24Dec 11, 2024Updated last year
- Memory-efficient transformer. Work in progress.☆19Sep 17, 2022Updated 3 years ago
- Official repo of dataset-decomposition paper [NeurIPS 2024]☆21Jan 8, 2025Updated last year
- ☆49Sep 26, 2025Updated 5 months ago
- [COLM 2025] "C3PO: Critical-Layer, Core-Expert, Collaborative Pathway Optimization for Test-Time Expert Re-Mixing"☆20Apr 9, 2025Updated 10 months ago
- Repo for the paper: PerAda: Parameter-Efficient Federated Learning Personalization with Generalization Guarantees (CVPR 2024)☆23Aug 14, 2024Updated last year
- DeepSeek-V3.2-Exp DSA Warmup Lightning Indexer training operator based on tilelang☆44Nov 19, 2025Updated 3 months ago
- ☆116Updated this week
- ARLC, a probabilistic abductive reasoner for solving Raven's progressive matrices.☆21Sep 18, 2025Updated 5 months ago
- ☆19Oct 2, 2023Updated 2 years ago
- The open-source materials for paper "Sparsing Law: Towards Large Language Models with Greater Activation Sparsity".☆30Nov 12, 2024Updated last year
- Evaluate gpt-4o on CLIcK (Korean NLP Dataset)☆20May 18, 2024Updated last year
- Pipeline parallelism for the minimalist☆40Aug 6, 2025Updated 6 months ago
- Implementation of Diffusion Transformers and Rectified Flow in Jax☆27Jul 9, 2024Updated last year
- Translation of the StrategyQA dataset☆23Apr 12, 2024Updated last year
- KommonGen, a dataset for commonsense reasoning in Korean generative models.☆21Oct 5, 2021Updated 4 years ago
- ☆30Jul 18, 2024Updated last year
- ☆59Nov 18, 2025Updated 3 months ago
- Verifiers for LLM Reinforcement Learning☆80Apr 15, 2025Updated 10 months ago
- A hackable, simple, and research-friendly GRPO Training Framework with high speed weight synchronization in a multinode environment.☆37Aug 27, 2025Updated 6 months ago
- Checkpointable dataset utilities for foundation model training☆32Jan 29, 2024Updated 2 years ago
- Implementation for the paper: CMoE: Fast Carving of Mixture-of-Experts for Efficient LLM Inference☆35Mar 6, 2025Updated 11 months ago
- KoCommonGEN v2: A Benchmark for Navigating Korean Commonsense Reasoning Challenges in Large Language Models☆25Aug 24, 2024Updated last year
- Moondream MCP Server in Python☆44Jul 2, 2025Updated 8 months ago
- Distributed Optimization Infra for learning CLIP models☆27Oct 3, 2024Updated last year
- #Paired Question☆24Jun 16, 2020Updated 5 years ago
- Engineering the state of RNN language models (Mamba, RWKV, etc.)☆32May 25, 2024Updated last year
- Codes accompanying the paper "LaProp: a Better Way to Combine Momentum with Adaptive Gradient"☆29Jul 30, 2020Updated 5 years ago
- [EMNLP'24] LongHeads: Multi-Head Attention is Secretly a Long Context Processor☆31Apr 8, 2024Updated last year
- Bias, Hate classification with KoELECTRA 👿☆27Jun 12, 2023Updated 2 years ago
- ☆31Jan 23, 2026Updated last month
- Dataset of Korean Threatening Conversations☆72Nov 1, 2022Updated 3 years ago
- QAmeleon introduces synthetic multilingual QA data using PaLM, a 540B large language model. This dataset was generated by prompt tuning P…☆35Aug 15, 2023Updated 2 years ago
- Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations"☆92Oct 30, 2024Updated last year
- ☆34May 14, 2025Updated 9 months ago
- Official repository for KoMT-Bench built by LG AI Research☆71Aug 8, 2024Updated last year