YuchuanTian/RethinkTinyLM

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/YuchuanTian/RethinkTinyLM)

YuchuanTian / RethinkTinyLM

[ICML'24] The official implementation of “Rethinking Optimization and Architecture for Tiny Language Models”

☆126

Alternatives and similar repositories for RethinkTinyLM

Users that are interested in RethinkTinyLM are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

ggjy / vision_weak_to_strong
View on GitHub
☆38Feb 8, 2024Updated 2 years ago
YuchuanTian / DiJiang
View on GitHub
[ICML'24 Oral] The official code of "DiJiang: Efficient Large Language Models through Compact Kernelization", a novel DCT-based linear at…
☆103Jun 14, 2024Updated 2 years ago
iamhankai / Full-Stack-Filters
View on GitHub
Pytorch code for paper: Full-Stack Filters to Build Minimum Viable CNNs
☆16Sep 10, 2019Updated 6 years ago
WailordHe / DenseSSM
View on GitHub
A repository for DenseSSMs
☆90Apr 11, 2024Updated 2 years ago
InternLM / InternEvo
View on GitHub
InternEvo is an open-sourced lightweight training framework aims to support model pre-training without the need for extensive dependencie…
☆421Aug 21, 2025Updated 11 months ago
Virtual machines for every use case on DigitalOcean • Ad
Get dependable uptime with 99.99% SLA, simple security tools, and predictable monthly pricing with DigitalOcean's virtual machines, called Droplets.
princeton-nlp / LLM-Shearing
View on GitHub
[ICLR 2024] Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning
☆643Mar 4, 2024Updated 2 years ago
SparksJoe / Prism
View on GitHub
A Framework for Decoupling and Assessing the Capabilities of VLMs
☆44Jun 28, 2024Updated 2 years ago
hkust-nlp / llm-compression-intelligence
View on GitHub
Official github repo for the paper "Compression Represents Intelligence Linearly" [COLM 2024]
☆150Sep 20, 2024Updated last year
HeegyuKim / ko-rm-judge
View on GitHub
Reward Model을 이용하여 언어모델의 답변을 평가하기
☆30Feb 23, 2024Updated 2 years ago
Gaffey / ExCP
View on GitHub
Official implementation of ICML 2024 paper "ExCP: Extreme LLM Checkpoint Compression via Weight-Momentum Joint Shrinking".
☆48Jul 12, 2024Updated 2 years ago
xhan77 / in-context-alignment
View on GitHub
In-Context Alignment: Chat with Vanilla Language Models Before Fine-Tuning
☆34Aug 9, 2023Updated 2 years ago
facebookresearch / spartan
View on GitHub
Spartan is an algorithm for training sparse neural network models. This repository accompanies the paper "Spartan Differentiable Sparsity…
☆26Oct 31, 2022Updated 3 years ago
HazyResearch / prefix-linear-attention
View on GitHub
☆62Jul 9, 2024Updated 2 years ago
r-three / RAD
View on GitHub
Reference implementation for Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model
☆45Oct 1, 2025Updated 9 months ago
Virtual machines for every use case on DigitalOcean • Ad
Get dependable uptime with 99.99% SLA, simple security tools, and predictable monthly pricing with DigitalOcean's virtual machines, called Droplets.
lose4578 / CircleRoPE
View on GitHub
☆15Sep 1, 2025Updated 10 months ago
SkyworkAI / Skywork-MoE
View on GitHub
Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models
☆140Jun 12, 2024Updated 2 years ago
ServiceNow / promptmix-emnlp-2023
View on GitHub
Offical code repository for PromptMix: A Class Boundary Augmentation Method for Large Language Model Distillation, EMNLP 2023
☆12Dec 13, 2023Updated 2 years ago
ezelikman / justonebyte
View on GitHub
☆10Jun 19, 2023Updated 3 years ago
BBuf / flash-rwkv
View on GitHub
☆32May 26, 2024Updated 2 years ago
mbzuai-oryx / MobiLlama
View on GitHub
[ICLR-2025-SLLM Spotlight 🔥]MobiLlama : Small Language Model tailored for edge devices
☆668May 10, 2025Updated last year
MILVLG / imp
View on GitHub
a family of highly capabale yet efficient large multimodal models
☆194Aug 23, 2024Updated last year
open-compass / MathBench
View on GitHub
[ACL 2024 Findings] MathBench: A Comprehensive Multi-Level Difficulty Mathematics Evaluation Dataset
☆116May 22, 2025Updated last year
shootime2021 / APUS-xDAN-4.0-moe
View on GitHub
Its an open source LLM based on MOE Structure.
☆58Jul 2, 2024Updated 2 years ago
GPU virtual machines on DigitalOcean Gradient AI • Ad
Get to production fast with high-performance AMD and NVIDIA GPUs you can spin up in seconds. The definition of operational simplicity.
FreedomIntelligence / ALLaVA
View on GitHub
Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model
☆281Jun 25, 2024Updated 2 years ago
Scientific-Computing-Lab / MPI-rigen
View on GitHub
MPI Code Generation through Domain-Specific Language Models
☆16Nov 19, 2024Updated last year
myshell-ai / JetMoE
View on GitHub
Reaching LLaMA2 Performance with 0.1M Dollars
☆985Jul 23, 2024Updated 2 years ago
opengear-project / GEAR
View on GitHub
GEAR: An Efficient KV Cache Compression Recipefor Near-Lossless Generative Inference of LLM
☆184Jul 12, 2024Updated 2 years ago
HKUNLP / ChunkLlama
View on GitHub
[ICML'24] Data and code for our paper "Training-Free Long-Context Scaling of Large Language Models"
☆451Oct 16, 2024Updated last year
hao-ai-lab / LookaheadDecoding
View on GitHub
[ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
☆1,339Mar 6, 2025Updated last year
zhaohui-yang / Binary-Neural-Networks
View on GitHub
Binary neural networks developed by Huawei Noah's Ark Lab
☆29Feb 19, 2021Updated 5 years ago
pangu-tech / pangu-ultra
View on GitHub
☆76May 30, 2025Updated last year
yale-nlp / refdpo
View on GitHub
☆16Jul 23, 2024Updated 2 years ago
GPU virtual machines on DigitalOcean Gradient AI • Ad
Get to production fast with high-performance AMD and NVIDIA GPUs you can spin up in seconds. The definition of operational simplicity.
xmoanvaf / llava-phi
View on GitHub
☆400Dec 12, 2024Updated last year
tiagofrepereira2012 / examples.tensorflow
View on GitHub
Code examples to learn how to use tensorflow
☆15Aug 25, 2016Updated 9 years ago
J-Seo / KoCommonGEN-V2
View on GitHub
KoCommonGEN v2: A Benchmark for Navigating Korean Commonsense Reasoning Challenges in Large Language Models
☆25Aug 24, 2024Updated last year
XuezheMax / megalodon
View on GitHub
Reference implementation of Megalodon 7B model
☆526May 17, 2025Updated last year
BAAI-DCAI / Bunny
View on GitHub
A family of lightweight multimodal models.
☆1,053Nov 18, 2024Updated last year
princeton-nlp / ELIZA-Transformer
View on GitHub
[NAACL 2025] Representing Rule-based Chatbots with Transformers
☆23Feb 9, 2025Updated last year
deep-diver / hllama
View on GitHub
hllama is a library which aims to provide a set of utility tools for large language models.
☆10Apr 16, 2024Updated 2 years ago