wdlctc/mini-s

☆51

Alternatives and similar repositories for mini-s

Users who are interested in mini-s are comparing it to the repositories listed below.

haileyschoelkopf / triton-index
See https://github.com/cuda-mode/triton-index/ instead!
☆11 • May 8, 2024 • Updated 2 years ago
liushulinle / UloRL
An Ultra-Long Output Reinforcement Learning Approach
☆23 • Jul 31, 2025 • Updated 11 months ago
linzhu123455 / spotify-skip-prediction-top-1-solution
☆15 • Jan 11, 2019 • Updated 7 years ago
drarijitdas / Natural-GaLore
An extension to the GaLore paper that performs Natural Gradient Descent in a low-rank subspace
☆19 • Oct 21, 2024 • Updated last year
zqOuO / GWT
☆13 • May 4, 2026 • Updated 2 months ago
FranxYao / Long-Context-Data-Engineering
Implementation of the paper "Data Engineering for Scaling Language Models to 128K Context"
☆502 • Mar 19, 2024 • Updated 2 years ago
mukhal / GRACE
[EMNLP '23] Discriminator-Guided Chain-of-Thought Reasoning
☆50 • Oct 11, 2024 • Updated last year
ScalingIntelligence / CATS
☆33 • Nov 11, 2024 • Updated last year
ysy-phoenix / evalhub
All-in-one benchmarking platform for evaluating LLMs.
☆15 • Nov 12, 2025 • Updated 8 months ago
nanowell / Q-Sparse-LLM
My Implementation of Q-Sparse: All Large Language Models can be Fully Sparsely-Activated
☆37 • Aug 14, 2024 • Updated last year
mukhal / PromptRank
[ACL 2023] Few-shot Reranking for Multi-hop QA via Language Model Prompting
☆27 • Oct 19, 2025 • Updated 9 months ago
trapoom555 / Language-Model-STS-CFT
Improving Text Embedding of Language Models Using Contrastive Fine-tuning
☆64 • Aug 2, 2024 • Updated last year
KhoomeiK / complexity-scaling
gzip Predicts Data-dependent Scaling Laws
☆35 • May 28, 2024 • Updated 2 years ago
Cerebras / DocChat
GPT-4 Level Conversational QA Trained In a Few Hours
☆69 • Aug 21, 2024 • Updated last year
rawsh / mirrorllm
Various experiments for scaling inference-time compute with small reasoning models
☆17 • Jan 16, 2025 • Updated last year
VITA-Group / Q-GaLore
Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients.
☆206 • Jul 17, 2024 • Updated 2 years ago
Trustworthy-ML-Lab / ThinkEdit
[EMNLP 25] An effective and interpretable weight-editing method for mitigating overly short reasoning in LLMs, and a mechanistic study un…
☆19 • Dec 17, 2025 • Updated 7 months ago
jenni-ai / T2FW
Fine-Tuning Pre-trained Transformers into Decaying Fast Weights
☆20 • Oct 9, 2022 • Updated 3 years ago
VITA-Group / WeLore
[ICML 2025] From Low Rank Gradient Subspace Stabilization to Low-Rank Weights: Observations, Theories and Applications
☆52 • Oct 30, 2025 • Updated 8 months ago
CentML / Mist
[EuroSys'25] Mist: Efficient Distributed Training of Large Language Models via Memory-Parallelism Co-Optimization
☆24 • Apr 13, 2026 • Updated 3 months ago
xufangzhi / phi-Decoding
[ACL 2025] An inference-time decoding strategy with adaptive foresight sampling
☆107 • May 18, 2025 • Updated last year
IST-DASLab / MicroAdam
This repository contains code for the MicroAdam paper.
☆21 • Dec 14, 2024 • Updated last year
Leooyii / LCEG
[COLM'25] A Controlled Study on Long Context Extension and Generalization in LLMs
☆65 • Mar 9, 2026 • Updated 4 months ago
Gananath / NERD
Evolution of Discrete data with Reinforcement Learning
☆13 • Dec 8, 2019 • Updated 6 years ago
ylsung / rsq
Code for "RSQ: Learning from Important Tokens Leads to Better Quantized LLMs"
☆23 • Mar 25, 2026 • Updated 4 months ago
Red-Hat-AI-Innovation-Team / SQuat
☆22 • Jun 5, 2025 • Updated last year
snu-mllab / Neural-Relation-Graph
Official PyTorch implementation of "Neural Relation Graph: A Unified Framework for Identifying Label Noise and Outlier Data" (NeurIPS'23)
☆15 • Dec 4, 2023 • Updated 2 years ago
trusted-programming / mate
☆14 • Mar 25, 2025 • Updated last year
uservan / ThinkPO
☆17 • Aug 1, 2025 • Updated 11 months ago
Adam-Mazur / Lazy-Llama
An implementation of LazyLLM token pruning for the LLaMa 2 model family.
☆13 • Jan 6, 2025 • Updated last year
LuLuLuyi / LongHeads
[EMNLP'24] LongHeads: Multi-Head Attention is Secretly a Long Context Processor
☆32 • Apr 8, 2024 • Updated 2 years ago
pharaouk / dharma
☆13 • Apr 25, 2024 • Updated 2 years ago
nyunAI / PruneGPT
☆51 • May 31, 2024 • Updated 2 years ago
keeeeenw / TinyLlama
The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.
☆14 • Mar 30, 2024 • Updated 2 years ago
WowCZ / LongMIT
LongMIT: Essential Factors in Crafting Effective Long Context Multi-Hop Instruction Datasets
☆43 • Sep 30, 2024 • Updated last year
Extrality / nvidia-dind
docker:dind with NVIDIA GPU support via NVIDIA container toolkit
☆14 • Jul 1, 2026 • Updated 3 weeks ago
hkust-nlp / B-STaR
B-STAR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
☆86 • May 21, 2025 • Updated last year
PayamDiba / CIMLA
Counterfactual Inference by Machine Learning and Attribution Models
☆15 • Aug 24, 2023 • Updated 2 years ago
song-wx / SIFT
[ICML2024 Spotlight] Fine-Tuning Pre-trained Large Language Models Sparsely
☆24 • Jun 26, 2024 • Updated 2 years ago