sanyalsunny111/Early_Weight_Avg

[COLM 2024] Early Weight Averaging meets High Learning Rates for LLM Pre-training

☆19

Alternatives and similar repositories for Early_Weight_Avg

Users who are interested in Early_Weight_Avg are comparing it to the libraries listed below.

dsevero / Linear-Autoregressive-Similarity-Index
Code for "The Unreasonable Effectiveness of Linear Prediction as a Perceptual Metric"
☆23 · Jan 26, 2024 · Updated 2 years ago
allenai / sso
Repository for Skill Set Optimization
☆14 · Jul 26, 2024 · Updated 2 years ago
Aaquib111 / Sparse-GPT-Finetuning
Code for my ICLR 2024 TinyPapers paper "Prune and Tune: Improving Efficient Pruning Techniques for Massive Language Models"
☆16 · May 26, 2023 · Updated 3 years ago
cyfer0618 / kaldi-pytorch-rnnlm
Enable RNNLM lattice rescoring with PyTorch [Kaldi]
☆12 · Jun 5, 2020 · Updated 6 years ago
dmis-lab / ChroKnowledge
[ICLR 2025] ChroKnowledge: Unveiling Chronological Knowledge of Language Models in Multiple Domains
☆17 · Mar 4, 2025 · Updated last year
affjljoo3581 / polyglot-jax-inference
A Jax/Flax implementation for Korean-language LLM inference on TPUs.
☆12 · Jun 12, 2023 · Updated 3 years ago
affjljoo3581 / starcoder-jax
Jax/Flax inference code for StarCoder
☆12 · Jun 12, 2023 · Updated 3 years ago
readme-generator / alreadyme-ai-serving
Serving a large language model with transformers
☆13 · Oct 18, 2022 · Updated 3 years ago
coverist / coverist-android
Coverist: an AI service for generating book covers
☆13 · Sep 11, 2022 · Updated 3 years ago
affjljoo3581 / G2Net-Detecting-Continuous-Gravitational-Waves
🥈 12th-place solution for G2Net Detecting Continuous Gravitational Waves 🥈
☆14 · Jan 4, 2023 · Updated 3 years ago
Avmb / inverse_scaling_prize_code_identifier_swap
Submission to the Inverse Scaling Prize
☆23 · Jul 23, 2023 · Updated 3 years ago
affjljoo3581 / Google-American-Sign-Language-Fingerspelling-Recognition
🎖️ 5th-place solution in the Google American Sign Language Fingerspelling Recognition Competition 🎖️
☆16 · Sep 19, 2023 · Updated 2 years ago
siwooyong / LG-AI-Challenge-for-Plant-Classification
🥇 1st-place solution for LG-AI-Challenge 2022.
☆13 · Jun 6, 2023 · Updated 3 years ago
thu-coai / MiniPLM
[ICLR 2025] MiniPLM: Knowledge Distillation for Pre-Training Language Models
☆79 · Nov 23, 2024 · Updated last year
NVlabs / STL
Official PyTorch implementation of Self-emerging Token Labeling
☆35 · Mar 27, 2024 · Updated 2 years ago
affjljoo3581 / deit3-jax
Jax/Flax implementation of DeiT and DeiT-III (ViT)
☆19 · Dec 21, 2024 · Updated last year
zafstojano / wordgamebench
Evaluating language models on word puzzle games
☆10 · Oct 25, 2024 · Updated last year
Westlake-AI / SEMA
Switch EMA: A Free Lunch for Better Flatness and Sharpness
☆28 · Feb 16, 2024 · Updated 2 years ago
CyberAgentAILab / filtered-dpo
[EMNLP 2024] Introducing Filtered Direct Preference Optimization (fDPO) that enhances language model alignment with human preferences by …
☆16 · Nov 27, 2024 · Updated last year
lsjsj92 / fast-api-tutorial
FastAPI with machine learning
☆11 · Apr 23, 2023 · Updated 3 years ago
victorup / CHAE
CHAE: Fine-Grained Controllable Story Generation with Characters, Actions and Emotions
☆11 · Jan 31, 2023 · Updated 3 years ago
gurkirt / preprocess-activityNet
Preprocess the ActivityNet dataset for the detection task
☆13 · Mar 3, 2017 · Updated 9 years ago
AK391 / PaintTransformer
PyTorch implementation of the paper "Paint Transformer: Feed Forward Neural Painting with Stroke Prediction" (ICCV 2021)
☆27 · Aug 10, 2021 · Updated 4 years ago
ShivamGaurUQ / Automated-hashtag-generation-using-Deep-Learning
☆26 · Dec 8, 2022 · Updated 3 years ago
xmos / vocalfusion-avs-setup
Repository containing scripts and helpers for configuring a Raspberry Pi to work with the XMOS microphone front end
☆14 · Jul 31, 2023 · Updated 2 years ago
detail-novelist / novelist-triton-server
Deploy KoGPT with Triton Inference Server
☆14 · Nov 18, 2022 · Updated 3 years ago
readme-generator / alreadyme-ai-research
Generate README.md with GPT-3 few-shot learning
☆26 · Oct 19, 2022 · Updated 3 years ago
soumik12345 / AODNet
TensorFlow implementation of "An All-in-One Network for Dehazing and Beyond"
☆10 · Feb 22, 2021 · Updated 5 years ago
hunsii / LawBot
A conversational system that uses LLMs to search for similar legal precedents.
☆27 · Jul 3, 2023 · Updated 3 years ago
leolle / atec_nlp
Ant Financial natural language processing competition.
☆10 · Sep 3, 2018 · Updated 7 years ago
dogancan / expected-edit-distance
Expected edit distance implementation using OpenFst tools
☆11 · May 13, 2015 · Updated 11 years ago
yjyoon-dev / ssoda-flutter
SNS hashtag offline event management platform
☆12 · Feb 11, 2022 · Updated 4 years ago
ellenmellon / CGRG
A Controllable Model of Grounded Response Generation (AAAI 21)
☆13 · Oct 25, 2022 · Updated 3 years ago
Blazedengcy / GTASR
ICML 2026 - Joint Geometric and Trajectory Consistency Learning for One-Step Real-World Super-Resolution (GTASR)
☆19 · Jun 15, 2026 · Updated last month
jackbandy / bookcorpus-datasheet
Documentation effort for the BookCorpus dataset
☆34 · Jun 2, 2021 · Updated 5 years ago
AIKU-Official / 2023S-AIKU-D2D
Korea University AIKU Junior Deep Learning Bootcamp, Summer Break 2023: DeepintoDeep (D2D)
☆13 · Dec 14, 2023 · Updated 2 years ago
junkangwu / beta-DPO
[NeurIPS 2024] Official code for β-DPO: Direct Preference Optimization with Dynamic β
☆51 · Oct 23, 2024 · Updated last year
evanatyourservice / llm-jax
Train a SmolLM-style LLM on fineweb-edu in JAX/Flax with an assortment of optimizers.
☆19 · Jul 24, 2025 · Updated last year
eth-medical-ai-lab / smmile
[NeurIPS Datasets & Benchmarks 2025] SMMILE: An Expert-Driven Benchmark for Multimodal Medical In-Context Learning
☆15 · Dec 2, 2025 · Updated 7 months ago