bluorion-com / ZClip
Official implementation of the paper: "ZClip: Adaptive Spike Mitigation for LLM Pre-Training".
☆37 · Updated last week
Alternatives and similar repositories for ZClip:
Users that are interested in ZClip are comparing it to the libraries listed below
- Tiny re-implementation of MDM in the style of LLaDA and the nano-gpt speedrun ☆46 · Updated last month
- Official PyTorch implementation for the paper "No More Adam: Learning Rate Scaling at Initialization is All You Need" ☆51 · Updated 2 months ago
- Recaption large (web) datasets with vLLM and save the artifacts. ☆50 · Updated 4 months ago
- Repository for the Q-Filters method (https://arxiv.org/pdf/2503.02812) ☆28 · Updated last month
- Maya: An Instruction Finetuned Multilingual Multimodal Model using Aya ☆107 · Updated last month
- Official repository of "LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging" ☆25 · Updated 5 months ago
- Implementation of the proposed MaskBit from ByteDance AI ☆75 · Updated 5 months ago
- Legible, Scalable, Reproducible Foundation Models with Named Tensors and JAX ☆14 · Updated 10 months ago
- Working implementation of DeepSeek MLA ☆40 · Updated 3 months ago
- Repository for "TESS-2: A Large-Scale, Generalist Diffusion Language Model" ☆33 · Updated last month
- Pixel Parsing: a reproduction of OCR-free end-to-end document understanding models with open data ☆21 · Updated 8 months ago
- imagetokenizer is a Python package that helps you encode visuals and generate visual token IDs from a codebook; it supports both image and video… ☆33 · Updated 9 months ago
- MEXMA: Token-level objectives improve sentence representations ☆40 · Updated 3 months ago
- Focused on fast experimentation and simplicity ☆71 · Updated 3 months ago
- Notebooks and scripts that showcase running quantized diffusion models on consumer GPUs ☆38 · Updated 5 months ago
- Official implementation of the ECCV 2024 paper: POA ☆24 · Updated 8 months ago
- Hugging Face Inference Toolkit used to serve transformers, sentence-transformers, and diffusers models. ☆68 · Updated this week
- A repository for research on medium-sized language models. ☆76 · Updated 10 months ago
- An open-source reproduction of NVIDIA's nGPT (Normalized Transformer with Representation Learning on the Hypersphere) ☆95 · Updated last month
- Collection of autoregressive model implementations ☆85 · Updated 2 months ago
- Official repository for the ICML 2024 paper "MoRe Fine-Tuning with 10x Fewer Parameters" ☆17 · Updated last month
- Repository for the paper "PANGEA: A FULLY OPEN MULTILINGUAL MULTIMODAL LLM FOR 39 LANGUAGES" ☆105 · Updated 4 months ago
- Code for the paper "Harnessing Webpage UIs for Text-Rich Visual Understanding" ☆50 · Updated 4 months ago
- Official code for the paper "Beyond Matryoshka: Revisiting Sparse Coding for Adaptive Representation" ☆58 · Updated 2 weeks ago
- Implementation of a Light Recurrent Unit in PyTorch ☆47 · Updated 6 months ago
- Implementation of the Mamba SSM with hf_integration. ☆56 · Updated 7 months ago