sanowl / Drag-and-Drop-LLMs-Zero-Shot-Prompt-to-Weights
☆30 Updated 4 months ago
Alternatives and similar repositories for Drag-and-Drop-LLMs-Zero-Shot-Prompt-to-Weights
Users who are interested in Drag-and-Drop-LLMs-Zero-Shot-Prompt-to-Weights are comparing it to the libraries listed below.
- ☆55 Updated 11 months ago
- Easy to use, highly performant Knowledge Distillation for LLMs ☆95 Updated 6 months ago
- Lightweight toolkit package to train and fine-tune 1.58-bit language models ☆98 Updated 5 months ago
- PyTorch implementation of Titans. ☆27 Updated 9 months ago
- Sparse Inferencing for transformer-based LLMs ☆201 Updated 3 months ago
- A repository aimed at pruning DeepSeek V3, R1 and R1-zero to a usable size ☆77 Updated 2 months ago
- SLED: Self Logits Evolution Decoding for Improving Factuality in Large Language Models https://arxiv.org/pdf/2411.02433 ☆108 Updated 11 months ago
- A pipeline parallel training script for LLMs. ☆162 Updated 6 months ago
- A large-scale RWKV v7 (World, PRWKV, Hybrid-RWKV) inference. Capable of inference by combining multiple states (Pseudo MoE). Easy to deploy… ☆45 Updated 3 weeks ago
- Official repository for the paper "NeuZip: Memory-Efficient Training and Inference with Dynamic Compression of Neural Networks". This rep… ☆59 Updated last year
- [ACL 2025] How Do LLMs Acquire New Knowledge? A Knowledge Circuits Perspective on Continual Pre-Training ☆45 Updated 3 months ago
- Official code repository for Sketch-of-Thought (SoT) ☆129 Updated 6 months ago
- This repository contains the code for the paper: SirLLM: Streaming Infinite Retentive LLM ☆60 Updated last year
- RWKV-7: Surpassing GPT ☆100 Updated 11 months ago
- BlackGoose Rimer: RWKV as a Superior Architecture for Large-Scale Time Series Modeling ☆29 Updated 4 months ago
- Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention" ☆101 Updated last year
- Official Repository for "Glyph: Scaling Context Windows via Visual-Text Compression" ☆478 Updated last week
- ☆39 Updated 6 months ago
- ☆34 Updated 2 months ago
- ☆51 Updated last year
- PyTorch implementation of models from the Zamba2 series. ☆185 Updated 9 months ago
- One Line To Build Zero-Data Classifiers in Minutes ☆62 Updated last year
- [ICLR'25] ApolloMoE: Efficiently Democratizing Medical LLMs for 50 Languages via a Mixture of Language Family Experts ☆51 Updated 11 months ago
- [NeurIPS'25 Oral] Query-agnostic KV cache eviction: 3–4× reduction in memory and 2× decrease in latency (Qwen3/2.5, Gemma3, LLaMA3) ☆146 Updated 2 weeks ago
- B-Llama3o: a Llama 3 with vision and audio understanding, as well as text, audio, and animation data output. ☆26 Updated last year
- This is the official Python version of Vision-Zero: Scalable VLM Self-Improvement via Strategic Gamified Self-Play. ☆99 Updated 3 weeks ago
- AnyModal is a Flexible Multimodal Language Model Framework for PyTorch ☆102 Updated 10 months ago
- FuseAI Project ☆87 Updated 9 months ago
- The code repository for the CURLoRA research paper. Stable LLM continual fine-tuning and catastrophic forgetting mitigation. ☆52 Updated last year
- Code and data for the Chain-of-Draft (CoD) paper ☆335 Updated 8 months ago