Lyken17 / hf-torrent
☆37Updated last year
Alternatives and similar repositories for hf-torrent:
Users who are interested in hf-torrent are comparing it to the repositories listed below
- ☆30Updated 10 months ago
- Official repository for ICML 2024 paper "MoRe Fine-Tuning with 10x Fewer Parameters"☆17Updated 2 weeks ago
- differentiable top-k operator☆21Updated 3 months ago
- IntLLaMA: A fast and light quantization solution for LLaMA☆18Updated last year
- Empirical Study Towards Building An Effective Multi-Modal Large Language Model☆23Updated last year
- implementation of dualformer☆13Updated 3 weeks ago
- [ICLR 2025] Official PyTorch implementation of "Forgetting Transformer: Softmax Attention with a Forget Gate"☆80Updated this week
- Description and applications of OpenAI's paper about DALL-E (2021) and implementation of other (CLIP-guided) zero-shot text-to-image gene…☆32Updated 2 years ago
- 32 times longer context window than vanilla Transformers and up to 4 times longer than memory efficient Transformers.☆46Updated last year
- Here we will test various linear attention designs.☆60Updated 11 months ago
- Using FlexAttention to compute attention with different masking patterns☆42Updated 6 months ago
- Official implementation of the paper "MMInA: Benchmarking Multihop Multimodal Internet Agents"☆41Updated last month
- A Framework for Decoupling and Assessing the Capabilities of VLMs☆41Updated 9 months ago
- Paper List for In-context Learning 🌷☆20Updated 2 years ago
- On The Planning Abilities of OpenAI's o1 Models: Feasibility, Optimality, and Generalizability☆38Updated 2 months ago
- Reference implementation of "Softmax Attention with Constant Cost per Token" (Heinsen, 2024)☆24Updated 9 months ago
- A tiny, didactic implementation of LLaMA 3☆35Updated 3 months ago
- Open-Pandora: On-the-fly Control Video Generation☆32Updated 4 months ago
- [ACL 2024] RelayAttention for Efficient Large Language Model Serving with Long System Prompts☆39Updated last year
- [NAACL 2025 Oral] Multimodal Needle in a Haystack (MMNeedle): Benchmarking Long-Context Capability of Multimodal Large Language Models☆40Updated 3 weeks ago
- ☆37Updated last year
- 🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.☆18Updated 11 months ago
- ☆18Updated 10 months ago
- [ICLR 2025] Linear Combination of Saved Checkpoints Makes Consistency and Diffusion Models Better☆14Updated last month
- Interface for GenAI-Arena☆13Updated last year
- Code for paper: "Executing Arithmetic: Fine-Tuning Large Language Models as Turing Machines"☆11Updated 5 months ago
- Triton implementation of bi-directional (non-causal) linear attention☆44Updated last month
- From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients. Ajay Jaiswal, Lu Yin, Zhenyu Zhang, Shiwei Liu,…☆44Updated 8 months ago
- Experimental scripts for researching data adaptive learning rate scheduling.☆23Updated last year
- ☆17Updated last year