Fast approximate inference on a single GPU with sparsity-aware offloading
☆39 · Jan 4, 2024 · Updated 2 years ago
Alternatives and similar repositories for tricksy
Users who are interested in tricksy are comparing it to the libraries listed below.
- .NET 5 microservices example for cryptocurrency price prediction using ML.NET, Vue, SignalR, and MassTransit ☆11 · May 18, 2021 · Updated 4 years ago
- Karpathy's llama2.c transpiled to MLX for Apple Silicon ☆14 · Dec 28, 2023 · Updated 2 years ago
- Lightweight server for developing conversational agents using Microsoft AutoGen 0.4 ☆25 · Mar 20, 2025 · Updated 11 months ago
- This is our own implementation of 'Layer Selective Rank Reduction' ☆240 · May 26, 2024 · Updated last year
- Code for the paper "Function-Space Learning Rates" ☆25 · Jun 3, 2025 · Updated 9 months ago
- Just a bunch of benchmark logs for different LLMs ☆119 · Jul 28, 2024 · Updated last year
- Collection of autoregressive model implementations ☆85 · Feb 23, 2026 · Updated last week
- QLoRA for Masked Language Modeling ☆23 · Sep 11, 2023 · Updated 2 years ago
- The Truth Is In There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction ☆390 · Jul 9, 2024 · Updated last year
- Chrome Extension for YouTube. Acts as an assistant for the YouTube video you are watching ☆23 · Apr 26, 2023 · Updated 2 years ago
- ☆27 · Dec 13, 2024 · Updated last year
- ☆32 · Jan 1, 2024 · Updated 2 years ago
- Implementation of the Mamba SSM with hf_integration. ☆55 · Aug 31, 2024 · Updated last year
- ☆53 · Jan 18, 2024 · Updated 2 years ago
- ☆35 · Feb 10, 2025 · Updated last year
- ☆596 · Aug 23, 2024 · Updated last year
- [NAACL 2025 Main Selected Oral] Repository for the paper: Prompt Compression for Large Language Models: A Survey ☆36 · May 18, 2025 · Updated 9 months ago
- Gradio Client in Rust. ☆28 · Nov 30, 2025 · Updated 3 months ago
- A framework for pitting LLMs against each other in an evolving library of games ⚔ ☆35 · Apr 17, 2025 · Updated 10 months ago
- Stop messing around with finicky sampling parameters and just use DRµGS! ☆360 · Jun 1, 2024 · Updated last year
- ☆32 · Nov 11, 2024 · Updated last year
- Deita: Data-Efficient Instruction Tuning for Alignment [ICLR2024] ☆589 · Dec 9, 2024 · Updated last year
- Official implementation of the ICLR 2024 paper AffineQuant ☆28 · Mar 30, 2024 · Updated last year
- Official repository of "LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging" ☆32 · Nov 4, 2024 · Updated last year
- Modification of daveshap/ChromaDB_Chatbot_Public that allows end users to customize the behavior/memories of the chatbot ☆13 · Jun 30, 2023 · Updated 2 years ago
- PyTorch implementation of experiments in the paper Aligning Language Models with Human Preferences via a Bayesian Approach ☆32 · Nov 6, 2023 · Updated 2 years ago
- Code for paper: "QuIP: 2-Bit Quantization of Large Language Models With Guarantees" ☆397 · Feb 24, 2024 · Updated 2 years ago
- An implementation of Self-Extend, to expand the context window via grouped attention ☆119 · Jan 7, 2024 · Updated 2 years ago
- ☆30 · Feb 16, 2024 · Updated 2 years ago
- ☆31 · Nov 8, 2023 · Updated 2 years ago
- A Human-LLM Collaborative Dataset for Generative Information-seeking with Attribution ☆35 · Aug 2, 2023 · Updated 2 years ago
- Supervised instruction finetuning for LLM with HF Trainer and DeepSpeed ☆36 · Jul 6, 2023 · Updated 2 years ago
- [NeurIPS 2023] Repetition In Repetition Out: Towards Understanding Neural Text Degeneration from the Data Perspective ☆40 · Oct 17, 2023 · Updated 2 years ago
- Code for "Automated and Intelligent Synthesis of Oxygen-Producing Catalysts from Martian Meteorites by Robotic AI-Chemist" ☆12 · Jul 31, 2023 · Updated 2 years ago
- Code for reproducing our paper "Not All Language Model Features Are Linear" ☆84 · Nov 27, 2024 · Updated last year
- Generate Synthetic Data Using OpenAI, MistralAI or AnthropicAI ☆222 · Apr 29, 2024 · Updated last year
- ☆11 · Jul 3, 2024 · Updated last year
- A repository for log-time feedforward networks ☆224 · Apr 9, 2024 · Updated last year
- Multi-Layer Key-Value sharing experiments on Pythia models ☆34 · Jun 14, 2024 · Updated last year