joey00072 / Attention-as-graph
An alternative way of calculating self-attention
☆18 Updated last year
Alternatives and similar repositories for Attention-as-graph
Users who are interested in Attention-as-graph are comparing it to the libraries listed below
- ☆38 Updated 10 months ago
- BH hackathon ☆13 Updated last year
- ☆9 Updated last month
- Using multiple LLMs for ensemble Forecasting ☆16 Updated last year
- Lego for GRPO ☆28 Updated this week
- Using modal.com to process FineWeb-edu data ☆20 Updated last month
- Approximating the joint distribution of language models via MCTS ☆21 Updated 7 months ago
- The original BabyAGI, updated with LiteLLM and no vector database reliance (csv instead) ☆21 Updated 8 months ago
- An intelligent code optimization system leveraging AI analysis, automated refactoring, and test generation. Built with DSPy and Gradio, i… ☆19 Updated 4 months ago
- Flexible, efficient, and context-aware generation from large unstructured knowledge sources. ☆16 Updated last year
- Apps that run on modal.com ☆12 Updated last year
- ☆57 Updated last week
- Optimizing Causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna ☆53 Updated 3 months ago
- look how they massacred my boy ☆63 Updated 7 months ago
- Synthetic data derived by templating, few shot prompting, transformations on public domain corpora, and monte carlo tree search. ☆32 Updated 3 months ago
- Cerule - A Tiny Mighty Vision Model ☆66 Updated 8 months ago
- Training hybrid models for dummies. ☆21 Updated 4 months ago
- ☆25 Updated 5 months ago
- Very minimal (and stateless) agent framework ☆44 Updated 4 months ago
- The Benefits of a Concise Chain of Thought on Problem Solving in Large Language Models ☆22 Updated 6 months ago
- Building a large language foundation model ☆9 Updated 2 months ago
- The Swarm Ecosystem ☆21 Updated 10 months ago
- Simple repository for training small reasoning models ☆31 Updated 3 months ago
- Testing paligemma2 finetuning on reasoning dataset ☆18 Updated 5 months ago
- OmegaViT (ΩViT) is a cutting-edge vision transformer architecture that combines multi-query attention, rotary embeddings, state space mod… ☆14 Updated last week
- Repository to create traveling waves that integrate spatial information through time ☆52 Updated 2 months ago
- A public implementation of the ReLoRA pretraining method, built on Lightning-AI's Pytorch Lightning suite. ☆33 Updated last year
- [WIP] Transformer to embed Danbooru labelsets ☆13 Updated last year
- ☆20 Updated 3 weeks ago
- Project code for training LLMs to write better unit tests + code ☆20 Updated 2 weeks ago