Implementation of the dilated self attention as described in "LongNet: Scaling Transformers to 1,000,000,000 Tokens"
☆13 · Jul 23, 2023 · Updated 2 years ago
Alternatives and similar repositories for dilated-self-attention
Users that are interested in dilated-self-attention are comparing it to the libraries listed below.
- A RAG that can scale ☆11 · May 28, 2024 · Updated 2 years ago
- Model implementation for the contextual embeddings project ☆47 · Jun 2, 2025 · Updated 11 months ago
- ☆10 · Oct 2, 2024 · Updated last year
- Trade any tensors over the network ☆31 · Sep 27, 2023 · Updated 2 years ago
- A Python wrapper around HuggingFace's TGI (text-generation-inference) and TEI (text-embedding-inference) servers. ☆32 · Sep 19, 2025 · Updated 8 months ago
- HuggingFace Inference Toolkit for Google Cloud Vertex AI (similar to SageMaker's Inference Toolkit, but for Vertex AI and unofficial) ☆17 · Mar 20, 2024 · Updated 2 years ago
- Sparse Embedding Compression for Scalable Retrieval in Recommender Systems ☆35 · Nov 21, 2025 · Updated 6 months ago
- Rust derive macros for automating the boring stuff. ☆14 · Aug 3, 2025 · Updated 9 months ago
- A missing piece of the Python multitask (both threads and processes) API: An extension that supports stateful worker pools & size-aware i… ☆29 · Mar 8, 2026 · Updated 2 months ago
- Starbucks: Improved Training for 2D Matryoshka Embeddings ☆23 · Jun 30, 2025 · Updated 11 months ago
- A Python module for retrieving script types of writing systems including alphabets, abjads, abugidas, syllabaries, logographs, featurals … ☆15 · Jul 19, 2024 · Updated last year
- [NeurIPS 2024] GlotCC Dataset and Pipeline ☆20 · Apr 6, 2025 · Updated last year
- A proposed standard `NOCK` for a Parquet format that supports efficient distributed serialization of multiple kinds of graph technologies ☆21 · Apr 27, 2026 · Updated last month
- Investigation into whether Transformers and self-supervised learning could be used to trade currency markets ☆10 · Jun 21, 2023 · Updated 2 years ago
- ☆15 · Oct 31, 2023 · Updated 2 years ago
- ☆11 · Dec 26, 2018 · Updated 7 years ago
- Developing, training, and assessing the performance of a Proximal Policy Optimization (PPO) Stock Trading Agent. ☆14 · Aug 20, 2025 · Updated 9 months ago
- [CoLM 24] Official Repository of MambaByte: Token-free Selective State Space Model ☆27 · Oct 12, 2024 · Updated last year
- Almost SOTA LLM architecture, with O(n) time complexity ☆11 · Jan 19, 2025 · Updated last year
- A transient UI for Cargo, Rust's package manager ☆11 · Dec 17, 2025 · Updated 5 months ago
- ☆13 · Aug 10, 2024 · Updated last year
- recreation of the classic drug trading game "dope wars" ☆10 · May 9, 2019 · Updated 7 years ago
- A small spreadsheet demo in Rust, Yew, and WASM ☆11 · Jun 16, 2023 · Updated 2 years ago
- Implementation of plug-and-play Attention from "LongNet: Scaling Transformers to 1,000,000,000 Tokens" ☆720 · Jan 7, 2024 · Updated 2 years ago
- Life before `main()` ☆19 · Feb 2, 2021 · Updated 5 years ago
- SPRINT Toolkit helps you evaluate diverse neural sparse models easily using a single click on any IR dataset. ☆48 · Jul 25, 2023 · Updated 2 years ago
- Parsing and serialization support for PSSH boxes used in DRM systems ☆15 · Apr 6, 2026 · Updated last month
- Library for evaluating RAG using Nuclia's models ☆18 · Jul 31, 2024 · Updated last year
- Smart commit messages ☆18 · Oct 25, 2024 · Updated last year
- Babel plugin for Regex+ ☆14 · Dec 16, 2025 · Updated 5 months ago
- Efficient kernel for RMS normalization with fused operations, includes both forward and backward passes, compatibility with PyTorch. ☆13 · Jun 5, 2024 · Updated last year
- The source code for the official documentation of PyScript. ☆16 · Mar 2, 2026 · Updated 2 months ago
- Modular retrievers for zero-shot multilingual IR. ☆30 · Mar 6, 2024 · Updated 2 years ago
- Hunt the Wumpus in Yew ☆11 · Oct 26, 2023 · Updated 2 years ago
- Rank-DistiLLM: Closing the Effectiveness Gap Between Cross-Encoders and LLMs for Passage Re-Ranking ☆25 · Apr 4, 2025 · Updated last year
- A collection of reusable, high-performance, well-documented, thoroughly tested layers and models in Jax ☆24 · Jun 8, 2025 · Updated 11 months ago
- Lets you build repositories and archives of repositories. ☆29 · Nov 20, 2014 · Updated 11 years ago
- ☆27 · Feb 26, 2026 · Updated 3 months ago
- Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention" ☆101 · Sep 30, 2024 · Updated last year