kuleshov-group / caduceus
Bi-Directional Equivariant Long-Range DNA Sequence Modeling
☆160 · Updated last month
Related projects
Alternatives and complementary repositories for caduceus
- Repository for StripedHyena, a state-of-the-art beyond Transformer architecture ☆294 · Updated 8 months ago
- Discovering Interpretable Features in Protein Language Models via Sparse Autoencoders ☆46 · Updated this week
- Orthrus is a mature RNA model for RNA property prediction. It uses a Mamba encoder backbone, a variant of state-space models specifical… ☆37 · Updated 3 weeks ago
- ☆14 · Updated last month
- Benchmarking DNA Language Models on Biologically Meaningful Tasks ☆96 · Updated 3 weeks ago
- My own attempt at a long context genomics model, leveraging recent advances in long context attention modeling (Flash Attention + other h… ☆52 · Updated last year
- A Protein Large Language Model for Multi-Task Protein Language Processing ☆139 · Updated last month
- AFusion: AlphaFold 3 GUI ☆42 · Updated this week
- [NeurIPS 2023] Official codes of "MuSe-GNN: Learning Unified Gene Representation From Multimodal Biological Graph Data" ☆26 · Updated 5 months ago
- BioDiscoveryAgent is an LLM-based AI agent for closed-loop design of genetic perturbation experiments ☆24 · Updated this week
- Implementation and replication of ProGen, Language Modeling for Protein Generation, in JAX ☆109 · Updated 3 years ago
- [ICML-23 ORAL] ProtST: Multi-Modality Learning of Protein Sequences and Biomedical Texts ☆88 · Updated last year
- Simplified Masked Diffusion Language Model ☆207 · Updated last week
- Official implementation of "Hydra: Bidirectional State Space Models Through Generalized Matrix Mixers" ☆103 · Updated 3 months ago
- ☆57 · Updated 7 months ago
- Official implementation for HyenaDNA, a long-range genomic foundation model built with Hyena ☆602 · Updated 5 months ago
- Implementation of Chroma, generative models of protein using DDPM and GNNs, in PyTorch ☆158 · Updated last year
- A collection of awesome bio-foundation models, including protein, RNA, DNA, gene, single-cell, and so on. ☆131 · Updated 2 weeks ago
- (Unofficial) Implementation of dilated attention from "LongNet: Scaling Transformers to 1,000,000,000 Tokens" (https://arxiv.org/abs/2307… ☆51 · Updated last year
- [ICLR 2024] Mol-Instructions: A Large-Scale Biomolecular Instruction Dataset for Large Language Models ☆252 · Updated 3 weeks ago
- ProtMamba: a homology-aware but alignment-free protein state space model ☆49 · Updated 3 weeks ago
- Dirichlet Diffusion Score Model for Biological Sequence Generation. ☆45 · Updated 6 months ago
- PEER Benchmark, appearing at the NeurIPS 2022 Dataset and Benchmark Track (https://arxiv.org/abs/2206.02096) ☆82 · Updated last year
- RITA is a family of autoregressive protein models, developed by LightOn in collaboration with the OATML group at Oxford and the Debora Ma… ☆93 · Updated last year
- Awesome list of papers that extend Mamba to various applications. ☆128 · Updated 2 months ago
- RNAFlow: RNA Structure & Sequence Design via Inverse Folding-Based Flow Matching ☆43 · Updated 3 months ago
- Multi-modal Molecule Structure-text Model for Text-based Editing and Retrieval, Nat Mach Intell 2023 (https://www.nature.com/articles/s42… ☆208 · Updated 3 months ago
- A repository with exploration into using transformers to predict DNA ↔ transcription factor binding ☆81 · Updated 2 years ago
- The official code for "TaxDiff: Taxonomic-Guided Diffusion Model for Protein Sequence Generation" ☆55 · Updated 2 months ago
- Implementation of the Equiformer, SE3/E3 equivariant attention network that reaches new SOTA, and adopted for use by EquiFold for protein… ☆245 · Updated this week