soedinglab / kClust
kClust is a fast and sensitive method for clustering protein sequences. It can cluster large protein databases down to 20-30% sequence identity. In the resulting clustering, each cluster is represented by its longest sequence (the representative sequence).
☆17 Updated 5 years ago
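The description above says each cluster is represented by its longest sequence. As a minimal sketch of that selection rule only (not kClust's actual code or clustering algorithm), the following assumes clusters are already given as lists of `(id, sequence)` pairs. The names `longest_representative` and `representatives` and the example data are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical record type for illustration: (sequence id, amino-acid sequence).
Record = Tuple[str, str]


def longest_representative(members: List[Record]) -> Record:
    """Return the member with the longest sequence.

    Ties go to the member that appears first, because max() keeps the
    first maximal element it encounters.
    """
    if not members:
        raise ValueError("a cluster must contain at least one sequence")
    return max(members, key=lambda rec: len(rec[1]))


def representatives(clusters: Dict[str, List[Record]]) -> Dict[str, str]:
    """Map each cluster name to the id of its longest sequence."""
    return {
        name: longest_representative(members)[0]
        for name, members in clusters.items()
    }


# Made-up example data, not taken from any real database.
example = {
    "cluster_a": [("seq1", "MKT"), ("seq2", "MKTAYIAK"), ("seq3", "MKTAY")],
    "cluster_b": [("seq4", "GSHM")],
}
reps = representatives(example)
print(reps)
```

This sketch covers only how a representative is picked once membership is known. How kClust assigns sequences to clusters is not described here.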
Alternatives and similar repositories for kClust:
Users interested in kClust are comparing it to the repositories listed below
- Protein structure alignment and search algorithm ☆50 Updated this week
- ☆14 Updated 8 years ago
- Protein Sequence Annotation with Language Models ☆19 Updated 3 months ago
- Python framework for doing ancestral sequence reconstruction ☆37 Updated 6 months ago
- Cython bindings and Python interface to FAMSA, an algorithm for ultra-scale multiple sequence alignments. ☆30 Updated 3 weeks ago
- DeepSig - Predictor of signal peptides in proteins based on deep learning ☆26 Updated last year
- Clustering the NCBI nr database with MMseqs2 (90% length, 90% identity). Inspired by NCBI's experimental ClusteredNR database. ☆23 Updated last year
- MSA (Multiple Sequence Alignment) visualization Python package for sequence analysis ☆111 Updated last month
- ☆11 Updated last month
- The 3DFI pipeline predicts the 3D structure of proteins and searches for structural homology in the 3D space. ☆19 Updated 10 months ago
- Centroid RNA package ☆19 Updated 4 years ago
- Bacterial Annotation by Learned Representation of Genes ☆54 Updated 4 years ago
- Software for predicting translation initiation rates in bacteria ☆20 Updated 2 months ago
- Fast protein domain structure embedding+search tool ☆11 Updated 2 weeks ago
- Transmembrane proteins predicted through Language Model embeddings ☆32 Updated last week
- Detection of remote homology by comparison of protein language model representations ☆47 Updated last month
- A quick and easy way to download the genomes/predicted proteins of taxa available in JGI's Genome Portal. ☆32 Updated 4 months ago
- Conservation analysis of homologous proteins with Python ☆10 Updated 3 years ago
- Discovery of conserved gene clusters in multiple genomes ☆58 Updated 2 weeks ago
- Visualise RNA secondary structure in consistent, reproducible and recognisable layouts ☆66 Updated this week
- Automatic oligonucleotide design for PCR-based gene synthesis ☆38 Updated 5 years ago
- Maximum likelihood structural phylogenetics by including Foldseek 3Di characters. Supporting Information for Puente-Lelievre et al. 2023n… ☆18 Updated 7 months ago
- Python implementation of the Codon Adaptation Index ☆35 Updated last year
- G4Hunter (2012_2015) - IECB - Bordeaux ☆13 Updated 4 years ago
- Snakemake pipeline for creating trees from sequence sets ☆70 Updated last week
- ☆17 Updated 4 years ago
- MeShClust2: Application of alignment-free identity scores in clustering long DNA sequences ☆14 Updated 3 years ago
- MMseqs2 app to run on your workstation or servers ☆68 Updated 2 weeks ago
- Universal and efficient core gene phylogeny with Foldseek and ProstT5 ☆46 Updated last week
- Scripts for predicting natural product activity from biosynthetic gene cluster sequences ☆23 Updated last year