AnswerDotAI / fastkmeans
☆35 Updated 2 weeks ago
Alternatives and similar repositories for fastkmeans:
Users that are interested in fastkmeans are comparing it to the libraries listed below
- ☆43 Updated 2 months ago
- NLP with Rust for Python ☆62 Updated 11 months ago
- Pre-train Static Word Embeddings ☆58 Updated 3 weeks ago
- ☆19 Updated this week
- Truly flash implementation of DeBERTa disentangled attention mechanism. ☆46 Updated 3 weeks ago
- An introduction to LLM Sampling ☆77 Updated 4 months ago
- Trade any tensors over the network ☆30 Updated last year
- ☆49 Updated 2 months ago
- Training code for Sparse Autoencoders on Embedding models ☆38 Updated 2 months ago
- Lightweight tools for quick and easy LLM demos ☆26 Updated 7 months ago
- Repository containing the SPIN experiments on the DIBT 10k ranked prompts ☆24 Updated last year
- minimal pytorch implementation of bm25 (with sparse tensors) ☆101 Updated last year
- Optimizing Causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna ☆39 Updated 3 months ago
- QLoRA for Masked Language Modeling ☆22 Updated last year
- ☆9 Updated 6 months ago
- ☆54 Updated 8 months ago
- ☆48 Updated 5 months ago
- Genalog is an open source, cross-platform python package allowing generation of synthetic document images with custom degradations and te… ☆42 Updated last year
- Fast, High-Fidelity LLM Decoding with Regex Constraints ☆20 Updated 9 months ago
- Improving Text Embedding of Language Models Using Contrastive Fine-tuning ☆64 Updated 9 months ago
- Crispy reranking models by Mixedbread ☆26 Updated this week
- A Python wrapper around HuggingFace's TGI (text-generation-inference) and TEI (text-embedding-inference) servers. ☆34 Updated 4 months ago
- ☆28 Updated 5 months ago
- This repository contains code for cleaning your training data of benchmark data to help combat data snooping. ☆25 Updated 2 years ago
- Small python package to measure OCR quality and other related metrics. ☆21 Updated last year
- Code, datasets, and checkpoints for the paper "CRAFT Your Dataset: Task-Specific Synthetic Dataset Generation Through Corpus Retrieval an… ☆29 Updated 7 months ago
- QAmeleon introduces synthetic multilingual QA data using PaLM, a 540B large language model. This dataset was generated by prompt tuning P… ☆34 Updated last year
- Analysis on the cost of encoder based models ☆11 Updated 2 months ago
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆55 Updated 8 months ago
- Hugging Face Inference Toolkit used to serve transformers, sentence-transformers, and diffusers models. ☆70 Updated this week