ashvardanian / jaccard-index
Optimizing bit-level Jaccard Index and Population Counts for large-scale quantized Vector Search via Harley-Seal CSA and Lookup Tables
☆18Updated this week
Alternatives and similar repositories for jaccard-index
Users that are interested in jaccard-index are comparing it to the libraries listed below
- Utilities for loading and running text embeddings with ONNX☆44Updated 9 months ago
- Using modal.com to process FineWeb-edu data☆20Updated last month
- QLoRA for Masked Language Modeling☆22Updated last year
- ☆48Updated last year
- Latent Large Language Models☆18Updated 8 months ago
- A pipeline that uses API calls to agnostically convert unstructured data into structured training data☆30Updated 7 months ago
- A library for squeakily cleaning and filtering language datasets.☆47Updated last year
- ☆24Updated last year
- Chat Markup Language conversation library☆55Updated last year
- ☆43Updated 3 months ago
- ☆35Updated 2 years ago
- Optimizing Causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna☆53Updated 3 months ago
- ☆20Updated last year
- This repository implements DSPy programs for tasks in Indian languages☆13Updated last year
- Tree-based indexes for neural-search☆31Updated last year
- 🤗 HuggingFace Inference Toolkit for Google Cloud Vertex AI (similar to SageMaker's Inference Toolkit, but for Vertex AI and unofficial)☆17Updated last year
- Truly flash implementation of the DeBERTa disentangled attention mechanism.☆48Updated last week
- ☆39Updated 2 years ago
- Gzip and nearest neighbors for text classification☆57Updated last year
- Pre-train Static Word Embeddings☆60Updated last month
- PyTorch implementation for MRL☆18Updated last year
- This repository contains code for cleaning your training data of benchmark data to help combat data snooping.☆25Updated 2 years ago
- ☆22Updated last year
- Repository containing the SPIN experiments on the DIBT 10k ranked prompts☆24Updated last year
- ☆20Updated last week
- Using short models to classify long texts☆21Updated 2 years ago
- Embedding Recycling for Language models☆38Updated last year
- Experiments for efforts to train a new and improved t5☆77Updated last year
- Lightweight wrapper for the independent implementation of SPLADE++ models for search & retrieval pipelines. Models and Library created by…☆31Updated 8 months ago
- GPU accelerated client-side embeddings for vector search, RAG etc.☆66Updated last year