marqo-ai / GCL
Generalised Contrastive Learning. This is a Repository for Google Shopping Dataset and Benchmarks followed by our novel fine-grained contrastive learning framework.
★59 · Updated 3 weeks ago
Alternatives and similar repositories for GCL:
Users who are interested in GCL are comparing it to the libraries listed below.
- Minimal sharded dataset loaders, decoders, and utils for multi-modal document, image, and text datasets. ★157 · Updated last year
- Official Repository for "Hypencoder: Hypernetworks for Information Retrieval" ★24 · Updated 2 months ago
- NLP with Rust for Python 🦀🐍 ★62 · Updated 11 months ago
- Index of URLs to pdf files all over the internet and scripts ★23 · Updated 2 years ago
- The largest multilingual image-text classification dataset. It contains fashion products. ★72 · Updated last year
- Pre-train Static Word Embeddings ★58 · Updated 3 weeks ago
- Baguetter is a flexible, efficient, and hackable search engine library implemented in Python. It's designed for quickly benchmarking, imp… ★174 · Updated 8 months ago
- Code for NeurIPS LLM Efficiency Challenge ★57 · Updated last year
- Set of scripts to finetune LLMs ★37 · Updated last year
- Genalog is an open source, cross-platform python package allowing generation of synthetic document images with custom degradations and te… ★42 · Updated last year
- ★39 · Updated this week
- Late Interaction Models Training & Retrieval ★306 · Updated this week
- The Batched API provides a flexible and efficient way to process multiple requests in a batch, with a primary focus on dynamic batching o… ★131 · Updated 4 months ago
- Fast, High-Fidelity LLM Decoding with Regex Constraints ★20 · Updated 9 months ago
- ★58 · Updated last year
- Modular retrievers for zero-shot multilingual IR. ★27 · Updated last year
- Repository containing the SPIN experiments on the DIBT 10k ranked prompts ★24 · Updated last year
- minimal pytorch implementation of bm25 (with sparse tensors) ★101 · Updated last year
- Optimus is a flexible and scalable framework built to train language models efficiently across diverse hardware configurations, including… ★54 · Updated 3 weeks ago
- State-of-the-art embedding models fine-tuned for the ecommerce domain. +67% increase in evaluation metrics vs ViT-B-16-SigLIP. ★34 · Updated 5 months ago
- FastFit ⚡ When LLMs are Unfit Use FastFit ⚡ Fast and Effective Text Classification with Many Classes ★199 · Updated last week
- Bi-encoder entity linking architecture ★44 · Updated 7 months ago
- Repo for training MLMs, CLMs, or T5-type models on the OLM pretraining data, but it should work with any hugging face text dataset. ★93 · Updated 2 years ago
- 🤗 Disaggregators: Curated data labelers for in-depth analysis. ★66 · Updated 2 years ago
- ★123 · Updated 6 months ago
- Generalist and Lightweight Model for Text Classification ★124 · Updated last week
- Supercharge huggingface transformers with model parallelism. ★76 · Updated 7 months ago
- Chunk your text using gpt4o-mini more accurately ★44 · Updated 9 months ago
- Simple replication of [ColBERT-v1](https://arxiv.org/abs/2004.12832). ★80 · Updated last year
- Pixel Parsing. A reproduction of OCR-free end-to-end document understanding models with open data ★21 · Updated 9 months ago