krypticmouse / matryoshka-representation-learning
PyTorch implementation for MRL
☆17Updated 6 months ago
Related projects:
- Embedding Recycling for Language models☆38Updated last year
- ReBase: Training Task Experts through Retrieval Based Distillation☆27Updated 2 months ago
- Official implementation of paper "On the Diagram of Thought" (https://arxiv.org/abs/2409.10038)☆32Updated this week
- The code implementation of MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models…☆29Updated 7 months ago
- SWIM-IR is a Synthetic Wikipedia-based Multilingual Information Retrieval training set with 28 million query-passage pairs spanning 33 la…☆42Updated 10 months ago
- ☆29Updated 2 weeks ago
- A Retrieval Benchmark for Scientific Literature Search☆53Updated 2 months ago
- Plug-and-play Search Interfaces with Pyserini and Hugging Face☆32Updated last year
- ☆22Updated 3 months ago
- SCREWS: A Modular Framework for Reasoning with Revisions☆26Updated 11 months ago
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment☆39Updated 2 weeks ago
- Improving Text Embedding of Language Models Using Contrastive Fine-tuning☆54Updated last month
- Minimum Description Length probing for neural network representations☆15Updated 11 months ago
- ☆18Updated this week
- A curated list on the role of small models in the LLM era☆16Updated this week
- Ranking of fine-tuned HF models as base models.☆35Updated last year
- Using short models to classify long texts☆20Updated last year
- [SIGIR 2024 (Demo)] CoSearchAgent: A Lightweight Collaborative Search Agent with Large Language Models☆22Updated 7 months ago
- Code repo for "Model-Generated Pretraining Signals Improves Zero-Shot Generalization of Text-to-Text Transformers" (ACL 2023)☆22Updated 10 months ago
- Few-shot Learning with Auxiliary Data☆26Updated 9 months ago
- ☆24Updated last year
- ☆17Updated 6 months ago
- Codebase accompanying the Summary of a Haystack paper.☆65Updated 2 months ago
- Finding semantically meaningful and accurate prompts.☆45Updated 10 months ago
- BPE modification that removes intermediate tokens during tokenizer training.☆13Updated last week
- This is the official repository for "Safer-Instruct: Aligning Language Models with Automated Preference Data"☆14Updated 6 months ago
- A public implementation of the ReLoRA pretraining method, built on Lightning-AI's Pytorch Lightning suite.☆33Updated 6 months ago
- ☆38Updated 4 months ago
- ☆13Updated last week
- Official repo of the AAAI 2024 paper "Mitigating the Impact of False Negatives in Dense Retrieval with Contrastive Confidence Regularization"☆12Updated 8 months ago