ziegler-ingo / CRAFT
Code, datasets, and checkpoints for the paper "CRAFT Your Dataset: Task-Specific Synthetic Dataset Generation Through Corpus Retrieval and Augmentation"
☆26 · Updated 2 months ago
Related projects
Alternatives and complementary repositories for CRAFT
- ☆41 · Updated 2 weeks ago
- [EMNLP 2024] A Retrieval Benchmark for Scientific Literature Search ☆61 · Updated 4 months ago
- ReBase: Training Task Experts through Retrieval Based Distillation ☆27 · Updated 4 months ago
- A public implementation of the ReLoRA pretraining method, built on Lightning-AI's PyTorch Lightning suite. ☆33 · Updated 8 months ago
- This project is a collection of fine-tuning scripts to help researchers fine-tune Qwen 2 VL on HuggingFace datasets. ☆48 · Updated 2 months ago
- The Benefits of a Concise Chain of Thought on Problem Solving in Large Language Models ☆20 · Updated 9 months ago
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆46 · Updated 2 months ago
- ☆19 · Updated last month
- Code and data for "StructLM: Towards Building Generalist Models for Structured Knowledge Grounding" (COLM 2024) ☆68 · Updated last month
- ☆28 · Updated 5 months ago
- Dataset Viber is your chill repo for data collection, annotation and vibe checks. ☆44 · Updated 2 months ago
- PyTorch implementation for MRL ☆18 · Updated 9 months ago
- ☆41 · Updated 2 months ago
- Using open source LLMs to build synthetic datasets for direct preference optimization ☆41 · Updated 8 months ago
- Simple replication of [ColBERT-v1](https://arxiv.org/abs/2004.12832). ☆77 · Updated 8 months ago
- ☆24 · Updated last year
- Codebase accompanying the Summary of a Haystack paper. ☆72 · Updated 2 months ago
- NeurIPS 2023 - Cappy: Outperforming and Boosting Large Multi-Task LMs with a Small Scorer ☆37 · Updated 7 months ago
- Code for the paper "Harnessing Webpage UIs for Text-Rich Visual Understanding" ☆39 · Updated last month
- Versatile framework designed to streamline the integration of your models, as well as those sourced from Hugging Face, into complex progr… ☆23 · Updated 3 months ago
- DSBench: How Far are Data Science Agents from Becoming Data Science Experts? ☆35 · Updated last month
- ☆45 · Updated 2 months ago
- The code implementation of MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models… ☆30 · Updated 9 months ago
- SWIM-IR is a Synthetic Wikipedia-based Multilingual Information Retrieval training set with 28 million query-passage pairs spanning 33 la… ☆44 · Updated last year
- Data preparation code for CrystalCoder 7B LLM ☆42 · Updated 6 months ago
- Repository containing the SPIN experiments on the DIBT 10k ranked prompts ☆23 · Updated 8 months ago
- ☆59 · Updated last month
- ☆56 · Updated 9 months ago
- EMNLP 2024 "Re-reading improves reasoning in large language models". Simply repeating the question to get bidirectional understanding for… ☆21 · Updated last week