huggingface / gaia
Hugging Face and Pyserini interoperability
☆17 Updated last year
Related projects:
- Scripts supporting the development and serving the Roots Search Tool - https://hf.co/spaces/bigscience-data/roots-search ☆10 Updated last year
- Minimum Description Length probing for neural network representations ☆15 Updated 11 months ago
- Efficiently computing & storing token n-grams from large corpora ☆15 Updated 2 weeks ago
- Local emulator for Hugging Face Inference Endpoints custom handlers ☆24 Updated last year
- URL downloader supporting checkpointing and continuous checksumming. ☆19 Updated 9 months ago
- A file utility for accessing both local and remote files through a unified interface. ☆36 Updated last month
- Code for our paper Resources and Evaluations for Multi-Distribution Dense Information Retrieval ☆14 Updated 8 months ago
- T5Patches is a set of tools for fast and targeted editing of generative language models built with T5X. ☆11 Updated 3 months ago
- Plug-and-play Search Interfaces with Pyserini and Hugging Face ☆32 Updated last year
- A pipeline for using API calls to agnostically convert unstructured data into structured training data ☆26 Updated last year
- Embedding Recycling for Language models ☆38 Updated last year
- PyTorch Implementation of the paper "MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training" ☆21 Updated last week
- HyPe: Better Pre-trained Language Model Fine-tuning with Hidden Representation Perturbation [ACL 2023] ☆13 Updated last year
- This is a new metric that can be used to evaluate faithfulness of text generated by LLMs. The work behind this repository can be found he… ☆31 Updated last year
- Engineering the state of RNN language models (Mamba, RWKV, etc.) ☆31 Updated 3 months ago
- This repository contains the ToolSelect dataset which was used to fine-tune Llama-2 70B for tool selection. ☆17 Updated 6 months ago
- Implementation of the model: "Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models" in PyTorch ☆27 Updated this week
- Training and evaluation code for the paper "Headless Language Models: Learning without Predicting with Contrastive Weight Tying" (https:/… ☆22 Updated 5 months ago
- Submission to the inverse scaling prize ☆23 Updated last year
- A Streamlit app to add structured tags to a dataset card ☆22 Updated 2 years ago
- A library for squeakily cleaning and filtering language datasets. ☆45 Updated last year
- Explain a black-box module in natural language. ☆33 Updated 3 weeks ago
- Code for the paper "Code-Mixing on Sesame Street: Dawn of the Adversarial Polyglots" (NAACL-HLT 2021) ☆11 Updated 2 years ago
- Code for the paper "Accessing higher dimensions for unsupervised word translation" ☆19 Updated last year
- One stop shop for all things carp ☆58 Updated 2 years ago
- [ICML 24 NGSM workshop] Associative Recurrent Memory Transformer implementation and scripts for training and evaluating ☆26 Updated last week