huggingface / roots-search-tool
Scripts supporting the development and serving of the Roots Search Tool - https://hf.co/spaces/bigscience-data/roots-search
☆10 Updated last year
Related projects
Alternatives and complementary repositories for roots-search-tool
- Hugging Face and Pyserini interoperability ☆19 Updated last year
- Starbucks: Improved Training for 2D Matryoshka Embeddings ☆17 Updated last month
- Minimum Description Length probing for neural network representations ☆16 Updated last week
- ☆20 Updated last year
- Ongoing research training transformer language models at scale, including: BERT & GPT-2 ☆18 Updated last year
- Highly specialized crate to parse and use `google/sentencepiece`'s precompiled_charsmap in `tokenizers` ☆18 Updated 2 years ago
- Neural Solr = Solr 9 + Mighty Inference + Node ☆16 Updated 2 years ago
- Code for our paper Resources and Evaluations for Multi-Distribution Dense Information Retrieval ☆14 Updated 10 months ago
- Source code and data for Like a Good Nearest Neighbor ☆28 Updated 9 months ago
- ☆14 Updated last year
- Code for "Incorporating Relevance Feedback for Information-Seeking Retrieval using Few-Shot Document Re-Ranking" (https://arxiv.org/abs/2… ☆12 Updated last year
- Plug-and-play Search Interfaces with Pyserini and Hugging Face ☆32 Updated last year
- Binary vector search example using Unum's USearch engine and pre-computed Wikipedia embeddings from Co:here and MixedBread ☆19 Updated 7 months ago
- Documentation effort for the BookCorpus dataset ☆33 Updated 3 years ago
- Efficiently computing & storing token n-grams from large corpora ☆15 Updated last month
- A file utility for accessing both local and remote files through a unified interface. ☆36 Updated 3 months ago
- 🤗 Disaggregators: Curated data labelers for in-depth analysis. ☆65 Updated last year
- Embedding Recycling for Language models ☆38 Updated last year
- ☆11 Updated 2 years ago
- ☆22 Updated 2 years ago
- Code for SaGe subword tokenizer (EACL 2023) ☆22 Updated this week
- ☆12 Updated 6 months ago
- ☆15 Updated 3 months ago
- Tokenization across languages. Useful as preprocessing for subword tokenization. ☆22 Updated last year
- This repo contains code for the paper "Psychologically-informed chain-of-thought prompts for metaphor understanding in large language mod… ☆14 Updated last year
- This is a new metric that can be used to evaluate faithfulness of text generated by LLMs. The work behind this repository can be found he… ☆31 Updated last year
- A sample pattern for running CI tests on Modal ☆13 Updated 2 months ago
- Code for paper "Accessing higher dimensions for unsupervised word translation" ☆21 Updated last year
- URL downloader supporting checkpointing and continuous checksumming. ☆19 Updated 11 months ago
- ☆18 Updated 7 months ago