huggingface / roots-search-tool

Scripts supporting the development and serving of the Roots Search Tool - https://hf.co/spaces/bigscience-data/roots-search

☆10

Related projects

Alternatives and complementary repositories for roots-search-tool

huggingface / gaia
Hugging Face and Pyserini interoperability
☆19 Updated last year
ielab / Starbucks
Starbucks: Improved Training for 2D Matryoshka Embeddings
☆17 Updated last month
EleutherAI / mdl
Minimum Description Length probing for neural network representations
☆16 Updated last week
salesforce / simplification
☆20 Updated last year
ngoyal2707 / Megatron-LM
Ongoing research training transformer language models at scale, including: BERT & GPT-2
☆18 Updated last year
huggingface / spm_precompiled
Highly specialized crate to parse and use `google/sentencepiece`'s precompiled_charsmap in `tokenizers`
☆18 Updated 2 years ago
maxdotio / neural-solr
Neural Solr = Solr 9 + Mighty Inference + Node
☆16 Updated 2 years ago
stanfordnlp / multi-distribution-retrieval
Code for our paper Resources and Evaluations for Multi-Distribution Dense Information Retrieval
☆14 Updated 10 months ago
UKPLab / lagonn
Source code and data for Like a Good Nearest Neighbor
☆28 Updated 9 months ago
huggingface / ethics-scripts
☆14 Updated last year
UKPLab / incorporating-relevance
Code for "Incorporating Relevance Feedback for Information-Seeking Retrieval using Few-Shot Document Re-Ranking" (https://arxiv.org/abs/2…
☆12 Updated last year
castorini / hf-spacerini
Plug-and-play Search Interfaces with Pyserini and Hugging Face
☆32 Updated last year
ashvardanian / usearch-binary
Binary vector search example using Unum's USearch engine and pre-computed Wikipedia embeddings from Co:here and MixedBread
☆19 Updated 7 months ago
jackbandy / bookcorpus-datasheet
Documentation effort for the BookCorpus dataset
☆33 Updated 3 years ago
EleutherAI / tokengrams
Efficiently computing & storing token n-grams from large corpora
☆15 Updated last month
allenai / cached_path
A file utility for accessing both local and remote files through a unified interface.
☆36 Updated 3 months ago
huggingface / disaggregators
🤗 Disaggregators: Curated data labelers for in-depth analysis.
☆65 Updated last year
allenai / EmbeddingRecycling
Embedding Recycling for Language models
☆38 Updated last year
rycolab / probing-via-prompting
☆11 Updated 2 years ago
MilaNLProc / language-invariant-properties
☆22 Updated 2 years ago
MeLeLBGU / SaGe
Code for SaGe subword tokenizer (EACL 2023)
☆22 Updated this week
aws-samples / evaluating-large-language-models-using-llm-as-a-judge
☆12 Updated 6 months ago
castorini / LiT5
☆15 Updated 3 months ago
stephantul / unitoken
Tokenization across languages. Useful as preprocessing for subword tokenization.
☆22 Updated last year
benpry / chain-of-thought-metaphor
This repo contains code for the paper "Psychologically-informed chain-of-thought prompts for metaphor understanding in large language mod…
☆14 Updated last year
facebookresearch / lss_eval
This is a new metric that can be used to evaluate faithfulness of text generated by LLMs. The work behind this repository can be found he…
☆31 Updated last year
modal-labs / ci-on-modal
A sample pattern for running CI tests on Modal
☆13 Updated 2 months ago
facebookresearch / coocmap
Code for the paper "Accessing higher dimensions for unsupervised word translation"
☆21 Updated last year
EleutherAI / best-download
URL downloader supporting checkpointing and continuous checksumming.
☆19 Updated 11 months ago
srush / drop7
☆18 Updated 7 months ago