Plug-and-play Search Interfaces with Pyserini and Hugging Face
☆32 · Aug 5, 2023 · Updated 2 years ago
Alternatives and similar repositories for hf-spacerini
Users that are interested in hf-spacerini are comparing it to the libraries listed below.
- Code for embedding and retrieval research. ☆16 · Oct 24, 2023 · Updated 2 years ago
- ☆12 · Apr 25, 2022 · Updated 3 years ago
- Toolkit for domain-specific information retrieval experimentation ☆19 · Feb 24, 2026 · Updated last month
- 🤖 Code for our EMNLP 2022 paper: "BotsTalk: Machine-sourced Framework for Automatic Curation of Large-scale Multi-skill Dialogue Dataset… ☆16 · Oct 7, 2024 · Updated last year
- QLoRA for Masked Language Modeling ☆23 · Sep 11, 2023 · Updated 2 years ago
- FlexiTokens ☆19 · Dec 27, 2025 · Updated 3 months ago
- The official implementation of the paper "Text Classification in the Wild: a Large-scale Long-tailed Name Normalization Dataset"(ICASSP 2… ☆12 · Feb 19, 2023 · Updated 3 years ago
- stoplists for African languages generated from the ASP corpus ☆14 · Jan 16, 2016 · Updated 10 years ago
- T5Patches is a set of tools for fast and targeted editing of generative language models built with T5X. ☆12 · May 31, 2024 · Updated last year
- Python Module implementing SRP ☆12 · Jul 29, 2022 · Updated 3 years ago
- Code for EMNLP 2021 paper: Improving Sequence-to-Sequence Pre-training via Sequence Span Rewriting ☆17 · Nov 30, 2021 · Updated 4 years ago
- The official implementation of the EMNLP 2023 paper "Paraphrase Types for Generation and Detection" ☆12 · Oct 20, 2024 · Updated last year
- Repo for training MLMs, CLMs, or T5-type models on the OLM pretraining data, but it should work with any hugging face text dataset. ☆97 · Feb 9, 2023 · Updated 3 years ago
- Convenient Text-to-Text Training for Transformers ☆19 · Dec 10, 2021 · Updated 4 years ago
- Crowd-sourced lists of urls to help Common Crawl crawl under-resourced languages. See https://github.com/commoncrawl/web-languages-code/ … ☆69 · Jan 7, 2026 · Updated 2 months ago
- Code for our SIGIR 2022 accepted paper : P3 Ranker: Mitigating the Gaps between Pre-training and Ranking Fine-tuning with Prompt-based L… ☆18 · Sep 24, 2023 · Updated 2 years ago
- Topic Model based on Pretrained Sentence Embeddings (with BERT) ☆13 · Feb 8, 2023 · Updated 3 years ago
- Repo for "Zemi: Learning Zero-Shot Semi-Parametric Language Models from Multiple Tasks" ACL 2023 Findings ☆15 · May 3, 2023 · Updated 2 years ago
- Implementation of the SOTA Transformer architecture from PaLM - Scaling Language Modeling with Pathways in JAX/Flax ☆14 · Jun 22, 2022 · Updated 3 years ago
- A blog where I write about research papers and blog posts I read. ☆12 · Nov 20, 2024 · Updated last year
- Text generation using language models with multiple exit heads ☆16 · Sep 18, 2025 · Updated 6 months ago
- The source code and the data for ACL 2022 paper "Show Me More Details: Discovering Hierarchies of Procedures from Semi-structured Web Dat… ☆14 · Apr 21, 2023 · Updated 2 years ago
- Self-Supervised Document-to-Document Similarity Ranking via Contextualized Language Models and Hierarchical Inference ☆44 · Nov 28, 2022 · Updated 3 years ago
- ML tools that we use internally and which you may find useful too. ☆26 · Apr 27, 2022 · Updated 3 years ago
- Auxiliary tasks for task-oriented dialogue systems. Published in ICNLSP'22 and indexed in the ACL Anthology. ☆17 · Feb 27, 2023 · Updated 3 years ago
- “Generate to Understand for Representation” ☆14 · Apr 18, 2024 · Updated last year
- Official implementation of "Data Mixture Inference: What do BPE tokenizers reveal about their training data?" ☆18 · May 15, 2025 · Updated 10 months ago
- Neighborhood Contrastive Learning for Scientific Document Representations with Citation Embeddings (EMNLP 2022 paper) ☆76 · Dec 29, 2025 · Updated 3 months ago
- DImensionality REduction in JAX ☆26 · Nov 21, 2025 · Updated 4 months ago
- A tool for udacity mentors to analyze the feedback they receive from their students. ☆14 · Jul 10, 2022 · Updated 3 years ago
- Röttger et al. (2024): "IssueBench: Millions of Realistic Prompts for Measuring Issue Bias in LLM Writing Assistance" ☆16 · Mar 6, 2026 · Updated 3 weeks ago
- ☆23 · Oct 30, 2023 · Updated 2 years ago
- [NeurIPS 2025] MergeBench: A Benchmark for Merging Domain-Specialized LLMs ☆44 · Feb 11, 2026 · Updated last month
- ☆97 · Aug 6, 2022 · Updated 3 years ago
- This repository houses materials consulted by the instructors ☆12 · Jan 8, 2022 · Updated 4 years ago
- https://footprints.baulab.info ☆18 · Oct 4, 2024 · Updated last year
- Libraries, Archives and Museums (LAM) ☆88 · Oct 4, 2022 · Updated 3 years ago
- NLPIR tutorial: pretrain for IR. pre-train on raw textual corpus, fine-tune on MS MARCO Document Ranking ☆13 · Sep 10, 2021 · Updated 4 years ago
- LTeX+ Language Server support for Zed ☆17 · Dec 1, 2025 · Updated 3 months ago