zbambergerNLP / principled-pre-training
A repository to get acquainted with basic training tasks in natural language processing and machine learning
☆11 · Dec 27, 2023 · Updated 2 years ago
Alternatives and similar repositories for principled-pre-training
Users who are interested in principled-pre-training are comparing it to the libraries listed below.
- ☆10 · Oct 14, 2023 · Updated 2 years ago
- Arabic News Stance Corpus ☆11 · Feb 5, 2021 · Updated 5 years ago
- A DSPy-based implementation of the tree of thoughts method (Yao et al., 2023) for generating persuasive arguments ☆98 · Oct 3, 2025 · Updated 4 months ago
- Code Roberta version of RetroMAE: Pre-Training Retrieval-oriented Language Models Via Masked Auto-Encoder ☆10 · Mar 16, 2023 · Updated 2 years ago
- ☆10 · May 1, 2025 · Updated 9 months ago
- The official implementation of the paper "Text Classification in the Wild: a Large-scale Long-tailed Name Normalization Dataset"(ICASSP 2… ☆12 · Feb 19, 2023 · Updated 2 years ago
- Token-free Language Modeling with ByGPT5 & Friends! ☆12 · Jul 18, 2025 · Updated 6 months ago
- A preliminary effort for https://github.com/fumin/ntm ☆11 · Mar 1, 2015 · Updated 10 years ago
- Pytorch implementation of standard metrics for clustering ☆10 · Mar 21, 2023 · Updated 2 years ago
- The contrastive token loss function for reducing generative repetition of autoregressive neural language models. ☆13 · May 11, 2022 · Updated 3 years ago
- Code for "Inducer-tuning: Connecting Prefix-tuning and Adapter-tuning" (EMNLP 2022) and "Empowering Parameter-Efficient Transfer Learning… ☆11 · Feb 6, 2023 · Updated 3 years ago
- This repository contains the implementation code for paper: Mixup Your Own Pairs ☆12 · Oct 1, 2023 · Updated 2 years ago
- This repository provides the code for applying Contrastive Learning Penalty Loss (CLPL) and Mixture of Experts (MoE) to the BGE-M3 text e… ☆11 · Dec 27, 2024 · Updated last year
- This project scrapes 0.8 Million data from stockanalysis.com along with all the filters and updates it on a Google Sheet every 15 minutes… ☆20 · Nov 23, 2023 · Updated 2 years ago
- These are tools I cheated with the help of ChatGPT to help me with Penetration Testing and Red Teaming ☆15 · Feb 24, 2024 · Updated last year
- Extract data from NEXRAD Doppler Radar NetCDFs ☆12 · Jun 17, 2018 · Updated 7 years ago
- Classification of reserve risk with chain-ladder ☆12 · Aug 31, 2019 · Updated 6 years ago
- This repository contains code used for our Multi Sentence Inference NAACL'22 paper. ☆12 · Mar 6, 2023 · Updated 2 years ago
- ☆12 · Jan 21, 2019 · Updated 7 years ago
- A context-aware embedding similarity score ☆11 · Aug 23, 2023 · Updated 2 years ago
- Provides an RStudio addin command to render the current Rmarkdown document in the console ☆10 · Oct 1, 2025 · Updated 4 months ago
- Code for "Incorporating Relevance Feedback for Information-Seeking Retrieval using Few-Shot Document Re-Ranking" (https://arxiv.org/abs/2… ☆14 · Feb 2, 2026 · Updated last week
- This repository contains the code for the EMNLP'23 paper "AdaSent: Efficient Domain-Adapted Sentence Embeddings for Few-Shot Classificati… ☆16 · Jun 3, 2024 · Updated last year
- LDPC codes for Illumina sequencing-based DNA storage ☆11 · Dec 2, 2020 · Updated 5 years ago
- T5Patches is a set of tools for fast and targeted editing of generative language models built with T5X. ☆12 · May 31, 2024 · Updated last year
- Repository to create CCKGs from the paper "Similarity-weighted Construction of Contextualized Commonsense Knowledge Graphs for Knowledge-… ☆11 · May 23, 2025 · Updated 8 months ago
- Partial code for "Skill Extraction from Job Postings using Weak Supervision" at RecSysHR 2022. ☆13 · May 19, 2023 · Updated 2 years ago
- A minimal working example of using undetected-chromedriver on AWS Lambda with Selenium and Docker ☆19 · Aug 12, 2025 · Updated 6 months ago
- ☆12 · Jul 6, 2023 · Updated 2 years ago
- Implementation of NAACL'25 "Empowering Retrieval-based Conversational Recommendation with Contrasting User Preferences" ☆14 · Sep 9, 2025 · Updated 5 months ago
- ☆12 · Jan 2, 2024 · Updated 2 years ago
- MERLIN is a global, model-agnostic, contrastive explainer for any tabular or text classifier. It provides contrastive explanations of how… ☆19 · Sep 15, 2023 · Updated 2 years ago
- Topic Model based on Pretrained Sentence Embeddings (with BERT) ☆13 · Feb 8, 2023 · Updated 3 years ago
- ☆14 · Oct 17, 2023 · Updated 2 years ago
- A modular and extensible Python framework, designed to aid in the creation of high-quality, unbiased datasets to build robust models for … ☆19 · Nov 4, 2025 · Updated 3 months ago
- Effective Unsupervised Domain Adaptation of Neural Rankers by Diversifying Synthetic Query Generation ☆15 · Apr 23, 2025 · Updated 9 months ago
- For FFL Blog ☆10 · Sep 24, 2015 · Updated 10 years ago
- Oral, poster and courses from meetings and conferences ☆13 · Jul 9, 2020 · Updated 5 years ago
- Text generation using language models with multiple exit heads ☆16 · Sep 18, 2025 · Updated 4 months ago