Code for the paper "Fishing for Magikarp"
☆181Feb 25, 2026Updated last week
Alternatives and similar repositories for magikarp
Users that are interested in magikarp are comparing it to the libraries listed below
Sorting:
- General research for Dreadnode☆27Jun 17, 2024Updated last year
- HyPe: Better Pre-trained Language Model Fine-tuning with Hidden Representation Perturbation [ACL 2023]☆14Jul 11, 2023Updated 2 years ago
- Package to optimize Adversarial Attacks against (Large) Language Models with Varied Objectives☆70Feb 22, 2024Updated 2 years ago
- BPE modification that implements removing of the intermediate tokens during tokenizer training.☆26Nov 25, 2024Updated last year
- Arxiv + Notion Sync☆20May 12, 2025Updated 9 months ago
- ☆15Jun 7, 2024Updated last year
- Thorn in a HaizeStack test for evaluating long-context adversarial robustness.☆26Aug 3, 2024Updated last year
- ☆44Jun 19, 2024Updated last year
- ☆44Dec 28, 2022Updated 3 years ago
- The official implementation of our pre-print paper "Automatic and Universal Prompt Injection Attacks against Large Language Models".☆69Oct 23, 2024Updated last year
- ☆23Dec 18, 2024Updated last year
- Research code for the paper "How Good is Your Tokenizer? On the Monolingual Performance of Multilingual Language Models"☆28Oct 3, 2021Updated 4 years ago
- Efficient Transformers with Dynamic Token Pooling☆67May 20, 2023Updated 2 years ago
- ☆16May 30, 2024Updated last year
- An unofficial implementation of the Infini-gram model proposed by Liu et al. (2024)☆33Jun 19, 2024Updated last year
- Official code repository of the paper Learning Associative Inference Using Fast Weight Memory by Schlag et al.☆29Feb 25, 2021Updated 5 years ago
- ☆18Apr 15, 2024Updated last year
- TACL 2025: Investigating Adversarial Trigger Transfer in Large Language Models☆19Aug 17, 2025Updated 6 months ago
- [COLM 2024] Early Weight Averaging meets High Learning Rates for LLM Pre-training☆19Oct 12, 2024Updated last year
- [𝐄𝐌𝐍𝐋𝐏 𝐅𝐢𝐧𝐝𝐢𝐧𝐠𝐬 𝟐𝟎𝟐𝟒 & 𝐀𝐂𝐋 𝟐𝟎𝟐𝟒 𝐍𝐋𝐑𝐒𝐄 𝐎𝐫𝐚𝐥] 𝘌𝘯𝘩𝘢𝘯𝘤𝘪𝘯𝘨 𝘔𝘢𝘵𝘩𝘦𝘮𝘢𝘵𝘪𝘤𝘢𝘭 𝘙𝘦𝘢𝘴𝘰𝘯𝘪𝘯…☆51May 4, 2024Updated last year
- Official code and model checkpoints for our EMNLP 2022 paper "RankGen - Improving Text Generation with Large Ranking Models" (https://arx…☆137Aug 2, 2023Updated 2 years ago
- Code for Zero-Shot Tokenizer Transfer☆143Jan 14, 2025Updated last year
- Contextualized per-token embeddings☆34May 11, 2025Updated 9 months ago
- ☆19Dec 4, 2025Updated 3 months ago
- ☆23Jan 27, 2025Updated last year
- ☆23Jan 17, 2025Updated last year
- This library supports evaluating disparities in generated image quality, diversity, and consistency between geographic regions.☆20Jun 3, 2024Updated last year
- Language models scale reliably with over-training and on downstream tasks☆100Apr 2, 2024Updated last year
- [EMNLP'23] Official Code for "FOCUS: Effective Embedding Initialization for Monolingual Specialization of Multilingual Models"☆36Jun 7, 2025Updated 9 months ago
- Easily embed, cluster and semantically label text datasets☆596Mar 28, 2024Updated last year
- Representation Engineering: A Top-Down Approach to AI Transparency☆953Aug 14, 2024Updated last year
- This repository provides an original implementation of Detecting Pretraining Data from Large Language Models by *Weijia Shi, *Anirudh Aji…☆242Nov 3, 2023Updated 2 years ago
- ☆26May 30, 2023Updated 2 years ago
- [ACL'24 Oral] Analysing The Impact of Sequence Composition on Language Model Pre-Training☆23Aug 18, 2024Updated last year
- Repo for training MLMs, CLMs, or T5-type models on the OLM pretraining data, but it should work with any hugging face text dataset.☆96Feb 9, 2023Updated 3 years ago
- Remote code execution in Power Platform connectors via JSON deserialization☆23Mar 30, 2023Updated 2 years ago
- Multipack distributed sampler for fast padding-free training of LLMs☆206Aug 10, 2024Updated last year
- ☆15Oct 24, 2023Updated 2 years ago
- Implementation of BEAST adversarial attack for language models (ICML 2024)☆90May 14, 2024Updated last year