Gleghorn-Lab / Mixture-of-Experts-Sentence-Similarity
☆15 · Updated 7 months ago
Alternatives and similar repositories for Mixture-of-Experts-Sentence-Similarity
Users who are interested in Mixture-of-Experts-Sentence-Similarity are comparing it to the libraries listed below.
- Implementation of the model: "Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models" in PyTorch ☆28 · Updated this week
- [COLM 2024] Early Weight Averaging meets High Learning Rates for LLM Pre-training ☆17 · Updated last year
- Code for "Seeking Neural Nuggets: Knowledge Transfer in Large Language Models from a Parametric Perspective" ☆32 · Updated last year
- Pre-trained Language Model for Scientific Text ☆46 · Updated last year
- Implementation of "LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models" ☆39 · Updated 11 months ago
- The open-source implementation of "Connecting Large Language Models with Evolutionary Algorithms Yields Powerful Prompt Optimizers" ☆18 · Updated last year
- Implementation of the paper: "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention" from Google in pyTO… ☆56 · Updated this week
- Few-shot Learning with Auxiliary Data ☆31 · Updated last year
- Official implementation of "BERTs are Generative In-Context Learners" ☆32 · Updated 7 months ago
- Minimum Description Length probing for neural network representations ☆20 · Updated 8 months ago
- [NAACL 2025] A Closer Look into Mixture-of-Experts in Large Language Models ☆55 · Updated 8 months ago
- Codebase for Instruction Following without Instruction Tuning ☆36 · Updated last year
- [AAAI 2025] Augmenting Math Word Problems via Iterative Question Composing (https://arxiv.org/abs/2401.09003) ☆20 · Updated 2 weeks ago
- QRHead: Query-Focused Retrieval Heads Improve Long-Context Reasoning and Re-ranking ☆24 · Updated last month
- BioCoder: A Benchmark for Bioinformatics Code Generation with Large Language Models (https://arxiv.org/abs/2308.16458) ☆51 · Updated 2 months ago
- Embedding Recycling for Language models ☆38 · Updated 2 years ago
- ☆56 · Updated last year
- ☆15 · Updated last year
- Interpretable unified language safety checking with large language models ☆31 · Updated 2 years ago
- Repository for NPHardEval, a quantified-dynamic benchmark of LLMs ☆59 · Updated last year
- Adding new tasks to T0 without catastrophic forgetting ☆33 · Updated 2 years ago
- Unofficial PyTorch implementation of "Step-unrolled Denoising Autoencoders for Text Generation" ☆24 · Updated 2 years ago
- Pile Deduplication Code ☆19 · Updated 2 years ago
- ☆22 · Updated 2 months ago
- Transformers at any scale ☆41 · Updated last year
- [ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal… ☆55 · Updated 2 years ago
- Code from our paper "Making Reasoning Matter: Measuring and Improving Faithfulness of Chain-of-Thought Reasoning… ☆12 · Updated last year
- [NeurIPS 2022] Your Transformer May Not be as Powerful as You Expect (official implementation) ☆32 · Updated 2 years ago
- Code for the arXiv preprint "The Unreasonable Effectiveness of Easy Training Data" ☆48 · Updated last year
- ☆35 · Updated last year