IamHuijben / gumbel_softmax_sampling

☆11

Alternatives and similar repositories for gumbel_softmax_sampling

Users who are interested in gumbel_softmax_sampling are comparing it to the repositories listed below.

Noahs-ARK / PaLM
PyTorch implementation for PaLM: A Hybrid Parser and Language Model.
☆10 Updated 5 years ago
machelreid / editpro
Learning to Model Editing Processes
☆26 Updated 3 years ago
zomux / lanmt-ebm
lanmt ebm
☆12 Updated 5 years ago
jenni-ai / T2FW
Fine-Tuning Pre-trained Transformers into Decaying Fast Weights
☆19 Updated 2 years ago
robert-lieck / RBN
Recursive Bayesian Networks
☆11 Updated 2 months ago
srush / mamba-scans
Blog post
☆17 Updated last year
sustcsonglin / gated_linear_attention_layer
☆32 Updated last year
frankxu2004 / knnlm-why
Repo for ICML23 "Why do Nearest Neighbor Language Models Work?"
☆58 Updated 2 years ago
harvardnlp / hmm-lm
☆41 Updated 4 years ago
RobertCsordas / ndr
The official repository for our paper "The Neural Data Router: Adaptive Control Flow in Transformers Improves Systematic Generalization".
☆33 Updated last month
srush / tangent
Source-to-Source Debuggable Derivatives in Pure Python
☆15 Updated last year
FranxYao / RDP
Implementation of ICML 22 Paper: Scaling Structured Inference with Randomization
☆14 Updated 2 years ago
acosharma / elita-transformer
Official Repository for Efficient Linear-Time Attention Transformers.
☆18 Updated last year
microsoft / AMOS
[ICLR 2022] Pretraining Text Encoders with Adversarial Mixture of Training Signal Generators
☆24 Updated last year
da03 / criticize_text_generation
A method for evaluating the high-level coherence of machine-generated texts. Identifies high-level coherence issues in transformer-based …
☆11 Updated 2 years ago
timvieira / vocrf
Variable-order CRFs with structure learning
☆16 Updated 11 months ago
yikangshen / megablocks
☆20 Updated last year
jungokasai / deep-shallow
☆44 Updated 4 years ago
deep-spin / sparse-communication
☆12 Updated 3 years ago
tt-embedding / tt-embeddings
☆27 Updated 5 years ago
JunShern / few-shot-adaptation
Exploring Few-Shot Adaptation of Language Models with Tables
☆24 Updated 2 years ago
ethancaballero / broken_neural_scaling_laws
Code Release for "Broken Neural Scaling Laws" (BNSL) paper
☆59 Updated last year
yaohungt / TransformerDissection
[EMNLP'19] Summary for Transformer Understanding
☆53 Updated 5 years ago
sustcsonglin / mamba-triton
☆48 Updated last year
renll / SeqBoat
[NeurIPS 2023] Sparse Modular Activation for Efficient Sequence Modeling
☆37 Updated last year
HazyResearch / embroid
Embroid: Unsupervised Prediction Smoothing Can Improve Few-Shot Classification
☆11 Updated last year
belindal / TaskBench500
Suite of 500 procedurally-generated NLP tasks to study language model adaptability
☆21 Updated 3 years ago
proger / nanokitchen
Parallel Associative Scan for Language Models
☆18 Updated last year
bzhangGo / lrn
Source code for "A Lightweight Recurrent Network for Sequence Modeling"
☆26 Updated 2 years ago
timvieira / dyna-pi
An interactive tool for analyzing, executing, and improving dynamic programming algorithms.
☆13 Updated 11 months ago