MingLiiii / Layer_Gradient

What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective
40 · Updated 3 weeks ago

Related projects

Alternatives and complementary repositories for Layer_Gradient