Announcing: WW-PGD, WeightWatcher Projected Gradient Descent
I just released WW-PGD, a small PyTorch add-on that wraps standard optimizers … More
WeightWatcher, HTSR theory, and the Renormalization Group
There is a deep connection between the open-source weightwatcher tool, which implements ideas from the theory of Heavy Tailed Self-Regularization … More
Fine-Tuned Llama3.2: Bad Instructions?
Recently, Meta released the Llama3.2 1B and 3B Instruct fine-tuned LLMs, to mixed reviews. On the one hand, it’s ranking … More
What’s instructive about Instruct Fine-Tuning: a weightwatcher analysis
Are you fine-tuning an open-source LLM? Like Llama, Mistral, or Qwen? That is, Instruct Fine-Tuning. Whether you … More
Describing Double Descent with WeightWatcher
Double Descent (DD) has surprised statisticians, computer scientists, and deep learning practitioners, but it was known in the … More
SVDSmoothing LLM Layers with WeightWatcher
Recently, Microsoft Research published the LASER method, “Layer-Selective Rank Reduction,” in the very popular paper The Truth is in There: … More
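For context, the SVDSmoothing feature discussed in that post takes only a few lines to invoke. This is a minimal sketch, assuming a recent weightwatcher release and using gpt2 purely as a small stand-in model; the exact keyword arguments may differ across versions:

```python
import weightwatcher as ww
from transformers import AutoModelForCausalLM

# gpt2 is only a small stand-in here; the post applies this to larger LLMs
model = AutoModelForCausalLM.from_pretrained("gpt2")

watcher = ww.WeightWatcher(model=model)

# replace each analyzed weight matrix with a low-rank, SVD-truncated
# approximation, keeping only the dominant singular components
smoothed_model = watcher.SVDSmoothing(model=model)
```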
Evaluating LLMs with WeightWatcher Part III: The Magic of Mistral, a Story of Dragon Kings
Recently, the Mistral models have taken the LLM world by storm. The Mistral Mixture of Experts (MoE) 8x7b model outperforms other … More
Evaluating Fine-Tuned LLMs with WeightWatcher Part II: PEFT / LoRA Models
Evaluating LLMs is hard. Especially when you don’t have a lot of test data. In the last post, we saw how to … More
Evaluating Fine-Tuned LLMs with WeightWatcher
If you are fine-tuning your own LLMs, you need a way to evaluate them. And while there are over a dozen … More
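The core evaluation loop described in that post is short. Here is a minimal sketch, assuming the standard weightwatcher analyze/get_summary API, with gpt2 as an ungated placeholder for your own fine-tuned model:

```python
import weightwatcher as ww
from transformers import AutoModelForCausalLM

# substitute the path or name of your own fine-tuned LLM here
model = AutoModelForCausalLM.from_pretrained("gpt2")

watcher = ww.WeightWatcher(model=model)
details = watcher.analyze()            # pandas DataFrame, one row per layer
summary = watcher.get_summary(details)

# HTSR rule of thumb: layer-wise alpha roughly in [2, 6] suggests a
# well-trained layer; no held-out test data is required for this check
print(details[["layer_id", "alpha"]])
print("average alpha:", summary["alpha"])
```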
WeightWatcher new feature: fix_fingers='clip_xmax'
WeightWatcher 0.7 has just been released, and it includes a new and improved advanced feature for analyzing Deep Neural Networks … More
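For reference, the fix_fingers option named in the title is passed directly to analyze(). A minimal sketch, assuming weightwatcher 0.7 or later and any supported PyTorch model:

```python
import weightwatcher as ww
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")  # any supported model

watcher = ww.WeightWatcher(model=model)
# 'clip_xmax' clips the largest eigenvalue(s) before the power-law fit,
# suppressing "finger" artifacts that can distort the alpha estimate
details = watcher.analyze(fix_fingers='clip_xmax')
```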
