Announcing: WW-PGD, WeightWatcher Projected Gradient Descent
I just released WW-PGD, a small PyTorch add-on that wraps standard optimizers … More
WeightWatcher, HTSR theory, and the Renormalization Group
There is a deep connection between the open-source weightwatcher tool, which implements ideas from the theory of Heavy Tailed Self-Regularization … More
Fine-Tuned Llama3.2: Bad Instructions?
Recently, Meta released the Llama3.2 1B and 3B Instruct fine-tuned LLMs, to mixed reviews. On the one hand, it’s ranking … More
What’s instructive about Instruct Fine-Tuning: a weightwatcher analysis
Are you fine-tuning an open-source LLM? Like Llama, Mistral, or Qwen? That is, Instruct Fine-Tuning. Whether you … More
Describing Double Descent with WeightWatcher
Double Descent (DD) has surprised statisticians, computer scientists, and deep learning practitioners, but it was known in the … More
SVDSmoothing LLM Layers with WeightWatcher
Recently, Microsoft Research published the LASER method, “Layer-Selective Rank Reduction,” in the very popular paper The Truth is in There: … More
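For context, the SVDSmoothing feature discussed in that post takes only a few lines to invoke. This is a minimal sketch, assuming a recent weightwatcher release and using gpt2 purely as a small stand-in model; the exact keyword arguments may differ across versions:

```python
import weightwatcher as ww
from transformers import AutoModelForCausalLM

# gpt2 is only a small stand-in here; the post applies this to larger LLMs
model = AutoModelForCausalLM.from_pretrained("gpt2")

watcher = ww.WeightWatcher(model=model)

# replace each analyzed weight matrix with a low-rank, SVD-truncated
# approximation, keeping only the dominant singular components
smoothed_model = watcher.SVDSmoothing(model=model)
```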
Evaluating LLMs with WeightWatcher Part III: The Magic of Mistral, a Story of Dragon Kings
Recently, the Mistral models have taken the LLM world by storm. The Mistral Mixture of Experts (MoE) 8x7b model outperforms other … More
Evaluating Fine-Tuned LLMs with WeightWatcher Part II: PEFT / LoRA Models
Evaluating LLMs is hard. Especially when you don’t have a lot of test data. In the last post, we saw how to … More
Evaluating Fine-Tuned LLMs with WeightWatcher
If you are fine-tuning your own LLMs, you need a way to evaluate them. And while there are over a dozen … More
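The core evaluation loop described in that post is short. Here is a minimal sketch, assuming the standard weightwatcher analyze/get_summary API, with gpt2 as an ungated placeholder for your own fine-tuned model:

```python
import weightwatcher as ww
from transformers import AutoModelForCausalLM

# substitute the path or name of your own fine-tuned LLM here
model = AutoModelForCausalLM.from_pretrained("gpt2")

watcher = ww.WeightWatcher(model=model)
details = watcher.analyze()            # pandas DataFrame, one row per layer
summary = watcher.get_summary(details)

# HTSR rule of thumb: layer-wise alpha roughly in [2, 6] suggests a
# well-trained layer; no held-out test data is required for this check
print(details[["layer_id", "alpha"]])
print("average alpha:", summary["alpha"])
```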
WeightWatcher new feature: fix_fingers='clip_xmax'
WeightWatcher 0.7 has just been released, and it includes a new and improved advanced feature for analyzing Deep Neural Networks … More
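For reference, the fix_fingers option named in the title is passed directly to analyze(). A minimal sketch, assuming weightwatcher 0.7 or later and any supported PyTorch model:

```python
import weightwatcher as ww
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")  # any supported model

watcher = ww.WeightWatcher(model=model)
# 'clip_xmax' clips the largest eigenvalue(s) before the power-law fit,
# suppressing "finger" artifacts that can distort the alpha estimate
details = watcher.analyze(fix_fingers='clip_xmax')
```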
