Describing Double Descent with WeightWatcher

Double Descent (DD) is something that has surprised statisticians, computer scientists, and deep learning practitioners–but it was known in the … More

SVDSmoothing LLM Layers with WeightWatcher

Recently, Microsoft Research published the LASER method: ”Layer-Selective Rank Reduction” in this recent, very popular paper The Truth is in There: … More

Evaluating LLMs with WeightWatcher Part III: The Magic of Mistral, a Story of Dragon Kings

Recently, the Mistral models have taken the LLM world by storm. The Mistral Mixture of Experts (MOE) 8x7b model outperforms other … More

Evaluating Fine-Tuned LLMs with WeightWatcher Part II: PEFT / LoRa Models

Evaluating LLMs is hard. Especially when you don’t have a lot of test data.In the last post, we saw how to … More

Evaluating Fine-Tuned LLMs with WeightWatcher

if you are fine-tuning your own LLMs, you need a way to evaluate them. And while there are over a dozen … More

WeightWatcher new feature: fix_fingers=’clip_xmax’

WeightWatcher 0.7 has just been released, and it includes the new and improved advanced feature for analyzing Deep Neural Networks … More

WeightWatcher 0.7: March 2023

First, let me say thanks to all the users in our great community — we have reached over 93K downloads … More

Deep Learning and Effective Correlation Spaces

AI has taken the world by storm. With recent advances like AlphaFold, Stable Diffusion, and ChatGPT, Deep Neural Networks (DNNs) … More

Better than BERT: Pick your best model

Have you ever had to sort through HuggingFace to find your best model ? There are over 54,000 models on … More

Is your layer over-fit? (part 2)

Say you are training a Deep Neural Network (DNN), and you see your model is over-trained. Or just not performing … More