Model Monitoring with WeightWatcher: Data-Free Deep Learning Diagnostics

WeightWatcher is an open-source Model Monitoring tool that provides Data-Free Diagnostics for production-quality Deep Neural Networks (DNNs). It can tell you if your model is over-trained or over-parameterized. It can also tell you which layers are over-trained and which are under-trained (over-parameterized). And it does all of this without needing training or test data.

Let’s see how to do this. First, install the tool.

pip install weightwatcher

Second, pick a model and get a basic description of it:

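import weightwatcher as ww

# my_model is a placeholder for your own trained model (e.g. PyTorch or Keras)
watcher = ww.WeightWatcher(model=my_model)
# analyze() returns a pandas dataframe of per-layer metrics
details = watcher.analyze()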

WeightWatcher produces a pandas dataframe, details, with layer metrics describing your model. In particular, the details dataframe contains layer names, ids, types, and warnings.
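To act on the warnings, it can help to pull out just the flagged layers. The following is a minimal sketch, not part of the WeightWatcher API: flagged_layers is a hypothetical helper, and the column names layer_id, name, and warning are assumptions about the details dataframe, so check details.columns on your installed version. The rows are made-up toy values, not real output.

```python
def flagged_layers(records, warning_key="warning"):
    """Return (layer_id, name, warning) for every row with a non-empty warning.

    `records` is a list of dicts, one per layer. The key names used here
    are assumptions about the details dataframe, not a documented contract.
    """
    flagged = []
    for row in records:
        warning = row.get(warning_key)
        # Treat None, NaN (a float) and blank strings as "no warning".
        if isinstance(warning, str) and warning.strip():
            flagged.append((row.get("layer_id"), row.get("name"), warning.strip()))
    return flagged


# Toy rows with invented values, used only to exercise the helper.
example_rows = [
    {"layer_id": 1, "name": "layer_a", "warning": ""},
    {"layer_id": 2, "name": "layer_b", "warning": "over-trained"},
    {"layer_id": 3, "name": "layer_c", "warning": None},
    {"layer_id": 4, "name": "layer_d", "warning": "under-trained"},
]

for layer_id, name, warning in flagged_layers(example_rows):
    print(layer_id, name, warning)
```

With a real analysis, you could pass details.to_dict("records") in place of the toy rows; if your version uses a different column name for warnings, pass it as warning_key.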

Here is an example: an analysis of the OpenAI GPT model (discussed in our recent Nature paper).

import transformers
from transformers import OpenAIGPTModel, GPT2Model

gpt_model = OpenAIGPTModel.from_pretrained('openai-gpt')

watcher = ww.WeightWatcher(model=gpt_model)
details = watcher.analyze()

The details dataframe now includes a wide range of information for each layer, including a dedicated warnings column.

[Figure: WeightWatcher layer warnings for GPT]

For comparison, below we show the same details dataframe, but for GPT2. GPT2 is the same model as GPT, but trained with more and better data. Being a much better model, GPT2 has far fewer warnings than GPT.

[Figure: WeightWatcher layer warnings for GPT2]
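The GPT versus GPT2 comparison can also be made numerically by counting warnings per model. This is a rough sketch under stated assumptions: warning_counts and compare_warnings are hypothetical helpers (not WeightWatcher functions), the warning column name is an assumption, and the rows are invented toy values rather than the real GPT or GPT2 results.

```python
from collections import Counter


def warning_counts(records, warning_key="warning"):
    """Count how often each non-empty warning label appears across layers."""
    counts = Counter()
    for row in records:
        warning = row.get(warning_key)
        # Skip None, NaN (a float) and blank strings.
        if isinstance(warning, str) and warning.strip():
            counts[warning.strip()] += 1
    return counts


def compare_warnings(records_a, records_b, warning_key="warning"):
    """Map each warning label to a (count_in_a, count_in_b) pair."""
    a = warning_counts(records_a, warning_key)
    b = warning_counts(records_b, warning_key)
    # A Counter returns 0 for labels it has never seen.
    return {label: (a[label], b[label]) for label in sorted(set(a) | set(b))}


# Invented toy rows standing in for two details dataframes.
model_a_rows = [
    {"warning": "over-trained"},
    {"warning": "under-trained"},
    {"warning": "under-trained"},
    {"warning": ""},
]
model_b_rows = [
    {"warning": ""},
    {"warning": "under-trained"},
    {"warning": ""},
]

for label, (count_a, count_b) in compare_warnings(model_a_rows, model_b_rows).items():
    print(f"{label}: model A = {count_a}, model B = {count_b}")
```

To apply this to real models, you would run watcher.analyze() once per model and pass each result as details.to_dict("records"). For GPT2, something like GPT2Model.from_pretrained('gpt2') should load a checkpoint, though which GPT2 checkpoint the figures above used is not stated here.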

That’s all there is to it. WeightWatcher provides simple layer metrics for pre-trained Deep Neural Networks, with warnings that flag which layers are over-trained and which are under-trained.

Give it a try. We are looking for early adopters needing better, faster, and cheaper AI monitoring. If it is useful to you, please let me know.


  1. Hi Charles, noob question but does it also work for non deep learning models? Or can we tweak it to work for other models? Thanks


    1. No, this is specifically for large scale, production quality Deep Learning models, such as the latest transformer models.


        1. No, I do not. This is a completely new theory, with no foundation in the traditional machine learning literature. For the non-DL models, it is something I will need to work on.


          1. Thanks. Do you think this theory can be applied to normal tree based models? I will definitely read the paper as well but wanted to know your thoughts

