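P is the true model (the probability density function that the data are drawn from) and Q is the approximating model, i.e. one of the candidate models. The AIC is derived from the KL divergence: it tries to select the candidate Q that minimizes the KL divergence from P. The KL divergence splits into two terms, the integral of p(x) log p(x) and the integral of -p(x) log q(x). Because P is fixed (it is determined by the process that generated the data, not by the choice of candidate), the first term is the same for every candidate, so the AIC is derived only from the component relating to q, i.e. the integral of -p(x) log q(x). The KL divergence can’t be less than zero, so minimizing this component (equivalently, maximizing the integral of p(x) log q(x), the expected log-likelihood) minimizes the KL divergence and pushes it toward its lower bound of zero. Taken from the Wikipedia page ‘Kullback-Leibler divergence’.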
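The point that only the q-dependent part of the KL divergence matters when comparing candidates can be checked numerically. The following is a minimal sketch, assuming discrete distributions in place of densities (so the integrals become sums); the distribution p and the candidates q_a, q_b, q_c are made-up values for illustration and are not taken from the source. It does not compute the AIC itself, only the quantities the AIC is derived from.

```python
import math

def entropy(p):
    # H(P) = -sum p(x) log p(x); depends on P only, so it is constant across candidates
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    # -sum p(x) log q(x); the only part of the KL divergence that involves Q
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q):
    # KL(P || Q) = sum p(x) log(p(x) / q(x))
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical "true" distribution P over three outcomes (illustrative values)
p = [0.5, 0.3, 0.2]

# Hypothetical candidate models Q (illustrative values)
candidates = {
    "q_a": [0.4, 0.4, 0.2],
    "q_b": [0.8, 0.1, 0.1],
    "q_c": [1 / 3, 1 / 3, 1 / 3],
}

# Rank the candidates by full KL divergence and by the q-dependent term alone
by_kl = sorted(candidates, key=lambda name: kl_divergence(p, candidates[name]))
by_ce = sorted(candidates, key=lambda name: cross_entropy(p, candidates[name]))

for name, q in candidates.items():
    print(name, kl_divergence(p, q), cross_entropy(p, q) - entropy(p))
print("ranking by KL divergence:", by_kl)
print("ranking by q-dependent term:", by_ce)
```

For each candidate the two printed numbers agree, since the KL divergence equals the q-dependent term minus the entropy of P, and the two rankings are identical because that entropy is the same constant for every candidate.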