Data defines the model by dint of genetic programming, producing the best decile table.


Interpreting Model Performance: Use the “Smart” Decile Analysis
Bruce Ratner, Ph.D.

Data analysts use the decile analysis, based on the scores of the response model at hand, to create a solicitation list of the individuals most likely to respond, and thereby obtain an advantage over a random selection of individuals. The decile analysis involves a brute ("dumb") division of a database into ten equal-sized contiguous groups (deciles) without regard for the shape of the distribution of model scores. The assumption of this "dumb" decile analysis is that individuals within a decile have equivalent model scores, which differ from the model scores of the neighboring deciles above and below. This assumption is not always tenable, as the distribution of model scores is not always "smooth" but is often characterized by "clumps" or "gaps". Deciles with these characteristics harbor extreme response segments, which reflect what the model is doing and indicate how to implement the model to obtain a greater advantage over a random selection. The purpose of this article is to present the correct approach to interpreting model performance via the "smart" decile analysis, which divides a database taking the clumps and gaps into account, in order to identify extreme response segments that aid in understanding what the response model is doing and how best to implement it.
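To make the "dumb" division concrete, here is a minimal Python sketch. The scores are hypothetical, and the helper names dumb_deciles and split_clumps are illustrative only, not part of any published method. The sketch ranks the scores, cuts them into ten equal-sized contiguous groups, and then lists the score values that a decile boundary has cut in two.

```python
from collections import Counter

def dumb_deciles(scores, n_groups=10):
    """Rank scores high to low and cut into n_groups equal-sized contiguous groups."""
    ranked = sorted(scores, reverse=True)
    size = len(ranked) / n_groups
    # The end index of one group is the start index of the next, so groups are contiguous.
    return [ranked[round(i * size):round((i + 1) * size)] for i in range(n_groups)]

def split_clumps(groups):
    """Return score values that appear in more than one group (a clump cut by a boundary)."""
    seen = Counter()
    for group in groups:
        for score in set(group):
            seen[score] += 1
    return sorted((s for s, k in seen.items() if k > 1), reverse=True)

# Hypothetical scores for 20 individuals, with clumps at 0.30, 0.20, and 0.10.
scores = [0.90, 0.80] + [0.30] * 7 + [0.20] * 5 + [0.10] * 6
groups = dumb_deciles(scores)

print([len(g) for g in groups])   # every "decile" holds 2 individuals
print(split_clumps(groups))       # clumped scores that straddle decile boundaries
```

In this toy file every clump is spread over several deciles, so individuals with identical scores are reported as if they belonged to different performance groups.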


Two Illustrations of Dumb and Smart Decile Analyses


Illustration #1 for a Response Model #1

How to Read the Smart Decile Analysis
The quasi N-tile analysis (smart decile analysis) is used to show that the dumb decile analysis is misleading in its display of model performance. Although the quasi 10-tile analysis of Illustration #1, below, produces 10 divisions or tiles (which is not always the case; see Illustration #2, below), its display differs from that of the corresponding decile analysis. The quasi N-tile analysis thus shows that the decile analysis assumption (individuals within a given decile have equivalent model scores, reflecting equivalent likelihoods of responding) is not met, and therefore the estimates from the dumb decile analysis are not honest. (Let’s not concern ourselves with rounding off individuals in the deciles for now: Who wants to discuss 919.5 individuals anyway?!)

I use the quasi 10-tile analysis to parse the Top decile and show that its model scores form three clusters of individuals, each with a different level of responsiveness. That is, the Top decile consists of individuals at three levels of response rate: 20.00%, 12.40%, and 9.84%. This is inferred as follows:

  1. The Top 1st tile consists of the 70 most responsive individuals (in the entire data file) with equivalent scores as identified by the quasi 10-tile analysis. Their “smart” estimated response rate is 20.00%. Although these individuals account for 0.76% of the data, they are an extreme response segment.
  2. The next most responsive individuals with equivalent scores as identified by the quasi 10-tile analysis are in the 2nd tile consisting of 613 individuals with a smart estimated response rate of 12.40%.
  3. The third most responsive group, which includes the remaining 236 (= 919 - 70 - 613) individuals at the bottom of the Top decile, is also identified by the quasi 10-tile analysis. These individuals are in the 3rd tile, with a smart estimated response rate of 9.84%.
  4. Thus, the Top decile consists of individuals in three clusters with varying levels of response rates – 20.00%, 12.40%, and 9.84%.
  5. The 236 individuals genuinely belong with all of their counterparts in the 2nd decile and with the 685 most responsive individuals in the 3rd decile (7.44% of the file). Specifically, the 236 individuals from the bottom of the Top decile, all 919 individuals in the 2nd decile, and the 685 individuals from the top of the 3rd decile all fall in the 3rd tile, as indicated by the quasi analysis. These 1840 (= 236 + 919 + 685) individuals have a smart estimated response rate of 9.84%.
  6. Accordingly, the quasi 10-tile analysis provides smart and honest estimates of CumLifts for the now-understood dumb/nominal decile analysis:
    1. For the Top decile with CumLift of 142: The smart estimate of CumLift is 151 for the 2nd tile, reflecting 7.43% depth-of-file. Because model scores are slightly clumped about the 10%-neighborhood, an exact 10% depth-of-file smart estimate cannot be obtained.
    2. For the top two deciles, a 20% depth-of-file with a CumLift of 129: Because model scores are heavily clumped (have a very small variance) about the 20%-neighborhood, no smart estimate can be obtained.
    3. For the top three deciles, a 30% depth-of-file with a CumLift of 123: The smart estimate of CumLift is 123 for the 3rd tile, reflecting 27.44% depth-of-file. Because model scores are slightly clumped about the 30%-neighborhood, an exact 30% depth-of-file smart estimate cannot be obtained.
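The depth-of-file figures above can be checked with a few lines of arithmetic. The Python sketch below uses only the tile sizes and response rates quoted in the list. Two things are assumptions: the file size of 9,195 individuals, inferred from the remark about 919.5 individuals per decile, and the usual definition CumLift = 100 x (cumulative response rate) / (overall response rate). The overall response rate is not quoted in the text, so the sketch backs it out from each reported CumLift as a consistency check.

```python
# Tile sizes and smart estimated response rates quoted in the text (tiles 1 to 3).
tiles = [(70, 0.2000), (613, 0.1240), (1840, 0.0984)]
file_size = 9195                      # assumption: 10 deciles x 919.5 individuals
reported_cumlift = {2: 151, 3: 123}   # smart CumLifts quoted for the 2nd and 3rd tiles

def cumulative_summary(tiles, file_size):
    """Accumulate tile counts and expected responders from the top tile down."""
    rows, n_cum, responders_cum = [], 0, 0.0
    for tile_no, (n, rate) in enumerate(tiles, start=1):
        n_cum += n
        responders_cum += n * rate
        rows.append({
            "tile": tile_no,
            "depth_pct": 100 * n_cum / file_size,
            "cum_rate_pct": 100 * responders_cum / n_cum,
        })
    return rows

rows = cumulative_summary(tiles, file_size)
for row in rows:
    line = (f"tile {row['tile']}: depth {row['depth_pct']:.2f}%, "
            f"cum. response rate {row['cum_rate_pct']:.2f}%")
    lift = reported_cumlift.get(row["tile"])
    if lift:
        # Overall rate implied by the reported CumLift (not stated in the text).
        implied = row["cum_rate_pct"] / (lift / 100)
        line += f", implied overall rate {implied:.2f}%"
    print(line)
```

Under these assumptions the depths come out at 0.76%, 7.43%, and 27.44%, matching the text, and both reported CumLifts imply nearly the same overall response rate, which suggests the quoted figures are mutually consistent.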
[Figure: quasi 10-tile (smart decile) analysis for Illustration #1]



Illustration #2 for a Response Model #2

How to Read the Smart Decile Analysis
The quasi N-tile analysis (smart decile analysis) for Illustration #2 is read similarly to that in Illustration #1. Note, however, that the quasi 10-tile analysis for Illustration #2 shows only six tiles. The four "missing" tiles (2nd, 4th, 6th, and 8th) are suppressed because their model scores are nonobservable "gap" scores; that is, there are no individuals in these tiles. Clearly, this smart decile analysis demonstrates that the estimates from the dumb decile analysis are not honest.
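One way to see how tiles can go "missing" is the toy Python sketch below. It is not the quasi N-tile procedure itself, whose details are not given here. It uses an assumed rule: individuals with tied scores are kept together, and each clump is assigned to the nominal tile in which its last member falls. The scores are hypothetical, and the function name quasi_tiles is illustrative only.

```python
from itertools import groupby

def quasi_tiles(scores, n_tiles=10):
    """Assumed rule: keep tied scores together; place each clump in the
    nominal tile that contains the rank of its last member."""
    ranked = sorted(scores, reverse=True)
    n = len(ranked)
    tiles, rank = {}, 0
    for score, clump in groupby(ranked):
        size = len(list(clump))
        rank += size                                # rank of the clump's last member
        tile_no = (rank * n_tiles + n - 1) // n     # ceiling of rank * n_tiles / n
        tiles.setdefault(tile_no, []).extend([score] * size)
    return tiles

# Hypothetical file of 100 individuals whose scores fall into six clumps.
scores = ([0.9] * 5 + [0.7] * 25 + [0.5] * 20 +
          [0.4] * 20 + [0.2] * 20 + [0.1] * 10)
tiles = quasi_tiles(scores)
missing = [t for t in range(1, 11) if t not in tiles]

print(sorted(tiles))   # tiles that contain individuals
print(missing)         # tiles with no individuals ("gap" tiles)
```

With this toy file, six tiles are populated and the 2nd, 4th, 6th, and 8th are empty, mirroring the pattern described for Illustration #2. Under this assumed rule, two small clumps ending in the same nominal tile would be merged into one tile, so it is only a rough stand-in for the actual analysis.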


[Figure: smart 10-tile analysis for Illustration #2]



Using 50-tiles
 
[Figure: smart 50-tile analysis]


Using 100-tiles 

[Figure: smart 100-tile analysis]

Using 200-tiles

[Figure: smart 200-tile analysis]


For more information about this article, call Bruce Ratner at 516.791.3544 or 1 800 DM STAT-1; or e-mail at br@dmstat1.com.