

Profile Analysis of Any Regression-based Model
Bruce Ratner, PhD


[Tables 1 through 4 appeared here as images: t1, mean_X, t3, t31, coefficients, t4.]

Profile Analysis of Any Regression-based Model addresses two questions:
Q1. What role do the predictor variables play in the model?
I.e., what are the values of the predictor variables that contribute
to a company's score?

Q2. Which predictor variables are most important, and which are least important?
I.e., what is the ordering of the predictor variables in terms of
their contribution to a company's score?



Profile Analysis Algorithm
1. Append the score variable of the given regression model
to the training dataset used in building the regression model.
The training dataset has 
     a. company ID variable Comp_ID 
     b. response variable Resp 
     c. predictor variables x = (X1, …, Xk) 
     d. score variable Score.
     See Table 1.

2. Calculate the means of the predictor variables over
    all observations in the training dataset.
    Let the means be mean_x = (mean_X1, …, mean_Xk).
    See Table 2.
 
3. For a given company, determine its corresponding Score value. 
    For example, for company B433 Score = 0.55.

4. Subset the training data to the observations with Score = 0.55.
    Let the subset be data_sc55.

5. Calculate the mean of Resp for data_sc55.
    Let the mean be mean_Resp.
    mean_Resp is a re-estimate (fine-tuning) of Score.
    Compare mean_Resp = 0.6 and Score = 0.55.

6. Calculate index = (x - mean_x) / mean_x, for data_sc55. 

7. index (deX1, ... , deXk) answers Q1. 
    For B433:
    The raw profile values (of X1, X2, X3) are 37, 140, 12, in Table 1.
    The index profile values are (32.0%), (40.0%), (27.3%), in Table 3.1, where parentheses denote negative values.
    The sign profile symbols are -, --, and ---, in Table 3.1.
    
    Sign symbols -/+, --/++, and ---/+++ represent
    raw profile values that are p% less/greater than the corresponding mean_x,
    where p lies, respectively, in one of the following intervals:
    (0, 33.3%), [33.3%, 66.7%), and [66.7%, ...].


8. Regress Resp on X1, … , Xk.
    Obtain the standardized regression coefficients.
    Let the standardized coefficients be coef = (zb1, …, zbk).

    If the finished regression model has highly correlated predictors,
    which render the model questionable, then go to step 8a, below.

8a. Calculate the resistant partial correlation coefficient (RPCC):
     r(Resp, Xi | X1, X2, ..., Xi-1, Xi+1, ..., Xk), i = 1, 2, ..., k,
     for the region of overlapping interquartile ranges
     of X1, X2, ..., Xi-1, Xi+1, ..., Xk.

     Next, index the RPCCs in the manner of step 9 to yield the sought-after
     importance weights (see step 10) of the predictor variables.
    
9. Normalize coef: pct_coef = abs_coef / sum_coef, where
    abs_coef = ( abs(zb1), … , abs(zbk) ), and
    sum_coef = abs(zb1) + … + abs(zbk).
    See Table 4.
 
10. pct_coef (pct_X1, ... , pct_Xk) answers Q2.
    For B433:
    The importance weights of the predictors are 14.2%, 18.6%, and 67.2%, in Table 4.
    Note: coef is the same for all Comp_IDs within a score group (e.g., Score = 0.55).
    

For more information about this article, call Bruce Ratner at 516.791.3544 or 1 800 DM STAT-1; or e-mail at br@dmstat1.com.