To retain our flexible non-linear earnings-education profile whilst controlling for the effects of unobserved ability on earnings and on the returns to education, we adopt a two-stage control function approach.
Stage one: regress education on a set of instruments and obtain the residual.
Stage two: estimate the earnings function using the first-stage residual as a control variable for ability.
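The two stages can be sketched on simulated data. This is a minimal illustration, not the specification estimated in the paper: the data-generating process, the single instrument `z`, the coefficient values and the helper `ols` are all assumptions made for the example, and the earnings function is kept linear for simplicity.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Simulated data; every number here is hypothetical.
z = rng.normal(size=n)            # instrument, e.g. distance to school
ability = rng.normal(size=n)      # unobserved by the econometrician
educ = 8.0 - 1.0 * z + 1.0 * ability + rng.normal(size=n)
log_wage = 1.0 + 0.10 * educ + 0.30 * ability + rng.normal(scale=0.5, size=n)


def ols(y, X):
    """Least-squares coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


const = np.ones(n)

# Stage one: regress education on the instrument and keep the residual.
Z = np.column_stack([const, z])
resid = educ - Z @ ols(educ, Z)

# Stage two: earnings function with the residual as a control for ability.
beta_cf = ols(log_wage, np.column_stack([const, educ, resid]))

# Naive OLS without the control, for comparison.
beta_ols = ols(log_wage, np.column_stack([const, educ]))

print("return to education, OLS:             ", round(beta_ols[1], 3))
print("return to education, control function:", round(beta_cf[1], 3))
```

In this simulated setting the true return is 0.10. OLS overstates it because education and ability are positively correlated, while the control function estimate should lie close to the true value.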
Our general empirical model is thus of the form
Why not 2SLS?
For linear models with constant slope coefficients, 2SLS and the control function estimator are equivalent.
But the control function approach is more robust than 2SLS when slope parameters co-vary with the unobserved factors of the model (Card, 2001).
And even if all slope parameters are constant, 2SLS is likely to yield relatively imprecise parameter estimates, since the model is non-linear in the endogenous variable. (For 2SLS we would have to estimate four first-stage regressions, modelling each component of the spline function separately, and then use the predictions instead of the actual values in the second stage. A much richer instrument set would thus be required for 2SLS than for the control function estimator.)
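The equivalence in the linear, constant-coefficient case can be checked numerically. The sketch below uses simulated data with hypothetical coefficient values and a hypothetical helper `ols`. It compares point estimates only; standard errors from a manually computed second stage would need correcting in either approach.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5_000

# Simulated linear model with a constant return to education (hypothetical values).
z = rng.normal(size=n)                      # instrument
u = rng.normal(size=n)                      # unobserved ability
educ = 8.0 + 1.0 * z + u + rng.normal(size=n)
log_wage = 1.0 + 0.10 * educ + 0.30 * u + rng.normal(scale=0.5, size=n)


def ols(y, X):
    """Least-squares coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


const = np.ones(n)
Z = np.column_stack([const, z])
educ_hat = Z @ ols(educ, Z)   # first-stage fitted values
resid = educ - educ_hat       # first-stage residual

# 2SLS: replace education by its first-stage prediction.
b_2sls = ols(log_wage, np.column_stack([const, educ_hat]))[1]

# Control function: keep actual education and add the residual.
b_cf = ols(log_wage, np.column_stack([const, educ, resid]))[1]

print(b_2sls, b_cf)
```

The two coefficients agree up to floating-point error because actual education equals the fitted value plus the residual, and the residual is orthogonal to the fitted value by construction.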
To implement the control function estimator, we approximate and by third-order polynomials.
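As a rough illustration of a polynomial control function: the symbols for the two approximated functions are not legible in this text, so the sketch assumes they are functions of the first-stage residual, one shifting the level of earnings and one shifting the return to education. The simulated data, the coefficient values, the linear education term and the helper `ols` are likewise assumptions, not the paper's specification.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20_000

# Simulated data in which the return to education varies with ability
# (all values hypothetical).
z = rng.normal(size=n)
ability = rng.normal(size=n)
educ = 8.0 + 1.0 * z + ability + rng.normal(size=n)
ret = 0.10 + 0.02 * ability                 # individual return to education
log_wage = 1.0 + ret * educ + 0.30 * ability + rng.normal(scale=0.5, size=n)


def ols(y, X):
    """Least-squares coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


const = np.ones(n)

# Stage one: residual from regressing education on the instrument.
Z = np.column_stack([const, z])
v = educ - Z @ ols(educ, Z)

# Stage two: third-order polynomials in the residual, entered both in
# levels (ability effect on earnings) and interacted with education
# (ability effect on the return to education). This layout is an assumption.
level_terms = [v, v**2, v**3]
slope_terms = [v * educ, v**2 * educ, v**3 * educ]
X = np.column_stack([const, educ] + level_terms + slope_terms)
beta = ols(log_wage, X)

# Coefficient on education: the return evaluated at a zero residual.
print("estimated return at v = 0:", round(beta[1], 3))
```

In this simulation the average return is 0.10, and the coefficient on education should be close to it. With actual data the higher-order terms allow the control function to depart from the linear form that joint normality would imply.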
Instruments and exclusion restrictions
In the last wave of the data there is information on:
distance to primary school at the age of six,
distance to secondary school at the age of twelve,
parents’ education,
parents’ main occupations.
Distance to school is a supply-side measure of education, so it should be correlated with education but not with ability (Card, 2001).
Family background variables have been used as instruments for education in many previous studies, on the grounds that such variables should have no direct causal effects on earnings.
Selectivity Bias?
So far we have considered the relatively conventional concern that unobserved ability may bias the estimates. In Kenya and Tanzania, unlike in more developed economies, a job in the wage sector is not the typical labour market outcome.
We do not have data on individuals outside the manufacturing sector, so we are limited in our ability to control for endogenous sample selection. In particular, we are unable to use a sample selection model along the lines proposed by Heckman (1976), since we cannot estimate a participation equation.
Can the control function address the sample selectivity problem? The answer is yes, provided the instruments are independent of the error term in the selected sample.
One example of a model in which this will apply is when the job selection model is of the form , (6)
where is an indicator variable equal to one if the individual has a manufacturing job and zero otherwise and is an unobserved factor which is potentially correlated with (i.e. the non-ability component of the error term in the earnings equation).
It is shown in the paper that this form of sample selectivity can lead to a downward bias in the return to education, if the selectivity mechanism is sufficiently strong, even though education and ability are positively correlated in the population.
But a well-specified control function estimator will correct for this problem and give consistent estimates.
More general cases of sample selection can be more problematic, however.