Censored Regression Model V#26 English


The main objective of this post is to explain the concept of the __Tobit Model__, also called the __Censored Regression Model__, which is used to model the relationship between a censored continuous dependent variable and a set of explanatory variables. A variable is censored when values beyond some threshold are recorded as the threshold value itself, even though the true value may lie further beyond it. In left censoring (censoring from below), values that fall at or below some threshold are censored. In right censoring (censoring from above), values that fall at or above some threshold are censored. For details in R visit
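The left and right censoring rules above can be sketched as a small Python function. The name `censor` and the example thresholds are illustrative assumptions, not part of any library: the function simply replaces every value beyond a threshold with the threshold itself, which is exactly what a censored variable records.

```python
def censor(values, lower=None, upper=None):
    """Return a censored copy of `values` (illustrative helper, not a library API).

    lower: left-censoring point; values at or below it are recorded as `lower`.
    upper: right-censoring point; values at or above it are recorded as `upper`.
    """
    out = []
    for v in values:
        if lower is not None:
            v = max(v, lower)  # censoring from below
        if upper is not None:
            v = min(v, upper)  # censoring from above
        out.append(v)
    return out


true_values = [-3.0, -0.5, 0.0, 1.2, 4.8, 7.5]
print(censor(true_values, lower=0.0))  # [0.0, 0.0, 0.0, 1.2, 4.8, 7.5]
print(censor(true_values, upper=5.0))  # [-3.0, -0.5, 0.0, 1.2, 4.8, 5.0]
```

Note that the censored list always has the same length as the original: no case is dropped, only its value is capped.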

Truncation and censoring are two distinct phenomena that cause our samples to be incomplete. These phenomena arise in medical sciences, engineering, social sciences, and other research fields. If we ignore truncation or censoring when analyzing our data, our estimates of population parameters will be inconsistent.

In the censored regression model, there are data on buyers and nonbuyers, as there would be if the data were obtained via simple random sampling of the adult population. If, however, the data are collected from sales tax records, then the data would include only buyers: There would be no data at all for nonbuyers. Data in which observations are unavailable above or below a threshold (data for buyers only) are called truncated data. The truncated regression model is a regression model applied to data in which observations are simply unavailable when the dependent variable is above or below a certain cutoff (Introduction to Econometrics by Stock and Watson, Ch. 11).
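To make the truncated regression model concrete, here is a minimal sketch of its log-likelihood, assuming a normal linear model `y = x'β + ε`, `ε ~ N(0, σ²)`, and a sample truncated from below at a point `c` (for example, only buyers with spending above zero appear in sales tax records). The function and argument names are hypothetical. The key idea is that each observed density is divided by the probability of being observed at all, `P(y > c)`.

```python
import math

def norm_logpdf(z):
    """Log of the standard normal density."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)

def norm_logsf(z):
    """Log of the standard normal survival function P(Z > z), computed with erfc."""
    return math.log(0.5 * math.erfc(z / math.sqrt(2.0)))

def truncated_loglik(y, xb, sigma, c):
    """Log-likelihood of a normal linear model whose sample contains only cases with y > c.

    y     : observed outcomes (all greater than c)
    xb    : linear predictions x_i'beta for each case
    sigma : standard deviation of the error term
    c     : truncation point (lower bound of the observed range)
    """
    total = 0.0
    for yi, mi in zip(y, xb):
        # Normal density of y_i around its mean x_i'beta ...
        total += norm_logpdf((yi - mi) / sigma) - math.log(sigma)
        # ... rescaled by P(y_i > c), because cases with y_i <= c never enter the sample
        total -= norm_logsf((c - mi) / sigma)
    return total

# One observation y = 1 with x'beta = 0, sigma = 1 and truncation at c = 0
print(truncated_loglik([1.0], [0.0], 1.0, 0.0))
```

The rescaling term is what distinguishes truncated regression from ordinary least squares: ignoring it treats the truncated sample as if it were representative.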

Censoring or truncation happens during the sampling process. For example, suppose we measure the monthly income of households and record every value above Rs. 200,000 as 200,000. We then have data on all X-variables, but the response variable is censored from above. With truncated data, households with income above Rs. 200,000 would be missing from the sample altogether, with no data on any variable. So a censored sample is still representative of the population, with certain values not recorded exactly, while a truncated sample is not representative.
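The income example above can be simulated to show the difference in sample composition. This is a sketch under stated assumptions: the household sizes and the log-normal income distribution are arbitrary choices for illustration, and only the Rs. 200,000 cap comes from the text. The censored sample keeps every household (and every X-variable) but caps income, while the truncated sample silently loses the high-income households.

```python
import random

random.seed(1)
CAP = 200_000  # censoring / truncation point from the income example (Rs.)

# Hypothetical population of (household_size, monthly_income) pairs.
# The income distribution is an assumption made only for this illustration.
population = []
for _ in range(1_000):
    size = random.randint(1, 8)                   # an X-variable that is always observed
    income = random.lognormvariate(11.5, 0.6)     # the response variable
    population.append((size, income))

# Censored sample: every household is kept, but income above CAP is recorded as CAP.
censored = [(size, min(income, CAP)) for size, income in population]

# Truncated sample: households with income above CAP are missing entirely.
truncated = [(size, income) for size, income in population if income <= CAP]

n_capped = sum(1 for _, y in censored if y == CAP)
print("population:", len(population))
print("censored sample:", len(censored), "of which recorded at the cap:", n_capped)
print("truncated sample:", len(truncated))
```

The households recorded at the cap in the censored sample are exactly the ones that disappear from the truncated sample.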

## Examples of Tobit Analysis

1) A mall has a number of customers, both buyers and non-buyers. In censored data, non-buyers' spending is recorded as zero while buyers' consumption is observed. In truncated data, only buyers' data are in the sample.
2) In student evaluation, a CGPA of 4 means that any student who scores above a certain percentage of marks gets 4, but this 4 does not measure the exact scores of these students. So there is a high concentration of values at a CGPA of 4, and the data are right censored.
3) Consider a situation in which we have a measure of academic aptitude (scaled 200-800) which we want to model using reading and math test scores, as well as the type of program the student is enrolled in (academic, general, or vocational). The problem here is that students who answer all questions on the academic aptitude test correctly receive a score of 800, even though it is likely that these students are not “truly” equal in aptitude, so the aptitude scores are right censored at 800. For details

Applying OLS to censored or truncated data gives misleading results: the estimates are biased and inconsistent. For censored data we use the censored (Tobit) regression model, and for truncated data we use the truncated regression model.
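A small simulation can illustrate why OLS misleads on censored data. The data-generating process here (intercept -1, slope 2, standard normal errors, left censoring at zero) is an arbitrary assumption chosen for illustration. OLS on the latent outcome `y*`, which in practice we never observe, should recover a slope close to 2, while OLS on the censored outcome should come out noticeably flatter, because the piled-up zeros pull the fitted line toward the threshold.

```python
import random

def ols_slope(x, y):
    """Slope of the simple least-squares line of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

random.seed(42)
TRUE_INTERCEPT, TRUE_SLOPE = -1.0, 2.0  # assumed values for the simulation

x = [random.uniform(0.0, 2.0) for _ in range(5_000)]
# Latent outcome y* = -1 + 2x + e, with e ~ N(0, 1)
y_star = [TRUE_INTERCEPT + TRUE_SLOPE * xi + random.gauss(0.0, 1.0) for xi in x]
# Observed outcome is left-censored at zero: y = max(0, y*)
y_obs = [max(0.0, v) for v in y_star]

slope_latent = ols_slope(x, y_star)
slope_censored = ols_slope(x, y_obs)
share_censored = sum(v == 0.0 for v in y_obs) / len(y_obs)

print(f"share of observations censored at 0: {share_censored:.2f}")
print(f"OLS slope on latent y*: {slope_latent:.2f} (true value {TRUE_SLOPE})")
print(f"OLS slope on censored y: {slope_censored:.2f}")
```

The gap between the two slopes is the bias that the Tobit model is designed to remove, by modelling the censored and uncensored observations separately in the likelihood.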

As mentioned above, censored data include a large number of observations for which the dependent variable takes one value, or a limited number of values. An example is the Mroz data, where about 43 percent of the women observed are not in the labour force, so their market hours worked are zero.
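For data like hours worked, the standard Tobit setup treats the observed outcome as `y = max(0, y*)`, with a latent `y* = x'β + ε` and `ε ~ N(0, σ²)`. The sketch below computes the corresponding log-likelihood; the function names are hypothetical, and left censoring at zero and normal errors are assumptions of this sketch. A censored case (zero hours) contributes the probability `P(y* ≤ 0) = Φ(-x'β/σ)`, while an uncensored case contributes the ordinary normal density.

```python
import math

def norm_logpdf(z):
    """Log of the standard normal density."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)

def norm_logcdf(z):
    """Log of the standard normal CDF, via erfc for better accuracy in the lower tail."""
    return math.log(0.5 * math.erfc(-z / math.sqrt(2.0)))

def tobit_loglik(y, xb, sigma):
    """Tobit log-likelihood with left censoring at zero.

    y     : observed outcomes (0 for censored cases, e.g. zero hours worked)
    xb    : linear predictions x_i'beta for each case
    sigma : standard deviation of the latent error term
    """
    total = 0.0
    for yi, mi in zip(y, xb):
        if yi <= 0.0:
            # Censored case: probability that the latent y* is at or below 0
            total += norm_logcdf(-mi / sigma)
        else:
            # Uncensored case: normal density of y around x'beta
            total += norm_logpdf((yi - mi) / sigma) - math.log(sigma)
    return total

# Two hypothetical cases: one censored at zero, one observed at 1.5
print(tobit_loglik([0.0, 1.5], [0.2, 1.0], 1.0))
```

In practice β and σ are estimated by maximising this log-likelihood numerically; dedicated Tobit routines in statistical packages do this step for you.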