Bulletin of Computational Applied Mathematics (Bull CompAMa)
Design of an imputation methodology by random selection using regression trees
Lelly Useche; Jean Perez Parra; Carlos Garcia-Mendoza; Ana Ides Chacon
One of the biggest issues in the information collection stage is the absence of data. This research focuses specifically on the scenario in which the loss is partial and completely random and the data are quantitative. Classic imputation techniques exist, but they have not been able to impute the real data accurately. An imputation methodology by random selection is proposed that uses regression trees, and it is compared theoretically and empirically with random selection without the tree for different percentages of data loss. By evaluating the properties of the estimators, unbiased estimators of the variances and biases are obtained, which improves the estimates. A disadvantage of the proposed design is that it does not solve the alteration of the distribution of the data or of the relationship between the variables.
Keywords: Absence of data; imputation; regression trees; random selection.
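
The idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the abstract does not specify how the tree is grown or how donors are drawn, so the sketch assumes a single auxiliary variable x, a small regression tree that splits x to minimize the within-node sum of squares, and imputation of each missing y by drawing at random from the observed y values in the same leaf. All names (build_tree, find_donors, impute, min_leaf, max_depth) and the synthetic data are hypothetical.

```python
import random

def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def build_tree(pairs, min_leaf=5, max_depth=3, depth=0):
    """Grow a regression tree on (x, y) pairs; each leaf keeps its observed y values as donors."""
    pairs = sorted(pairs, key=lambda p: p[0])
    best = None
    if depth < max_depth:
        # Try every split that leaves at least min_leaf cases on each side.
        for i in range(min_leaf, len(pairs) - min_leaf + 1):
            if pairs[i - 1][0] == pairs[i][0]:
                continue  # cannot split between equal x values
            cost = sse([y for _, y in pairs[:i]]) + sse([y for _, y in pairs[i:]])
            if best is None or cost < best[0]:
                best = (cost, i)
    if best is None:
        return {"donors": [y for _, y in pairs]}
    i = best[1]
    return {
        "threshold": (pairs[i - 1][0] + pairs[i][0]) / 2,
        "left": build_tree(pairs[:i], min_leaf, max_depth, depth + 1),
        "right": build_tree(pairs[i:], min_leaf, max_depth, depth + 1),
    }

def find_donors(tree, x):
    """Walk down the tree to the leaf that contains x and return its donor values."""
    while "donors" not in tree:
        tree = tree["left"] if x < tree["threshold"] else tree["right"]
    return tree["donors"]

def impute(xs, ys, rng, use_tree=True):
    """Replace each missing y (None) with a randomly selected observed y.

    With use_tree=True the donor pool is the leaf of the regression tree
    that contains the case; with use_tree=False it is every observed value.
    """
    observed = [(x, y) for x, y in zip(xs, ys) if y is not None]
    if use_tree:
        tree = build_tree(observed)
    else:
        tree = {"donors": [y for _, y in observed]}
    return [y if y is not None else rng.choice(find_donors(tree, x))
            for x, y in zip(xs, ys)]

# Synthetic data (assumption): two well-separated groups of the auxiliary variable.
rng = random.Random(0)
xs = [i / 50 for i in range(50)] + [2 + i / 50 for i in range(50)]
full = [(10 if x < 1 else 50) + rng.gauss(0, 1) for x in xs]

# Remove 20% of the y values completely at random.
missing = set(rng.sample(range(len(xs)), 20))
ys = [None if i in missing else y for i, y in enumerate(full)]

completed = impute(xs, ys, rng, use_tree=True)         # donors from the same leaf
completed_plain = impute(xs, ys, rng, use_tree=False)  # donors from all observed values
```

In this sketch the tree restricts each donor pool to cases with similar x, so an imputed value comes from the same group as the missing case, whereas plain random selection can draw a donor from either group. Changing the size of the sample passed to rng.sample gives a simple way to repeat the comparison for different loss percentages.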