Multiple Linear Regression Models in Outlier Detection

Author(s):
S.M.A. Khaleelur Rahman, M. Mohamed Sathik, K. Senthamarai Kannan
Published Date:
February 29, 2012
Issue:
Volume 2, Issue 2
Page(s):
23 - 28
DOI:
10.7815/ijorcs.22.2012.018

Keywords:
cut-value, Cook's D, DFFITS, multiple regression analysis, outlier detection
Citation:
S.M.A. Khaleelur Rahman, M. Mohamed Sathik, K. Senthamarai Kannan, "Multiple Linear Regression Models in Outlier Detection". International Journal of Research in Computer Science, 2 (2): pp. 23-28, February 2012. doi:10.7815/ijorcs.22.2012.018

Abstract

Identifying anomalous values in real-world databases is important both for improving the quality of the original data and for reducing the impact of anomalous values on the process of knowledge discovery in databases. Such values also give the data analyst useful information for discovering patterns; once isolated, they can be separated out and analyzed. The analysis of outliers and influential points is an important step in regression diagnostics. In this paper, our aim is to detect points that differ markedly from the other points: they do not seem to belong to the same population and they behave differently. Removing such influential points would lead to a different fitted model. The distinction between these points is not always obvious, so several indicators are used to identify and analyze outliers. Existing methods of outlier detection rely on manual inspection of graphically represented data. In this paper, we present a new approach that automates the process of detecting and isolating outliers. The impact of anomalous values on the dataset is established using two indicators, DFFITS and Cook's D. The process is based on modeling the human perception of exceptional values using multiple linear regression analysis.
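The two indicators named in the abstract have standard closed forms for an ordinary least-squares fit with n observations and p coefficients. With residual e_i, leverage h_ii (the i-th diagonal of the hat matrix), and residual mean square s^2, Cook's D_i = e_i^2 h_ii / (p s^2 (1 - h_ii)^2), and DFFITS_i = t_i sqrt(h_ii / (1 - h_ii)), where t_i is the externally studentized residual. The NumPy sketch here computes both and flags observations that exceed the rule-of-thumb cut-values |DFFITS| > 2 sqrt(p/n) and D > 4/n. These thresholds, the hypothetical function names `influence_measures` and `flag_influential`, and the synthetic data are illustrative assumptions, not the paper's own cut-values or implementation.

```python
import numpy as np


def influence_measures(X, y):
    """DFFITS and Cook's D for an ordinary least-squares fit of y on X.

    X holds the predictors (n rows); an intercept column is prepended here.
    Returns (dffits, cooks_d, p), where p is the number of fitted coefficients.
    """
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]                           # single predictor -> column
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    Xd = np.column_stack([np.ones(n), X])        # design matrix with intercept
    p = Xd.shape[1]

    # Least-squares coefficients and raw residuals
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    e = y - Xd @ beta

    # Leverages h_ii: diagonal of the hat matrix H = X (X'X)^{-1} X'
    XtX_inv = np.linalg.inv(Xd.T @ Xd)
    h = np.einsum("ij,jk,ik->i", Xd, XtX_inv, Xd)

    s2 = e @ e / (n - p)                         # residual mean square
    cooks_d = (e**2 / (p * s2)) * h / (1 - h) ** 2

    # Leave-one-out residual variance -> externally studentized residuals
    s2_i = ((n - p) * s2 - e**2 / (1 - h)) / (n - p - 1)
    t = e / np.sqrt(s2_i * (1 - h))
    dffits = t * np.sqrt(h / (1 - h))
    return dffits, cooks_d, p


def flag_influential(X, y):
    """Indices exceeding common rule-of-thumb cut-values (an assumption,
    not necessarily the cut-values used in the paper)."""
    dffits, cooks_d, p = influence_measures(X, y)
    n = len(dffits)
    dffits_cut = 2 * np.sqrt(p / n)
    cooks_cut = 4 / n
    # Flag a point if either indicator exceeds its cut-value
    mask = (np.abs(dffits) > dffits_cut) | (cooks_d > cooks_cut)
    return np.flatnonzero(mask), dffits, cooks_d


# Usage on synthetic data: two predictors, one observation shifted upward
rng = np.random.default_rng(0)
n = 20
X = np.column_stack([np.arange(n, dtype=float), rng.uniform(0, 5, n)])
y = 1.0 + 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.1, n)
y[7] += 10.0                                     # planted anomaly
flagged, dffits, cooks_d = flag_influential(X, y)
print(flagged)
```

Flagging on either indicator (a logical OR) is only one design choice; requiring both indicators to agree, or comparing Cook's D with the median of an F(p, n - p) distribution, are other conventions in use.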
