The application of Mahalanobis distance to the delineation of multivariate outliers in the East Kunlun Mountains
GENG Guo-Shuai1,2, YANG Fan3,4
1. School of Earth Sciences and Resources, China University of Geosciences, Beijing 100083, China
2. Geophysical Survey Center, China Geological Survey, Langfang 065000, China
3. Beijing Institute of Geology for Mineral Resources, Beijing 100012, China
4. Research Center of Geochemical Survey and Assessment on Land Quality, China Geological Survey, Langfang 065000, China
Mahalanobis distance is widely used to detect multivariate outliers, and many outlier detection methods are now built on it. This paper compares the strengths and weaknesses of several Mahalanobis distance variants in identifying multivariate outliers, in order to select a more suitable method for delineating multivariate geochemical anomalies. Using 1:500,000 stream sediment data from the East Kunlun Mountains of Qinghai Province, the authors compared four methods: the classical Mahalanobis distance, the robust Mahalanobis distance based on the fast minimum covariance determinant (FMCD), the robust Mahalanobis distance based on the adaptive minimum covariance determinant (Adaptive), and the robust Mahalanobis distance based on the comedian (Comedian). Each method was used to identify outliers in three element associations: Cu-Co-Cr-Ni-V-Fe, Cd-Cu-Mo-Pb-Zn-Ag, and Au-As-Sb. The results show that the Comedian method performs best and the classical method performs worst, so the Comedian method is the most effective multivariate outlier detection method in this area.
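To illustrate why a robust estimate can matter here, the following is a minimal sketch that contrasts the classical squared Mahalanobis distance with a simplified comedian-based robust distance. It is not the authors' implementation: the synthetic three-variable data, the function names (`classical_md2`, `scaled_mad`, `comedian_md2`), the orthogonalization step, and the fixed chi-square cutoff are all assumptions made for illustration, and the published Comedian procedure may differ in its details.

```python
import numpy as np

# Assumed cutoff: 97.5% quantile of the chi-square distribution with 3 degrees
# of freedom. Robust distances only approximately follow this distribution.
CHI2_975_DF3 = 9.348


def classical_md2(X):
    """Squared Mahalanobis distances from the sample mean and covariance."""
    diff = X - X.mean(axis=0)
    inv_cov = np.linalg.inv(np.cov(X, rowvar=False))
    return np.einsum("ij,jk,ik->i", diff, inv_cov, diff)


def scaled_mad(X):
    """Column-wise median absolute deviation, scaled for normal data."""
    return 1.4826 * np.median(np.abs(X - np.median(X, axis=0)), axis=0)


def comedian_md2(X):
    """Squared robust distances from an orthogonalized comedian matrix (sketch)."""
    C = X - np.median(X, axis=0)
    # Comedian matrix: median of pairwise products of median-centred columns.
    com = np.median(C[:, :, None] * C[:, None, :], axis=0)
    scale = np.sqrt(np.diag(com))            # roughly the unscaled MAD per column
    corr = com / np.outer(scale, scale)      # correlation-like matrix
    _, E = np.linalg.eigh(corr)              # eigenvectors of the symmetric matrix
    Q = E * scale[:, None]                   # equals diag(scale) @ E
    Z = np.linalg.solve(Q, X.T).T            # data in the rotated coordinates
    # In the rotated coordinates the scatter is treated as diagonal, so the
    # distance is a sum of squared robust z-scores.
    return (((Z - np.median(Z, axis=0)) / scaled_mad(Z)) ** 2).sum(axis=1)


# Synthetic data (hypothetical): 200 correlated background samples plus a
# tight cluster of 30 anomalous samples that breaks the correlation pattern.
rng = np.random.default_rng(0)
cov = 0.8 * np.ones((3, 3)) + 0.2 * np.eye(3)
background = rng.multivariate_normal(np.zeros(3), cov, size=200)
anomalies = np.array([8.0, -8.0, 8.0]) + 0.1 * rng.standard_normal((30, 3))
X = np.vstack([background, anomalies])
is_anomaly = np.arange(len(X)) >= len(background)

classical_hits = int(np.sum(classical_md2(X)[is_anomaly] > CHI2_975_DF3))
robust_hits = int(np.sum(comedian_md2(X)[is_anomaly] > CHI2_975_DF3))
print("planted anomalies flagged (classical):", classical_hits)
print("planted anomalies flagged (comedian sketch):", robust_hits)
```

In this sketch the clustered anomalies inflate the sample mean and covariance, which shrinks their classical distances (the masking effect), while the median-based estimates are far less affected. Real stream sediment data would normally need preprocessing, such as a log transform, before either distance is computed.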
GENG Guo-Shuai, YANG Fan. The application of Mahalanobis distance to the delineation of multivariate outliers in the East Kunlun Mountains. Geophysical and Geochemical Exploration, 2021, 45(2): 440-449.