The Research Mining Technology

Thursday 1 May 2014

Mahalanobis Distance using R code

Mahalanobis distance is a standardized distance measure in statistics. It is a unitless measure introduced by P. C. Mahalanobis in 1936. Here I use R code and an example multivariate data set to compute the Mahalanobis distance.
Mahalanobis Distance Formula: $D^{2}=(x-\mu)'\Sigma^{-1}(x-\mu)$
where,
x - Vector of observed values (one row of the data set)
μ - Mean vector
Σ - Covariance matrix
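To make the formula concrete, here is a minimal sketch that evaluates D² by hand for a single observation and compares it with R's built-in mahalanobis() function. The small matrix X is made up purely for illustration and is not part of the example data used later.

```r
# Made-up data: 4 observations of 2 variables (illustration only)
X <- matrix(c(2, 4, 6, 8,
              1, 3, 2, 6),
            ncol = 2)

mu    <- colMeans(X)   # mean vector
Sigma <- cov(X)        # sample covariance matrix

x <- X[1, ]            # first observation
d <- x - mu            # deviation from the mean vector

# D^2 = (x - mu)' Sigma^{-1} (x - mu)
D2_manual <- as.numeric(t(d) %*% solve(Sigma) %*% d)

# Built-in equivalent, evaluated for every row at once
D2_builtin <- mahalanobis(X, mu, Sigma)

# Compare the hand-computed value with the built-in result for row 1
all.equal(D2_manual, D2_builtin[1])
```

Note that mahalanobis() returns the squared distance D², so take sqrt() of the result if you need the distance itself.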
Now for the example program.
First step:
Open R and create a new script.
Second step:
Import your data set. If your data is in xls format, use Save As to convert it to csv format; comma-separated files are easy to read into R. (This is only my suggestion; otherwise use any format R can read.)
Now import the data using the code below:
> inputname <- read.csv(file="C:/filename.csv",header=TRUE,sep=",")
(I saved my data file in the C:/ directory, so I use the path above. If your file is in another directory, use that file path with filename.csv instead.)
> inputname
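Before computing any distances it can help to check that the import worked. The sketch below is only an illustration: it writes three rows of the tobacco example to a temporary file (the temporary file and the names csv_path and dat are my own assumptions, used so the code runs anywhere), reads it back with read.csv, and checks the result.

```r
# Write a tiny CSV to a temporary file so the sketch is self-contained
csv_path <- tempfile(fileext = ".csv")
writeLines(c("BurnRate,PercentSugar,PercentNicotine",
             "1.55,20.05,1.38",
             "1.63,12.58,2.64",
             "1.66,18.56,1.56"),
           csv_path)

# Same call as in the text, pointed at the temporary file
dat <- read.csv(file = csv_path, header = TRUE, sep = ",")

str(dat)                      # column names and types
all(sapply(dat, is.numeric))  # every column should be numeric for mahalanobis()
anyNA(dat)                    # missing values would propagate into the distances
```

If a column comes in as text instead of numbers, the usual cause is a stray character or a different decimal separator in the file.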

Example:
Here I use a tobacco data set for testing.

> tobacco <- read.csv(file="C:/tobacco.csv",header=TRUE,sep=",")
> tobacco
   BurnRate PercentSugar PercentNicotine
1      1.55        20.05            1.38
2      1.63        12.58            2.64
3      1.66        18.56            1.56
4      1.52        18.56            2.22
5      1.70        14.02            2.85
6      1.68        15.64            1.24
7      1.78        14.52            2.86
8      1.57        18.52            2.18
9      1.60        17.84            1.65
10     1.52        13.38            3.28
11     1.68        17.55            1.56
12     1.74        17.97            2.00
13     1.93        14.66            2.88
14     1.77        17.31            1.36
15     1.94        14.32            2.66
16     1.83        15.05            2.43
17     2.09        15.47            2.42
18     1.72        16.85            2.16
19     1.49        17.42            2.12
20     1.52        18.55            1.87
21     1.64        18.74            2.10
22     1.40        14.79            2.21
23     1.78        18.86            2.00
24     1.93        15.62            2.26
25     1.53        18.56            2.14
> mu<-colMeans(tobacco)
> mu
       BurnRate    PercentSugar PercentNicotine
         1.6880         16.6156          2.1612
> cm<-cov(tobacco)
> cm
                   BurnRate PercentSugar PercentNicotine
BurnRate         0.02787500   -0.1098050      0.01886083
PercentSugar    -0.10980500    4.2276840     -0.75646533
PercentNicotine  0.01886083   -0.7564653      0.27466933
> D2<-mahalanobis(tobacco,mu,cm)
> D2
 [1] 3.08827463 5.35466197 1.37251420 2.61209613 2.07211223 8.90626020
 [7] 1.85354309 1.96263411 1.10087851 7.04624993 1.56621848 0.78813845
[13] 3.37468305 3.77347055 2.78904427 0.99063959 5.87881205 0.08359811
[19] 1.47435780 1.45810005 1.80081271 5.88148893 2.52555955 2.13920930
[25] 2.10664213
These are the squared Mahalanobis distances (D²) of the 25 observations from the mean vector. You can now use these values for further analysis. That's all.
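As one possible example of further analysis, the sketch below compares the D² values from the output above with a chi-square cutoff to look for unusual observations. This is a common rule of thumb, not part of the original example: it assumes the data are roughly multivariate normal, and the 0.975 level and the name cutoff are my own choices.

```r
# D^2 values copied from the output above
D2 <- c(3.08827463, 5.35466197, 1.37251420, 2.61209613, 2.07211223, 8.90626020,
        1.85354309, 1.96263411, 1.10087851, 7.04624993, 1.56621848, 0.78813845,
        3.37468305, 3.77347055, 2.78904427, 0.99063959, 5.87881205, 0.08359811,
        1.47435780, 1.45810005, 1.80081271, 5.88148893, 2.52555955, 2.13920930,
        2.10664213)

p <- 3                               # number of variables in the tobacco data
cutoff <- qchisq(0.975, df = p)      # assumed cutoff level (rule of thumb)

outliers <- which(D2 > cutoff)       # observations beyond the cutoff
outliers

# Observations ranked from most to least distant from the mean vector
head(order(D2, decreasing = TRUE))

# Unsquared distances, if D rather than D^2 is needed
D <- sqrt(D2)
```

With these values the largest D² belongs to observation 6, and it stays below the cutoff of about 9.35, so this rule flags no observation.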
