The Research Mining Technology


Thursday, 1 May 2014

Mahalanobis Distance using R code

Mahalanobis distance is a standardized distance measure in statistics. It is a unitless measure introduced by P. C. Mahalanobis in 1936. Here I use R code and an example multivariate data set to compute the Mahalanobis distance.
Mahalanobis Distance Formula: $D^{2}=(x-\mu)'\Sigma^{-1}(x-\mu)$
where,
x - Observation vector (one row of the data)
μ - Mean vector
Σ - Covariance matrix
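To make the formula concrete, here is a minimal sketch that evaluates it by hand for one observation and compares the result with R's built-in mahalanobis() function. It uses the numeric columns of R's built-in iris data only as a stand-in data set; the names X, Sigma, d, D2_manual and D2_builtin are my own choices for illustration.

```r
# Stand-in data: the four numeric columns of R's built-in iris data set
# (an assumption for illustration; any numeric data set would do).
X <- as.matrix(iris[, 1:4])

mu    <- colMeans(X)   # mean vector
Sigma <- cov(X)        # sample covariance matrix

# Squared distance for one observation x, written out from the formula:
# D^2 = (x - mu)' Sigma^{-1} (x - mu)
x <- X[1, ]
d <- x - mu
D2_manual <- as.numeric(t(d) %*% solve(Sigma) %*% d)

# The built-in function computes the same quantity for every row at once.
D2_builtin <- mahalanobis(X, mu, Sigma)

# The two results should agree up to floating-point tolerance.
all.equal(D2_manual, unname(D2_builtin[1]))
```

Note that both computations give the squared distance D^2; take the square root if you need D itself.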
Now let us work through an example program.
First step:
Start R and open a new script.
Second step:
Import your data set. If your data is in xls format, use Save As to convert it to csv format first; csv files are comma-separated, which suits R's data input functions. (This is only my suggestion; any format that R can read will work.)
Now import the data using the code below:
> inputname <- read.csv(file="C:/filename.csv",header=TRUE,sep=",")
(I saved my data file in the C:/ directory, so I used the path above. If your file is in another directory, use that file path with filename.csv.)
> inputname
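The import step can be tried without any file on disk. The sketch below writes a tiny made-up CSV to a temporary file and reads it back; the names csv_path and dat, the column names a and b, and the values are all hypothetical and only for illustration.

```r
# Write a small illustrative CSV to a temporary file so the example is
# self-contained; the column names and values here are made up.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("a,b", "1.5,10", "2.5,20", "3.5,30"), csv_path)

# header = TRUE treats the first row as column names; sep = "," is the
# read.csv default and is written out only for clarity.
dat <- read.csv(file = csv_path, header = TRUE, sep = ",")

str(dat)    # check the column types
head(dat)   # look at the first rows

# mahalanobis() needs numeric data, so confirm every column is numeric.
all(sapply(dat, is.numeric))
```

Checking the column types right after import is a cheap way to catch a stray text column before computing means and covariances.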

Example: 
Here I use a tobacco data set for testing.

> tobacco <- read.csv(file="C:/tobacco.csv",header=TRUE,sep=",")
> tobacco
   BurnRate PercentSugar PercentNicotine
1      1.55        20.05            1.38
2      1.63        12.58            2.64
3      1.66        18.56            1.56
4      1.52        18.56            2.22
5      1.70        14.02            2.85
6      1.68        15.64            1.24
7      1.78        14.52            2.86
8      1.57        18.52            2.18
9      1.60        17.84            1.65
10     1.52        13.38            3.28
11     1.68        17.55            1.56
12     1.74        17.97            2.00
13     1.93        14.66            2.88
14     1.77        17.31            1.36
15     1.94        14.32            2.66
16     1.83        15.05            2.43
17     2.09        15.47            2.42
18     1.72        16.85            2.16
19     1.49        17.42            2.12
20     1.52        18.55            1.87
21     1.64        18.74            2.10
22     1.40        14.79            2.21
23     1.78        18.86            2.00
24     1.93        15.62            2.26
25     1.53        18.56            2.14
> mu<-colMeans(tobacco)
> mu
       BurnRate    PercentSugar PercentNicotine
         1.6880         16.6156          2.1612
> cm<-cov(tobacco)
> cm
                   BurnRate PercentSugar PercentNicotine
BurnRate         0.02787500   -0.1098050      0.01886083
PercentSugar    -0.10980500    4.2276840     -0.75646533
PercentNicotine  0.01886083   -0.7564653      0.27466933
> D2<-mahalanobis(tobacco,mu,cm)
> D2
 [1] 3.08827463 5.35466197 1.37251420 2.61209613 2.07211223 8.90626020
 [7] 1.85354309 1.96263411 1.10087851 7.04624993 1.56621848 0.78813845
[13] 3.37468305 3.77347055 2.78904427 0.99063959 5.87881205 0.08359811
[19] 1.47435780 1.45810005 1.80081271 5.88148893 2.52555955 2.13920930
[25] 2.10664213
The values in D2 are the squared Mahalanobis distances, one per observation, and can now be used for further analysis.
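One common kind of further analysis is screening for unusual observations. The sketch below is only one possible approach: it assumes the data are roughly multivariate normal, so that D2 approximately follows a chi-square distribution with as many degrees of freedom as there are variables, and it uses a 0.95 cutoff that is my own arbitrary choice. The names p, cutoff and flagged are for illustration.

```r
# D2 values copied from the output in this post.
D2 <- c(3.08827463, 5.35466197, 1.37251420, 2.61209613, 2.07211223, 8.90626020,
        1.85354309, 1.96263411, 1.10087851, 7.04624993, 1.56621848, 0.78813845,
        3.37468305, 3.77347055, 2.78904427, 0.99063959, 5.87881205, 0.08359811,
        1.47435780, 1.45810005, 1.80081271, 5.88148893, 2.52555955, 2.13920930,
        2.10664213)

p <- 3  # number of variables: BurnRate, PercentSugar, PercentNicotine

# Chi-square cutoff (assumption: approximate multivariate normality;
# the 0.95 level is an arbitrary choice for this sketch).
cutoff <- qchisq(0.95, df = p)

# Row numbers of the observations whose squared distance exceeds the cutoff.
flagged <- which(D2 > cutoff)
flagged
```

A stricter level such as 0.975 or 0.99 flags fewer observations; which level is appropriate depends on the analysis.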
