The Research Mining Technology


Thursday, 1 May 2014

Mahalanobis Distance using R code

Mahalanobis distance is a standardized distance measure in statistics. It is a unitless measure introduced by P. C. Mahalanobis in 1936. Here I use R code and an example multivariate data set to compute the Mahalanobis distance.
Mahalanobis Distance Formula: $D^{2}=(x-\mu)'\Sigma^{-1}(x-\mu)$
where,
x - Vector of observed values for one observation
μ - Mean vector
Σ - Covariance matrix
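As a minimal sketch of the formula above, the squared distance can be computed directly with matrix algebra and checked against R's built-in mahalanobis() function. The small data matrix below is made up purely for illustration and is not part of the example that follows.

```r
# Illustrative data (hypothetical values): 5 observations, 2 variables
X <- matrix(c(2.0, 3.1,
              1.5, 2.9,
              3.2, 4.0,
              2.8, 3.5,
              1.9, 2.2),
            ncol = 2, byrow = TRUE)

mu    <- colMeans(X)   # mean vector
Sigma <- cov(X)        # sample covariance matrix

# D^2 = (x - mu)' Sigma^{-1} (x - mu), evaluated for every row of X
centered  <- sweep(X, 2, mu)   # subtract mu from each row
D2_manual <- rowSums((centered %*% solve(Sigma)) * centered)

# Built-in equivalent; note that it returns the squared distance D^2
D2_builtin <- mahalanobis(X, mu, Sigma)

all.equal(D2_manual, D2_builtin)
```

Take sqrt() of the result if you need the distance D itself rather than D².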
Now we go to the example program.
First step:
Open R and start a new script.
Second step:
Import your data set. If your data is in xls format, use Save As to convert it to csv first; csv files are comma-separated, which suits R's data input. (This is only my suggestion; any format R can read will work.)
Now import the data using the code below:
> input_name <- read.csv(file="C:/filename.csv",header=TRUE,sep=",")
(I saved my data file in the C:/ directory, so I used the path above. If your file is in another directory, use that file path with filename.csv instead.)
> input_name
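To try the import step without a file on disk, here is a small sketch that writes a throwaway CSV to a temporary location and reads it back. The column names and values are hypothetical; only read.csv() and its header and sep arguments come from the step above.

```r
# Write a tiny CSV to a temporary file so the example runs anywhere
# (column names and values are hypothetical)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("height,weight",
             "1.70,65.2",
             "1.82,80.1",
             "1.65,58.4"),
           csv_path)

# header = TRUE treats the first line as column names;
# sep = "," is already the default for read.csv
my_data <- read.csv(file = csv_path, header = TRUE, sep = ",")

str(my_data)   # inspect column types before doing any statistics

# cov() and mahalanobis() need numeric columns, so check early
stopifnot(all(sapply(my_data, is.numeric)))
```

Checking the column types right after import helps catch stray text in the file before it causes errors later.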

Example: 
Here I use a tobacco data set for test purposes.

> tobacco <- read.csv(file="C:/tobacco.csv",header=TRUE,sep=",")
> tobacco
   BurnRate PercentSugar PercentNicotine
1      1.55        20.05            1.38
2      1.63        12.58            2.64
3      1.66        18.56            1.56
4      1.52        18.56            2.22
5      1.70        14.02            2.85
6      1.68        15.64            1.24
7      1.78        14.52            2.86
8      1.57        18.52            2.18
9      1.60        17.84            1.65
10     1.52        13.38            3.28
11     1.68        17.55            1.56
12     1.74        17.97            2.00
13     1.93        14.66            2.88
14     1.77        17.31            1.36
15     1.94        14.32            2.66
16     1.83        15.05            2.43
17     2.09        15.47            2.42
18     1.72        16.85            2.16
19     1.49        17.42            2.12
20     1.52        18.55            1.87
21     1.64        18.74            2.10
22     1.40        14.79            2.21
23     1.78        18.86            2.00
24     1.93        15.62            2.26
25     1.53        18.56            2.14
> mu<-colMeans(tobacco)
> mu
       BurnRate    PercentSugar PercentNicotine
         1.6880         16.6156          2.1612
> cm<-cov(tobacco)
> cm
                   BurnRate PercentSugar PercentNicotine
BurnRate         0.02787500   -0.1098050      0.01886083
PercentSugar    -0.10980500    4.2276840     -0.75646533
PercentNicotine  0.01886083   -0.7564653      0.27466933
> D2<-mahalanobis(tobacco,mu,cm)
> D2
 [1] 3.08827463 5.35466197 1.37251420 2.61209613 2.07211223 8.90626020
 [7] 1.85354309 1.96263411 1.10087851 7.04624993 1.56621848 0.78813845
[13] 3.37468305 3.77347055 2.78904427 0.99063959 5.87881205 0.08359811
[19] 1.47435780 1.45810005 1.80081271 5.88148893 2.52555955 2.13920930
[25] 2.10664213
>
These are the squared Mahalanobis distances (D²) for each of the 25 observations, which you can now use for further analysis.
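One common follow-up, sketched here as an assumption about what "further analysis" might involve rather than something this post prescribes, is to screen for multivariate outliers by comparing each D² value with a chi-square quantile. Under approximate multivariate normality, D² is roughly chi-square distributed with degrees of freedom equal to the number of variables (3 here). The 0.975 cutoff is a conventional but arbitrary choice.

```r
# Squared distances copied from the output above
D2 <- c(3.08827463, 5.35466197, 1.37251420, 2.61209613, 2.07211223, 8.90626020,
        1.85354309, 1.96263411, 1.10087851, 7.04624993, 1.56621848, 0.78813845,
        3.37468305, 3.77347055, 2.78904427, 0.99063959, 5.87881205, 0.08359811,
        1.47435780, 1.45810005, 1.80081271, 5.88148893, 2.52555955, 2.13920930,
        2.10664213)

p      <- 3                        # number of variables in the tobacco data
cutoff <- qchisq(0.975, df = p)    # assumed threshold; adjust as needed

which(D2 > cutoff)                 # indices of observations beyond the cutoff
order(D2, decreasing = TRUE)[1:3]  # the three most distant observations
```

With these values no observation exceeds the cutoff; observation 6 has the largest distance.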


Saturday, 17 November 2012

R code for Wilcoxon rank sum test



Example 1 (R-Code Script)
Two samples of young walleye were drawn from two different lakes, and the fish were weighed. The weights, in grams, are entered as X.1 and X.2 below:
R-Code and Results:
> X.1<-c(253,218,292,280,276,275)
> X.2<-c(216,291,256,270,277,285)
> sample<-c(rep(1,6),rep(2,6))
> w<-data.frame(c(X.1,X.2),sample)
> names(w)[1]<-'weight(g)'
> cbind(w[1:6,],w[7:12,])
  weight(g) sample weight(g) sample
1       253      1       216      2
2       218      1       291      2
3       292      1       256      2
4       280      1       270      2
5       276      1       277      2
6       275      1       285      2
> idx<-sort(w[,1],index.return=TRUE)
> d<-rbind(weight=w[idx$ix,1],sample=w[idx$ix,2],
+ rank=1:12)
> dimnames(d)[[2]]<-rep('',12);d
                                                      
weight 216 218 253 256 270 275 276 277 280 285 291 292
sample   2   1   1   2   2   1   1   2   1   2   2   1
rank     1   2   3   4   5   6   7   8   9  10  11  12
> rank.sum<-c(sum(d[3,d[2,]==1]),
+ sum(d[3,d[2,]==2]))
> rank.sum<-rbind(sample=c(1,2),
+ 'rank sum'=rank.sum)
> dimnames(rank.sum)[[2]]<-c('','');rank.sum
             
sample    1  2
rank sum 39 39
> wilcox.test(X.1,X.2)

        Wilcoxon rank sum test

data:  X.1 and X.2
W = 18, p-value = 1
alternative hypothesis: true location shift is not equal to 0
>
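The link between the rank sums (39 and 39) and the reported statistic W = 18 can be sketched as follows. This assumes the usual convention, which wilcox.test() follows, that W is the rank sum of the first sample minus its smallest possible value n1(n1+1)/2. The helper names n1, r, R1 and W are introduced here for illustration.

```r
X.1 <- c(253, 218, 292, 280, 276, 275)
X.2 <- c(216, 291, 256, 270, 277, 285)

n1 <- length(X.1)
r  <- rank(c(X.1, X.2))      # joint ranks (there are no ties in these data)
R1 <- sum(r[seq_len(n1)])    # rank sum of the first sample

# W = rank sum of sample 1 minus its minimum possible value
W <- R1 - n1 * (n1 + 1) / 2

c(rank_sum = R1, W = W)
unname(wilcox.test(X.1, X.2)$statistic)   # should agree with W
```

Here W equals n1*n2/2 = 18, its expected value when the two samples come from the same distribution, which is why the two-sided p-value is 1.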