Research Mining: R-Code Script

Thursday, 1 May 2014

Mahalanobis Distance using R code


The Mahalanobis distance is a standardized distance measure in statistics. It is a unitless distance measure introduced by P. C. Mahalanobis in 1936. In this post I use R and one multivariate example data set to compute the Mahalanobis distance.

Mahalanobis distance formula: $D^{2} = (x-\mu)^{\prime}\,\Sigma^{-1}(x-\mu)$

where,

x - vector of observed values (one row of the data)

μ - mean vector

Σ - covariance matrix
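The sketch below makes the formula concrete. It evaluates $(x-\mu)'\Sigma^{-1}(x-\mu)$ for every row and compares the result with R's built-in `mahalanobis()` from the stats package. The helper name `mahal_manual` is hypothetical, and the three columns of the built-in `mtcars` data were picked only for illustration.

```r
# Illustration only: three columns of the built-in mtcars data
X <- as.matrix(mtcars[, c("mpg", "hp", "wt")])

mu    <- colMeans(X)   # mean vector
Sigma <- cov(X)        # sample covariance matrix

# Hypothetical helper: D^2 = (x - mu)' Sigma^{-1} (x - mu) for every row x of X
mahal_manual <- function(X, mu, Sigma) {
  diffs <- sweep(X, 2, mu)            # subtract mu from each row
  Sinv  <- solve(Sigma)               # Sigma^{-1}
  rowSums((diffs %*% Sinv) * diffs)   # one quadratic form per row
}

D2_manual  <- mahal_manual(X, mu, Sigma)
D2_builtin <- mahalanobis(X, mu, Sigma)

all.equal(D2_manual, D2_builtin)   # the two computations agree
sum(D2_manual)                     # equals (nrow(X) - 1) * ncol(X) = 93
```

The last line uses a standard identity. When μ and Σ are the sample mean and sample covariance, the D² values of n observations on p variables sum to (n - 1)p. This makes a quick sanity check on any output.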

Now let's work through an example.

First step:

Open R and start a new script.

Second step:

Import your data set. If your data are in .xls format, use Save As to convert the file to .csv format. CSV files are comma-separated, so read.csv() can read them directly. (This is only my suggestion; R can read other formats as well.)

Import the data using the code below:

> inputname <- read.csv(file = "C:/filename.csv", header = TRUE, sep = ",")
(I saved my data file in the C:/ directory, hence the path above. If your file is in another directory, replace the path with the full path to your .csv file. Replace inputname with any valid R object name.)

> inputname
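You can check what read.csv() expects before pointing it at a file. The sketch below passes the CSV contents as a string through the `text` argument, which read.csv() forwards to read.table(). The three rows are copied from the tobacco data used later in this post. The object names `csv_text` and `tobacco_head` are hypothetical.

```r
# The first line holds the column names; each following line is one observation.
csv_text <- "BurnRate,PercentSugar,PercentNicotine
1.55,20.05,1.38
1.63,12.58,2.64
1.66,18.56,1.56"

# header = TRUE: use the first line as column names; sep = "," is read.csv's default
tobacco_head <- read.csv(text = csv_text, header = TRUE)
str(tobacco_head)   # 3 observations of 3 numeric variables
```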

Example:

Here I use a tobacco data set for demonstration.

> tobacco <- read.csv(file = "C:/tobacco.csv", header = TRUE, sep = ",")
> tobacco
   BurnRate PercentSugar PercentNicotine
1      1.55        20.05            1.38
2      1.63        12.58            2.64
3      1.66        18.56            1.56
4      1.52        18.56            2.22
5      1.70        14.02            2.85
6      1.68        15.64            1.24
7      1.78        14.52            2.86
8      1.57        18.52            2.18
9      1.60        17.84            1.65
10     1.52        13.38            3.28
11     1.68        17.55            1.56
12     1.74        17.97            2.00
13     1.93        14.66            2.88
14     1.77        17.31            1.36
15     1.94        14.32            2.66
16     1.83        15.05            2.43
17     2.09        15.47            2.42
18     1.72        16.85            2.16
19     1.49        17.42            2.12
20     1.52        18.55            1.87
21     1.64        18.74            2.10
22     1.40        14.79            2.21
23     1.78        18.86            2.00
24     1.93        15.62            2.26
25     1.53        18.56            2.14
> mu <- colMeans(tobacco)   # mean vector (named mu to avoid masking base::mean)
> mu
       BurnRate    PercentSugar PercentNicotine
         1.6880         16.6156          2.1612
> cm <- cov(tobacco)        # covariance matrix
> cm
                   BurnRate PercentSugar PercentNicotine
BurnRate         0.02787500   -0.1098050      0.01886083
PercentSugar    -0.10980500    4.2276840     -0.75646533
PercentNicotine  0.01886083   -0.7564653      0.27466933
> D2 <- mahalanobis(tobacco, mu, cm)
> D2
 [1] 3.08827463 5.35466197 1.37251420 2.61209613 2.07211223 8.90626020
 [7] 1.85354309 1.96263411 1.10087851 7.04624993 1.56621848 0.78813845
[13] 3.37468305 3.77347055 2.78904427 0.99063959 5.87881205 0.08359811
[19] 1.47435780 1.45810005 1.80081271 5.88148893 2.52555955 2.13920930
[25] 2.10664213
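You now have the Mahalanobis distance of each of the 25 observations from the mean vector, ready for further analysis. Note that mahalanobis() returns the squared distance D², so take sqrt() if you need D itself.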
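The sketch below rebuilds the tobacco example without the CSV file. It confirms that mahalanobis() matches the formula written out by hand. It then shows one common follow-up: comparing D² with a chi-square quantile on p = 3 degrees of freedom. That screen assumes the data are roughly multivariate normal, and the 97.5% level is an arbitrary choice. Both are assumptions, not part of the original post.

```r
# Tobacco data as printed above (25 observations, 3 variables)
tobacco <- data.frame(
  BurnRate        = c(1.55, 1.63, 1.66, 1.52, 1.70, 1.68, 1.78, 1.57, 1.60, 1.52,
                      1.68, 1.74, 1.93, 1.77, 1.94, 1.83, 2.09, 1.72, 1.49, 1.52,
                      1.64, 1.40, 1.78, 1.93, 1.53),
  PercentSugar    = c(20.05, 12.58, 18.56, 18.56, 14.02, 15.64, 14.52, 18.52, 17.84, 13.38,
                      17.55, 17.97, 14.66, 17.31, 14.32, 15.05, 15.47, 16.85, 17.42, 18.55,
                      18.74, 14.79, 18.86, 15.62, 18.56),
  PercentNicotine = c(1.38, 2.64, 1.56, 2.22, 2.85, 1.24, 2.86, 2.18, 1.65, 3.28,
                      1.56, 2.00, 2.88, 1.36, 2.66, 2.43, 2.42, 2.16, 2.12, 1.87,
                      2.10, 2.21, 2.00, 2.26, 2.14)
)

mu <- colMeans(tobacco)
cm <- cov(tobacco)
D2 <- mahalanobis(tobacco, mu, cm)   # squared distances

# Same quantity from the formula (x - mu)' cm^{-1} (x - mu), one row at a time
diffs      <- sweep(as.matrix(tobacco), 2, mu)
D2_formula <- rowSums((diffs %*% solve(cm)) * diffs)
stopifnot(isTRUE(all.equal(unname(D2), unname(D2_formula))))

D <- sqrt(D2)                                # distances on the unsquared scale
cutoff  <- qchisq(0.975, df = ncol(tobacco)) # assumed 97.5% chi-square reference
flagged <- which(D2 > cutoff)                # rows beyond the cutoff, if any
order(D2, decreasing = TRUE)[1:3]            # the three rows farthest from mu
```

In the output above, the largest D² is 8.906 (row 6), which is below qchisq(0.975, 3) ≈ 9.35. A screen at that level would therefore flag none of the 25 samples. The printed values also sum to 72 = (25 - 1) × 3, which is consistent with the identity noted for the formula.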

Saturday, 17 November 2012

R code for Wilcoxon rank sum test


Example 1 (R-Code Script)

Two samples of young walleye were drawn from two different lakes, and the fish were weighed. The weights (in g) are entered below as X.1 and X.2.

R-Code and Results:

> X.1 <- c(253, 218, 292, 280, 276, 275)
> X.2 <- c(216, 291, 256, 270, 277, 285)
> sample <- c(rep(1, 6), rep(2, 6))
> w <- data.frame(c(X.1, X.2), sample)
> names(w)[1] <- 'weight(g)'
> cbind(w[1:6, ], w[7:12, ])
  weight(g) sample weight(g) sample
1       253      1       216      2
2       218      1       291      2
3       292      1       256      2
4       280      1       270      2
5       276      1       277      2
6       275      1       285      2

> idx <- sort(w[, 1], index.return = TRUE)
> d <- rbind(weight = w[idx$ix, 1], sample = w[idx$ix, 2],
+            rank = 1:12)
> dimnames(d)[[2]] <- rep('', 12); d
weight 216 218 253 256 270 275 276 277 280 285 291 292
sample   2   1   1   2   2   1   1   2   1   2   2   1
rank     1   2   3   4   5   6   7   8   9  10  11  12
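Assigning ranks 1:12 after sorting works here only because no two fish have the same weight. The sketch below computes the same rank sums with rank(), which gives tied values their average rank, and tapply(). The object names `weight`, `group`, `r` and `rank_sum` are hypothetical.

```r
X.1 <- c(253, 218, 292, 280, 276, 275)
X.2 <- c(216, 291, 256, 270, 277, 285)

weight <- c(X.1, X.2)
group  <- factor(rep(1:2, each = 6))   # sample label for each weight

r <- rank(weight)                      # average ranks (identical to 1:12 order here, no ties)
rank_sum <- tapply(r, group, sum)      # rank sum per sample
rank_sum
```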

> rank.sum <- c(sum(d[3, d[2, ] == 1]),
+               sum(d[3, d[2, ] == 2]))
> rank.sum <- rbind(sample = c(1, 2),
+                   'rank sum' = rank.sum)
> dimnames(rank.sum)[[2]] <- c('', ''); rank.sum
sample    1  2
rank sum 39 39
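The W statistic reported by wilcox.test() can be derived from the rank sum of the first sample. Subtract the smallest possible rank sum, 1 + 2 + ... + n1 = n1(n1 + 1)/2. This gives W = 39 - 21 = 18. Equivalently, W counts the pairs (x, y) with x from X.1 larger than y from X.2; ties would count one half, but there are none here. The sketch below checks both forms against wilcox.test().

```r
X.1 <- c(253, 218, 292, 280, 276, 275)
X.2 <- c(216, 291, 256, 270, 277, 285)

n1 <- length(X.1)
r  <- rank(c(X.1, X.2))
R1 <- sum(r[seq_len(n1)])            # rank sum of sample 1 (39)
W  <- R1 - n1 * (n1 + 1) / 2         # 39 - 21 = 18

# Pair-counting form: number of (x, y) pairs with x in X.1 greater than y in X.2
W_pairs <- sum(outer(X.1, X.2, ">"))

c(W = W, W_pairs = W_pairs, wilcox = unname(wilcox.test(X.1, X.2)$statistic))
```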

> wilcox.test(X.1, X.2)

        Wilcoxon rank sum test

data:  X.1 and X.2
W = 18, p-value = 1
alternative hypothesis: true location shift is not equal to 0
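Why is the p-value exactly 1? With no ties and small samples, the exact null distribution of W is available through pwilcox(). The sketch below computes a two-sided exact p-value as twice the smaller tail probability, capped at 1. This is a sketch of the standard exact calculation, assuming no ties. W = 18 is the centre (m·n/2) of a symmetric distribution, so both tails exceed 0.5 and the cap applies.

```r
X.1 <- c(253, 218, 292, 280, 276, 275)
X.2 <- c(216, 291, 256, 270, 277, 285)

m <- length(X.1)
n <- length(X.2)
W <- sum(outer(X.1, X.2, ">"))            # 18, the statistic reported above

lower <- pwilcox(W, m, n)                 # P(W <= 18) under H0
upper <- 1 - pwilcox(W - 1, m, n)         # P(W >= 18) under H0
p_two_sided <- min(1, 2 * min(lower, upper))

c(lower = lower, upper = upper, p = p_two_sided)
```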
