## Question:

What's the difference between the way R and SAS merge?

SAS's MERGE statement returns 205546 rows, while R's `merge()` returns 207208 rows.

Here is an example.

I'm working with the IBGE file available at:

ftp://ftp.ibge.gov.br/PNS/2013/microdados/pns_2013_microdados.zip

The files DOMPNS2013.txt and PESPNS2013.txt are used.

SAS:

1) Variable assignment: run the SAS input programs "input DOMPNS2013" and "input PESPNS2013".

2) Filtering on the value of interest and merging:

```
data dompns2013v3;
set dompns2013;
if V0015 = 1;
run;
/*NOTE: There were 81187 observations read from the data set WORK.DOMPNS2013.
NOTE: The data set WORK.DOMPNS2013V2 has 64348 observations and 20 variables.*/
data arq.dompes2013v3;
merge dompns2013v3 pespns2013;
by v0001 v0024 upa_pns v0006;
run;
/*NOTE: There were 64348 observations read from the data set WORK.DOMPNS2013V2.
NOTE: There were 205546 observations read from the data set WORK.PESPNS2013.
NOTE: The data set ARQ.DOMPES2013V2 has 205546 observations and 388 variables.
NOTE: DATA statement used (Total process time):*/
```


R:

1) Variable assignment:

```
# Household file: the 6-character field between v0015 and v0026 is skipped (negative width)
d2013 <- read.fwf(file = "DOMPNS2013.txt",
                  widths = c(2, 8, 7, 4, 2, -6, 1, 1),
                  col.names = c("v0001", "v0024", "upa_pns", "v0006",
                                "v0015", "v0026", "v0031"))
# Person file: the 2-character field after v0025 and the 8-character field after c006 are skipped
p2013 <- read.fwf(file = "PESPNS2013.txt",
                  widths = c(2, 8, 7, 4, 1, -2, 2, 2, 1, -8, 3),
                  col.names = c("v0001", "v0024", "upa_pns", "v0006", "v0025",
                                "c00301", "c004", "c006", "c008"))
```

2) Filtering on the value of interest and merging:

```
dim(d2013)
[1] 81187 7
d2013 = subset(d2013, d2013$v0015 == 1)
dim(d2013)
[1] 64348 7
dim(p2013)
[1] 205546 9
dpmerge = merge( p2013,d2013,by=c("v0001","v0024","upa_pns","v0006"))
dim(dpmerge)
[1] 207208 12
```
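Before comparing with SAS, it helps to check whether `d2013` repeats any merge key. The sketch below uses a hypothetical helper, `dup_report()`, on made-up data. It separates rows that repeat only the key from rows that are exact duplicates, because `unique()` removes only the latter.

```r
# Hypothetical helper: how many rows repeat an earlier merge key,
# and how many repeat an earlier row in every column?
dup_report <- function(df, keys) {
  c(duplicate_keys = sum(duplicated(df[keys])),
    duplicate_rows = sum(duplicated(df)))
}

keys <- c("v0001", "v0024", "upa_pns", "v0006")

# Made-up household rows: household 2 is an exact duplicate,
# household 3 repeats its key but differs in v0026
dom <- data.frame(v0001 = 11, v0024 = 1, upa_pns = 1,
                  v0006 = c(1, 2, 2, 3, 3), v0015 = 1,
                  v0026 = c(1, 1, 1, 1, 2))

dup_report(dom, keys)  # duplicate_keys = 2, duplicate_rows = 1
```

With the real data, `dup_report(d2013, keys)` would show whether the extra rows come from exact duplicates (which `unique()` removes) or from households that share a key but differ in other columns.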

## Answer:

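The difference comes from duplicate records in DOMPNS. Within each BY group, SAS's MERGE pairs observations one-to-one, so the group produces as many rows as its largest input and a repeated household row does not multiply the person rows. R's `merge()` returns every combination of matching rows, so each duplicated household row duplicates all of that household's persons. In effect, SAS behaves as if the duplicate household records had been removed before merging.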
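The row-count arithmetic behind this can be sketched in R. The hypothetical helper `merge_row_counts()` assumes SAS's match-merge rule (each BY group contributes as many rows as its largest count in any input) and compares it with the count from R's `merge()`, which pairs every matching row. The data are made up.

```r
# Hypothetical helper: predicted rows for a SAS-style match-merge versus
# R's merge() (default inner join), computed from key counts only.
merge_row_counts <- function(x, y, by) {
  # Count rows per key combination in each input
  cx <- aggregate(list(n = rep(1, nrow(x))), by = x[by], FUN = sum)
  cy <- aggregate(list(n = rep(1, nrow(y))), by = y[by], FUN = sum)
  # Full outer join on the counts; keys missing from one side count as 0
  m <- merge(cx, cy, by = by, all = TRUE, suffixes = c(".x", ".y"))
  m$n.x[is.na(m$n.x)] <- 0
  m$n.y[is.na(m$n.y)] <- 0
  c(sas_style = sum(pmax(m$n.x, m$n.y)),  # one-to-one pairing within a BY group
    r_merge   = sum(m$n.x * m$n.y))       # every pairing of matching rows
}

keys <- c("v0001", "v0024", "upa_pns", "v0006")

# Made-up data: household 2 is stored twice and has three persons
dom <- data.frame(v0001 = 11, v0024 = 1, upa_pns = 1,
                  v0006 = c(1, 2, 2), v0015 = 1)
pes <- data.frame(v0001 = 11, v0024 = 1, upa_pns = 1,
                  v0006 = c(1, 2, 2, 2), c008 = c(40, 35, 33, 8))

merge_row_counts(pes, dom, keys)  # sas_style = 4, r_merge = 7
nrow(merge(pes, dom, by = keys))  # 7: each person in household 2 appears twice
```

Because the SAS rule takes the maximum rather than the product, a duplicated household row changes the SAS count only when the duplicates outnumber that household's persons.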

If you run `d2013 <- unique(d2013)` before merging in R, the number of observations will be the same as in SAS.
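A minimal sketch of the fix on made-up data: `unique()` drops exact duplicate household rows, and `anyDuplicated()` acts as a guard so the script stops if a household key still repeats (for example, two rows with the same key but different values). Passing `all = TRUE` is optional. It makes `merge()` keep unmatched rows from both tables, as SAS's MERGE does when no `IN=` filtering is applied, whereas R's default is an inner join.

```r
keys <- c("v0001", "v0024", "upa_pns", "v0006")

# Made-up data: household 2 is stored twice and has three persons
dom <- data.frame(v0001 = 11, v0024 = 1, upa_pns = 1,
                  v0006 = c(1, 2, 2), v0015 = 1)
pes <- data.frame(v0001 = 11, v0024 = 1, upa_pns = 1,
                  v0006 = c(1, 2, 2, 2), c008 = c(40, 35, 33, 8))

dom <- unique(dom)  # remove exact duplicate household rows
# Guard: each household key must now appear only once
stopifnot(anyDuplicated(dom[keys]) == 0)

# all = TRUE keeps unmatched rows from either table (full outer join)
dpmerge <- merge(pes, dom, by = keys, all = TRUE)
nrow(dpmerge)  # 4, one row per person
```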