Question:
In my case I have two data.frames:
> head(Trecho)
Xt Yt Zt
1 -75.56468 1.642710 0
2 -74.56469 1.639634 0
3 -73.56469 1.636557 0
4 -72.56470 1.633480 0
5 -71.56470 1.630403 0
6 -70.56471 1.627326 0
> head(TrechoSim)
Xs Ys Zs
1 -71.7856 -0.509196 0
2 -71.7856 -0.509196 0
3 -71.7856 -0.509196 0
4 -71.7856 -0.509196 0
5 -71.7856 -0.509196 0
6 -71.7856 -0.509196 0
The Trecho data frame has approximately 5000 rows and TrechoSim has 20000 rows. Similar to Excel's PROCV (VLOOKUP), I need to fetch the closest value where Xt = Xs (in Excel I use TRUE, and it returns the first value closest to Xt). There is no tolerance for this closeness. For every row of Trecho I need the corresponding closest row of TrechoSim. I tried difference_inner_join, but it returns NA values in some rows.
Thanks,
Answer:
I don't have the original datasets, or Excel installed to test the PROCV function, but I believe the code below solves the problem.
The function procura computes the absolute difference between a number and every element of a vector, and returns the position of the element closest to that number.
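As a small illustration of that idea, here is a sketch with made-up values (the names x and y and the numbers are only for illustration, they are not from the original data):

```r
# Illustrative values only: one number and a candidate vector
x <- 2.2
y <- c(1, 2, 3, 4)

# Absolute distance from x to each element of y
difs <- abs(x - y)

# Position of the smallest distance; on ties which.min returns the first one
pos <- which.min(difs)
pos      # 2
y[pos]   # 2, the element of y closest to x
```

Because which.min keeps the first minimum, repeated values in the vector (as in the head of TrechoSim shown in the question) resolve to the earliest matching row.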
The code below is not optimized, but it should run reasonably fast on a current computer. When I increased the sample sizes of the simulated data to 5000 and 20000 rows, it took less than 2 seconds to do all the comparisons.
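# Simulated data standing in for the original data frames
Trecho <- data.frame(Xt = rnorm(5), Yt = rnorm(5), Zt = 0)
TrechoSim <- data.frame(Xs = rnorm(20), Ys = rnorm(20), Zs = 0)

# Position of the element of y closest to the number x
procura <- function(x, y) {
  which.min(abs(x - y))
}

# For each Xt, find the row of TrechoSim with the closest Xs
index <- integer(nrow(Trecho))
for (j in seq_along(Trecho$Xt)) {
  index[j] <- procura(Trecho$Xt[j], TrechoSim$Xs)
}

Trecho
TrechoSim[index, ]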
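If it helps to have each row of Trecho next to its match in a single table, the same search can be written without the explicit loop and the two data frames bound side by side. This is only a sketch on simulated data: the names resultado and distancia are made up here, and the third column of TrechoSim is assumed to be called Zs, as in the question.

```r
set.seed(42)  # only to make the simulated example reproducible

# Simulated stand-ins for the real data frames
Trecho <- data.frame(Xt = rnorm(5), Yt = rnorm(5), Zt = 0)
TrechoSim <- data.frame(Xs = rnorm(20), Ys = rnorm(20), Zs = 0)

# For each Xt, the row number of the closest Xs (first one wins on ties)
index <- vapply(
  Trecho$Xt,
  function(x) which.min(abs(x - TrechoSim$Xs)),
  integer(1)
)

# One row per row of Trecho, with its closest TrechoSim row alongside
resultado <- cbind(
  Trecho,
  TrechoSim[index, ],
  distancia = abs(Trecho$Xt - TrechoSim$Xs[index])
)
rownames(resultado) <- NULL
resultado
```

Since every Xt always has some closest Xs, this gives exactly one match per row of Trecho and no NA values. The distancia column shows how far away each match is, which makes it easy to check afterwards whether any match is too distant to be useful.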