Creating a dataframe based on two other dataframes using dplyr in R

Question:

These are my dataframes:

df<-    as.data.frame(matrix(rexp(200),, 25))
colnames(df)<-c("A","B","C","D","E","F","G","H","I","J",
        "K","L","M","N","O","P","Q","R","S","T",
        "U","V","X","Z","W")


df.new<-as.data.frame(matrix(rexp(200),,20))
colnames(df.new)<-c("A.B","A.B.new",
            "A.C","A.C.new",
            "A.F","A.F.new",
                "B.C","B.C.new",
                    "C.D","C.D.new",
                    "F.G","F.G.new",
                "H.I","H.I.new",
                    "H.K","H.K.new",
                    "L.M","L.M.new",
            "N.Q","N.Q.new")
df.new<-df.new[-c(9:10),]


finaldf<-cbind(df,df.new)
colnames(finaldf)<-c("A","B","C","D","E","F","G","H","I","J",
        "K","L","M","N","O","P","Q","R","S","T",
        "U","V","X","Z","W","A.B","A.B.new",
            "A.C","A.C.new",
            "A.F","A.F.new",
                "B.C","B.C.new",
                    "C.D","C.D.new",
                    "F.G","F.G.new",
                "H.I","H.I.new",
                    "H.K","H.K.new",
                    "L.M","L.M.new",
            "N.Q","N.Q.new")

I'd like to make the finaldf finaldf sort the columns like this:

colnames(finaldf)<-c("A","A.B","A.B.new",
            "A","A.C","A.C.new",
            "A","A.F","A.F.new",
            "B","B.C","B.C.new",
            "C","C.D","C.D.new",
            "F","F.G","F.G.new",
            "H","H.I","H.I.new",
            "H","H.K","H.K.new",
            "L","L.M","L.M.new",
            "N","N.Q","N.Q.new")

As my original dataframe is much bigger, I need a more robust code, which is beyond my capacity considering that I'm new to R.

Note that the idea is to simply take columns from the df dataframe and throw them into the df.new dataframe. But these columns must fit in the order established by the df.new dataframe.

And I would like to do this using the dplyr package. It's possible?

Edited:

Well, my original code comes with the names of the stocks on the Bovespa:

nms.new<-c("ABEV3.BBAS3", "ABEV3.BBAS3.new", "ABEV3.BRAP4", "ABEV3.BRAP4.new", 
"ABEV3.BRKM5", "ABEV3.BRKM5.new", "ABEV3.CSAN3", "ABEV3.CSAN3.new", 
"ABEV3.CSNA3", "ABEV3.CSNA3.new", "ABEV3.CYRE3", "ABEV3.CYRE3.new", 
"ABEV3.DTEX3", "ABEV3.DTEX3.new", "ABEV3.ELPL4", "ABEV3.ELPL4.new", 
"ABEV3.EVEN3", "ABEV3.EVEN3.new", "ABEV3.FIBR3", "ABEV3.FIBR3.new", 
"ABEV3.GGBR4", "ABEV3.GGBR4.new", "ABEV3.GOAU4", "ABEV3.GOAU4.new", 
"ABEV3.HYPE3", "ABEV3.HYPE3.new", "ABEV3.JBSS3", "ABEV3.JBSS3.new","SANB11.BBAS3","SANB11.BBAS3.new")

I would need to separate what comes before and after the period and not 1 in 1. Sometimes it comes 5 letters before the period, sometimes it comes 6 letters.

Answer:

Here goes.
The trick is to group by the first few characters of names(finaldf) and then process the list of groups.

First we choose the names that interest us.

nms <- names(finaldf)
nms <- nms[sapply(nms, nchar) > 1]

Now they are grouped by the first 3 characters.

sp <- split(nms, substr(nms, 1, 3))
nms <- lapply(sp, function(s) c(substr(s[1], 1, 1), s))
nms <- unlist(nms)
result <- finaldf[nms]

The result result has the dimensions of the question.

dim(result)
#[1]  8 30

But beware, as there cannot be repeated names, there will be A , A.1 and A.2 . And the same thing for any others where there are repetitions, for example HI and HK .

names(result)
# [1] "A"       "A.B"     "A.B.new" "A.1"     "A.C"     "A.C.new"
# [7] "A.2"     "A.F"     "A.F.new" "B"       "B.C"     "B.C.new"
#[13] "C"       "C.D"     "C.D.new" "F"       "F.G"     "F.G.new"
#[19] "H"       "H.I"     "H.I.new" "H.1"     "H.K"     "H.K.new"
#[25] "L"       "L.M"     "L.M.new" "N"       "N.Q"     "N.Q.new"
Scroll to Top