R - How to effectively flatten nested lists and data frames into a single data frame?


I have data formatted in a way that's difficult to use, and I'm trying to flatten it out. Here is a minimal reproducible example:

```
> str(sampledata)
List of 4
 $ events       :'data.frame':  2 obs. of  3 variables:
  ..$ cateringoptions:List of 2
  .. ..$ :'data.frame': 1 obs. of  3 variables:
  .. .. ..$ agreed : logi TRUE
  .. .. ..$ tnc    :'data.frame': 1 obs. of  5 variables:
  .. .. .. ..$ identity      : chr "spicyowing"
  .. .. .. ..$ schema        : logi NA
  .. .. .. ..$ elementid     : chr "105031"
  .. .. .. ..$ elementtype   : logi NA
  .. .. .. ..$ elementversion: logi NA
  .. .. ..$ address: chr "new york"
  .. ..$ :'data.frame': 1 obs. of  3 variables:
  .. .. ..$ agreed : logi TRUE
  .. .. ..$ tnc    :'data.frame': 1 obs. of  5 variables:
  .. .. .. ..$ identity      : chr "baconeggs"
  .. .. .. ..$ schema        : logi NA
  .. .. .. ..$ elementid     : chr "105032"
  .. .. .. ..$ elementtype   : logi NA
  .. .. .. ..$ elementversion: logi NA
  .. .. ..$ address: chr "seattle"
  ..$ action         : num [1:2] 1 1
  ..$ volume         : num [1:2] 1000 2000
 $ host         :List of 5
  ..$ identity      : chr "john"
  ..$ schema        : logi NA
  ..$ elementid     : chr "101505"
  ..$ elementtype   : logi NA
  ..$ elementversion: logi NA
 $ sender       :List of 5
  ..$ identity      : chr "jane"
  ..$ schema        : logi NA
  ..$ elementid     : chr "101005"
  ..$ elementtype   : logi NA
  ..$ elementversion: logi NA
 $ completeddate: chr "/date(1490112000000)/"
```
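For anyone who wants to reproduce the problem, here is a sketch that rebuilds an object with the same shape as `sampledata`. It is inferred from the `str()` output above, not taken from the original data, and the helpers `make_option()` and `make_party()` are hypothetical names introduced only for this illustration.

```r
# Hypothetical reconstruction of `sampledata`, inferred from the str() output above.

# One catering option: a 1-row data frame whose `tnc` column is itself a data frame
make_option <- function(identity, elementid, address) {
  tnc <- data.frame(identity = identity, schema = NA, elementid = elementid,
                    elementtype = NA, elementversion = NA,
                    stringsAsFactors = FALSE)
  opt <- data.frame(agreed = TRUE)
  opt$tnc <- tnc            # nested data-frame column
  opt$address <- address
  opt
}

# host / sender: plain named lists of length-1 values
make_party <- function(identity, elementid) {
  list(identity = identity, schema = NA, elementid = elementid,
       elementtype = NA, elementversion = NA)
}

events <- data.frame(action = c(1, 1), volume = c(1000, 2000))
# list column: one nested data frame per event
events$cateringoptions <- list(
  make_option("spicyowing", "105031", "new york"),
  make_option("baconeggs",  "105032", "seattle")
)
events <- events[c("cateringoptions", "action", "volume")]  # match the column order

sampledata <- list(
  events        = events,
  host          = make_party("john", "101505"),
  sender        = make_party("jane", "101005"),
  completeddate = "/date(1490112000000)/"
)

str(sampledata)
```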

Expected outcome:

```
> expectedoutcome
  events.cateringoptions.agreed events.cateringoptions.tnc.identity events.cateringoptions.tnc.schema events.cateringoptions.tnc.elementid
1                            NA                          spicyowing                                TRUE                               105031
2                            NA                           baconeggs                                TRUE                               105032
  events.cateringoptions.tnc.elementtype events.cateringoptions.tnc.elementversion events.cateringoptions.address events.action events.volume host.identity
1                                     NA                                        NA                       new york             1          1000          john
2                                     NA                                        NA                        seattle             1          2000          john
  host.schema host.elementid host.elementtype host.elementversion sender.identity sender.schema sender.elementid sender.elementtype sender.elementversion
1          NA         101505               NA                  NA            jane            NA           101005                 NA                    NA
2          NA         101505               NA                  NA            jane            NA           101005                 NA                    NA
          completeddate
1 /date(1490112000000)/
2 /date(1490112000000)/
```

My `check` function:

```r
check <- function(li) {
  # flag which elements are data frames and which are plain lists
  aredf   <- sapply(seq_along(li), function(i) class(li[[i]]) == "data.frame")
  arelist <- sapply(seq_along(li), function(i) class(li[[i]]) == "list")
  tmp1 <- NULL
  tmp2 <- NULL
  if (any(aredf)) {
    # flatten each data frame and recurse into its columns, then stack them row-wise
    for (j in which(aredf)) {
      columns <- jsonlite::flatten(li[[j]])
      li[[j]] <- check(columns)
    }
    tmp1 <- plyr::rbind.fill(li[aredf])
    #return(tmp1)
  }
  if (any(arelist)) {
    # recurse into plain lists, then bind every element of li column-wise
    for (j in which(arelist)) {
      li[[j]] <- check(li[[j]])
    }
    tmp2 <- do.call(cbind, li)
    #return(tmp2)
  }
  if (!is.null(tmp1) && !is.null(tmp2)) {
    return(cbind(tmp1, tmp2))
  } else if (!is.null(tmp1)) {
    return(tmp1)
  } else if (!is.null(tmp2)) {
    return(tmp2)
  }
  return(li)
}
```

Results:

```
> str(check(sampledata))
'data.frame': 2 obs. of  29 variables:
 $ cateringoptions.agreed                   : logi  TRUE TRUE
 $ cateringoptions.address                  : chr  "new york" "seattle"
 $ cateringoptions.tnc.identity             : chr  "spicyowing" "baconeggs"
 $ cateringoptions.tnc.schema               : logi  NA NA
 $ cateringoptions.tnc.elementid            : chr  "105031" "105032"
 $ cateringoptions.tnc.elementtype          : logi  NA NA
 $ cateringoptions.tnc.elementversion       : logi  NA NA
 $ action                                   : num  1 1
 $ volume                                   : num  1000 2000
 $ events.cateringoptions.agreed            : logi  TRUE TRUE
 $ events.cateringoptions.address           : chr  "new york" "seattle"
 $ events.cateringoptions.tnc.identity      : chr  "spicyowing" "baconeggs"
 $ events.cateringoptions.tnc.schema        : logi  NA NA
 $ events.cateringoptions.tnc.elementid     : chr  "105031" "105032"
 $ events.cateringoptions.tnc.elementtype   : logi  NA NA
 $ events.cateringoptions.tnc.elementversion: logi  NA NA
 $ events.action                            : num  1 1
 $ events.volume                            : num  1000 2000
 $ host.identity                            : Factor w/ 1 level "john": 1 1
 $ host.schema                              : logi  NA NA
 $ host.elementid                           : Factor w/ 1 level "101505": 1 1
 $ host.elementtype                         : logi  NA NA
 $ host.elementversion                      : logi  NA NA
 $ sender.identity                          : Factor w/ 1 level "jane": 1 1
 $ sender.schema                            : logi  NA NA
 $ sender.elementid                         : Factor w/ 1 level "101005": 1 1
 $ sender.elementtype                       : logi  NA NA
 $ sender.elementversion                    : logi  NA NA
 $ completeddate                            : Factor w/ 1 level "/date(1490112000000)/": 1 1
```

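It almost works, but the nested data frame is duplicated: its columns appear once without a prefix and once with the `events.` prefix. The code also takes a long time to run. Does anyone have an idea how I can go about flattening this?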
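The duplication can be traced to the end of `check()`: the data-frame elements are bound into `tmp1`, but `do.call(cbind, li)` then binds every element of `li` into `tmp2`, the already-flattened data frames included, and `cbind()` on data frames prefixes multi-column pieces with their element name. Below is a toy sketch of just that step; the values are hypothetical, and `tmp1` is taken directly from `li` instead of calling `plyr::rbind.fill()`.

```r
# Toy stand-ins (hypothetical values) for `li` near the end of check():
# `events` is already a flattened data frame, `host` is a plain list.
li <- list(
  events = data.frame(action = c(1, 1), volume = c(1000, 2000)),
  host   = list(identity = "john", elementid = "101505")
)
aredf <- vapply(li, is.data.frame, logical(1))

# tmp1: only the data-frame elements (stand-in for plyr::rbind.fill(li[aredf]))
tmp1 <- li[aredf][[1]]

# tmp2: cbind() over *every* element of li, so `events` is bound a second time;
# multi-column pieces get their element name as a prefix ("events.", "host.")
tmp2 <- do.call(cbind, li)

out <- cbind(tmp1, tmp2)
names(out)
```

Here `action` and `volume` show up twice, once bare (from `tmp1`) and once as `events.action`/`events.volume` (from `tmp2`), which mirrors the duplicated columns in the `str()` output above.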

Edit:

I added the solution I ended up using in a gist.

Here's my take on it, with purrr.
The idea is similar to yours, with a different syntax: `flatten()` the nested data frames, then `rbind()` them.
If I understand your code properly, mine differs at the end, since I try to build a more "`jsonlite::flatten`-friendly" structure and then apply `flatten()` once more to the end result:

```r
library(jsonlite)
library(purrr)

res <- sampledata %>%
  modify_if(
    is.list,
    .f = ~ modify_if(
      .x,
      .p = function(x) all(sapply(x, is.data.frame)),
      .f = ~ do.call("rbind", lapply(.x, jsonlite::flatten))
    )
  ) %>%
  as.data.frame() %>%
  jsonlite::flatten()

str(res)
# 'data.frame': 2 obs. of  20 variables:
#  $ events.action                            : num  1 1
#  $ events.volume                            : num  1000 2000
#  $ host.identity                            : chr  "john" "john"
#  $ host.schema                              : logi  NA NA
#  $ host.elementid                           : chr  "101505" "101505"
#  $ host.elementtype                         : logi  NA NA
#  $ host.elementversion                      : logi  NA NA
#  $ sender.identity                          : chr  "jane" "jane"
#  $ sender.schema                            : logi  NA NA
#  $ sender.elementid                         : chr  "101005" "101005"
#  $ sender.elementtype                       : logi  NA NA
#  $ sender.elementversion                    : logi  NA NA
#  $ completeddate                            : chr  "/date(1490112000000)/" "/date(1490112000000)/"
#  $ events.cateringoptions.agreed            : logi  TRUE TRUE
#  $ events.cateringoptions.address           : chr  "new york" "seattle"
#  $ events.cateringoptions.tnc.identity      : chr  "spicyowing" "baconeggs"
#  $ events.cateringoptions.tnc.schema        : logi  NA NA
#  $ events.cateringoptions.tnc.elementid     : chr  "105031" "105032"
#  $ events.cateringoptions.tnc.elementtype   : logi  NA NA
#  $ events.cateringoptions.tnc.elementversion: logi  NA NA
```
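The two steps of this approach can be seen on a much smaller toy object. The sketch below uses made-up values and a hypothetical `make_opt()` helper. Step 1 turns a list column of 1-row nested data frames into one 2-row data-frame column with `flatten()` plus `rbind()`; step 2 relies on `as.data.frame()` to prefix and recycle the outer pieces, after which a final `jsonlite::flatten()` expands the column that is still nested.

```r
library(jsonlite)

# Hypothetical toy data: `events` has a list column `cateringoptions` holding one
# 1-row data frame per row, each with a nested data-frame column `tnc`.
make_opt <- function(id) {
  opt <- data.frame(agreed = TRUE)
  opt$tnc <- data.frame(identity = id, elementid = "0", stringsAsFactors = FALSE)
  opt
}
events <- data.frame(action = c(1, 1), volume = c(1000, 2000))
events$cateringoptions <- list(make_opt("a"), make_opt("b"))

# Step 1: flatten() each nested data frame and rbind() the results, turning the
# list column into a single 2-row data-frame column
events$cateringoptions <- do.call(rbind, lapply(events$cateringoptions, jsonlite::flatten))

# Step 2: as.data.frame() prefixes each element's columns with its name and
# recycles the 1-row `host` to 2 rows; flatten() then expands the nested column
x <- list(events = events, host = list(identity = "john", elementid = "101505"))
res <- jsonlite::flatten(as.data.frame(x))
str(res)
```

The final `flatten()` places the formerly nested columns after the plain ones, which matches the `events.cateringoptions.*` columns coming last in the `str(res)` output above.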

There is one mismatch with your `expectedoutcome` which, if I may say so, might be on your side:

```r
all.equal(expectedoutcome[sort(names(expectedoutcome))], res[sort(names(res))])
# [1] "Component “events.cateringoptions.agreed”: 'is.na' value mismatch: 0 in current 2 in target"
```
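The message itself can be reproduced with two plain vectors. When the `NA` positions of `target` and `current` differ, `all.equal(target, current)` reports how many `NA`s each one has, so "0 in current 2 in target" means `expectedoutcome` (the target) has `NA` in both rows of `events.cateringoptions.agreed`, while `res` has `TRUE` there, as `sampledata` does. The vectors below are hand-written stand-ins for those two columns.

```r
# Hypothetical stand-ins for the two events.cateringoptions.agreed columns
target  <- c(NA, NA)      # expectedoutcome: NA in both rows
current <- c(TRUE, TRUE)  # res: TRUE in both rows
all.equal(target, current)
# [1] "'is.na' value mismatch: 0 in current 2 in target"
```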
