xml - R - Iterating xpathApply issue


This is my first post on SE, though I use it often!

I have been having a hard time trying to extract information from PMC XML files using R.

I have been trying to replicate this code, but I find that rather than iterating through the article set and extracting the items once per article, the parse_multiple_papers function returns the whole list for each article (in the example there are 6 articles, and the list is printed 36 times).

Sample XML file

    library(XML)

    searchresults <- xmlRoot(xmlTreeParse('pmc_se_examp.xml', useInternalNodes = TRUE))

    parse_title <- function(paper) {
      print(xpathApply(paper, '//title-group/article-title', xmlValue))
    }

    parse_multiple_papers <- function(papers) {
      thispaper <- xpathApply(papers, "//pmc-articleset/*", parse_title)
    }

    x <- parse_multiple_papers(searchresults)

I cannot for the life of me figure out why this is happening, and wondered if anyone could shed some light on it for me? Thanks in advance!

After spending a few hours retracing my steps and consulting the original code, I have managed to get the behaviour I was looking for. The aim is to extract the author, title, PMID and abstract of each paper within the set. The following code does that. I welcome any feedback.

    library(XML)

    searchresults <- xmlRoot(xmlParse('pmc_se_examp.xml'))  # load the file and set the root

    parse_author <- function(author) {
      # Lists the separate names. Not sure how to deal with multiple names yet,
      # so only the first author is used later on.
      fn <- xmlValue(author[["given-names"]])
      ln <- xmlValue(author[["surname"]])
      # Fall back to NA when a name part is missing
      if (length(fn) == 0) fn <- NA
      if (length(ln) == 0) ln <- NA
      list(forname = fn, lastname = ln)
    }

    parse_paper <- function(paper) {
      # The leading "." keeps each query relative to the current paper
      author_info <- xpathApply(paper, ".//contrib-group/contrib/name", parse_author)
      title_text <- unlist(xpathApply(paper, ".//title-group/article-title", xmlValue))
      if (is.null(title_text)) title_text <- NA  # for incomplete entries
      abstract_text <- unlist(xpathApply(paper, ".//abstract", xmlValue))
      if (is.null(abstract_text)) abstract_text <- NA
      pmid <- xpathSApply(paper, ".//article-meta/article-id[@pub-id-type = 'pmid']", xmlValue)
      if (length(pmid) == 0) pmid <- NA  # an empty result is a zero-length list, not NULL
      # In the original, bind.data.frame was used; that doesn't work here, so
      # create the data frame first and rbind the rows in parse_multiple_papers
      data.frame(pid = pmid, aut = author_info[1], tit = title_text, ab = abstract_text)
    }

    parse_multiple_papers <- function(papers) {
      res <- xpathApply(papers, "/pmc-articleset/*", parse_paper)
      do.call(rbind.data.frame, res)
    }

    z <- parse_multiple_papers(searchresults)
    write.csv(z, "datafile.csv")
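As far as I can tell, the change that matters is the leading "." in the XPath expressions. Below is a minimal sketch of the difference, assuming a made-up two-article document in place of pmc_se_examp.xml (the titles "First" and "Second" and the variable names are purely illustrative):

```r
library(XML)

# Hypothetical two-article document standing in for pmc_se_examp.xml
doc <- xmlParse(
  "<pmc-articleset>
     <article><front><title-group><article-title>First</article-title></title-group></front></article>
     <article><front><title-group><article-title>Second</article-title></title-group></front></article>
   </pmc-articleset>",
  asText = TRUE
)
articles <- getNodeSet(doc, "/pmc-articleset/article")

# "//" is absolute: it searches the whole document, even when called on a single node
absolute <- lapply(articles, function(a) xpathSApply(a, "//article-title", xmlValue))

# ".//" is relative: it searches only below the node that was passed in
relative <- lapply(articles, function(a) xpathSApply(a, ".//article-title", xmlValue))

lengths(absolute)  # every article sees all the titles in the document
lengths(relative)  # every article sees only its own title
```

With the absolute form, every call returns the titles of all articles, which would explain the 6 x 6 = 36 printed titles in the question.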
