xml - R - Iterating xpathApply issue
This is my first post on SE, though I use the site often!
I have been having a hard time trying to extract information from PMC XML files using R.
I have been trying to replicate this code, but I find that rather than iterating through the article set and extracting the items once per article, the parse_multiple_papers function returns the whole list for each article (in the example there are 6 articles, and the list is printed 36 times).
    library(XML)

    # Parse the file and take the root node
    searchresults <- xmlRoot(xmlTreeParse('pmc_se_examp.xml', useInternalNodes=TRUE))

    parse_title <- function(paper){
      print(xpathApply(paper, '//title-group/article-title', xmlValue))
    }

    parse_multiple_papers <- function(papers){
      thispaper <- xpathApply(papers, "//pmc-articleset/*", parse_title)
    }

    x <- parse_multiple_papers(searchresults)
I cannot for the life of me figure out why this is happening, and wondered if anyone could shed some light on it for me? Thanks in advance!
After spending a few hours retracing my steps and consulting the original code, I have managed to get the behaviour I was looking for. The aim is to extract the author, title, PMID and abstract of each paper within the set. The following code does that. I welcome any feedback.
    library(XML)

    searchresults <- xmlRoot(xmlParse('pmc_se_examp.xml'))  # load file and set root

    parse_author <- function(author){
      # Lists the separate names. Not sure how to deal with multiple names yet,
      # so only the first is used.
      fn <- xmlValue(author[["given-names"]])
      ln <- xmlValue(author[["surname"]])
      # Note: a list is never NULL, so the first branch below never runs
      if (is.null(list(forname=fn, lastname=ln))){
        list(forname=NA, lastname=NA)
      } else {
        list(forname=fn, lastname=ln)
      }
    }

    parse_paper <- function(paper){
      author_info <- xpathApply(paper, ".//contrib-group/contrib/name", parse_author)
      title_text <- unlist(xpathApply(paper, ".//title-group/article-title", xmlValue))
      if (is.null(title_text)) { title_text <- NA }  # for incomplete entries
      abstract_text <- unlist(xpathApply(paper, ".//abstract", xmlValue))
      if (is.null(abstract_text)) { abstract_text <- NA }
      pmid <- xpathSApply(paper, ".//article-meta/article-id[@pub-id-type = 'pmid']", xmlValue)
      # length() check covers both NULL and an empty result
      if (length(pmid) == 0) { pmid <- NA }
      # In the original, bind.data.frame was used - that doesn't work here,
      # so create a data frame first and rbind
      dat <- data.frame(pid = pmid, aut = author_info[1], tit = title_text, ab = abstract_text)
      res <- rbind(dat)
      res
    }

    parse_multiple_papers <- function(papers){
      res <- xpathApply(papers, "/pmc-articleset/*", parse_paper)
      do.call(rbind.data.frame, res)
    }

    z <- parse_multiple_papers(searchresults)
    write.csv(z, "datafile.csv")
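As far as I can tell, the difference between the two versions comes down to the leading dot in the XPath expressions. A path starting with `//` is evaluated from the document root even when a single article node is passed in, whereas `.//` is evaluated relative to that node. The sketch below illustrates this on a small made-up document; the inline XML and the variable names are assumptions for illustration only and are not taken from pmc_se_examp.xml.

```r
library(XML)

# Hypothetical two-article document standing in for the real PMC file
xml_text <- '<pmc-articleset>
  <article><front><article-meta><title-group>
    <article-title>First paper</article-title>
  </title-group></article-meta></front></article>
  <article><front><article-meta><title-group>
    <article-title>Second paper</article-title>
  </title-group></article-meta></front></article>
</pmc-articleset>'

root <- xmlRoot(xmlParse(xml_text, asText = TRUE))
articles <- getNodeSet(root, "/pmc-articleset/article")

# "//" searches from the document root, so each article sees every title
absolute_titles <- lapply(articles, function(a) {
  xpathSApply(a, "//article-title", xmlValue)
})

# ".//" searches below the current article only
relative_titles <- lapply(articles, function(a) {
  xpathSApply(a, ".//article-title", xmlValue)
})

print(lengths(absolute_titles))
print(lengths(relative_titles))
```

With the absolute path, every article should return both titles, which would match the repeated output described in the question; with the relative path, each article should return only its own title.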