R - "Error: Aesthetics must be either length 1 or the same as the data (1148) ..." Why?
Wickham (2009: 164-6) gives an example of plotting multiple time series simultaneously. The following code replicates his example:
library(ggplot2)
library(plyr)  # for ddply() and .()
# Rescale a numeric vector linearly onto [0, 1]
range01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / diff(rng)
}
emp <- subset(economics_long, variable %in% c("uempmed", "unemploy"))
emp2 <- ddply(emp, .(variable), transform, value = range01(value))
qplot(date, value, data = emp2, geom = "line", color = variable, linetype = variable)
# Produces a plot that looks like the one on p. 166 of the ggplot2 book.
Here range01 is used to rescale each variable's values to [0, 1], so that series of different orders of magnitude can be plotted on the same scale. Wickham's original starts with the economics data provided with ggplot2 and melts it into long form; here I've taken the shortcut of starting with the economics_long version.
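To show why range01 makes series of different magnitudes comparable, here is a minimal base-R sketch. The two input vectors are hypothetical toy values, not data from the economics dataset. Note that a constant series would give NaN, because diff(rng) is then zero.

```r
# Rescale a numeric vector linearly onto [0, 1]; NAs are ignored when
# computing the range and stay NA in the output.
range01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / diff(rng)
}

# Hypothetical toy series with the same shape but very different magnitudes
small <- c(5, 10, 20)
large <- c(5000, 10000, 20000)

range01(small)  # 0, 1/3, 1
range01(large)  # also 0, 1/3, 1: both now share one [0, 1] scale
```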
But Wickham (p. 27) points out that tapping the "full power" of ggplot2 requires constructing plots manually, layer by layer, with the ggplot() function. Here is the example again, using ggplot() instead of qplot():
# The same thing using ggplot() commands
ggplot(data = emp2, aes(x = date)) +
  geom_line(aes(y = value, group = variable, color = variable, linetype = variable))
# Same results
Both examples rely on ggplot2's default settings. But suppose we want greater control over the aesthetics. Perhaps the variables lend themselves to particular color schemes (e.g., green might be used for environmentally friendly variables and black for detrimental ones), or perhaps a long monograph contains many plots and we want to ensure consistency. Furthermore, if the plots will be used both in presentations and in printed black-and-white text, we may want to associate specific line types with particular series; the same applies if we are concerned about viewers with color blindness. Finally, the variable names are poor descriptors of what the variables are, so we want to associate descriptive labels with the individual time series.
So we define the following for the economics dataset:
# Try to control things a bit more
economics_colors = c("pce" = "red", "pop" = "orange", "psavert" = "yellow",
                     "uempmed" = "green", "unemploy" = "blue")
economics_linetypes = c("pce" = "solid", "pop" = "dashed", "psavert" = "dotted",
                        "uempmed" = "dotdash", "unemploy" = "longdash")
economics_labels = c(
  "pce" = "personal consumption expenditures",
  "pop" = "total population",
  "psavert" = "personal savings rate",
  "uempmed" = "median duration of unemployment",
  "unemploy" = "number of unemployed"
)
Now we construct the plot by adding a separate layer (Wickham 2009: 164-5) for each variable:
# First, line by line
employment.plot <- ggplot(emp2) + aes(x = date) +
  scale_linetype_manual(values = economics_linetypes, labels = economics_labels)
employment.plot <- employment.plot +
  geom_line(data = subset(emp2, variable == "uempmed"),
            aes(y = value, linetype = "uempmed"),
            color = economics_colors["uempmed"])
employment.plot <- employment.plot +
  geom_line(data = subset(emp2, variable == "unemploy"),
            aes(x = date, y = value, linetype = "unemploy"),
            color = economics_colors["unemploy"])
employment.plot
# Except for the specific line colors, this produces the same plot as before.
Notice two things here. First, the line types are mapped, while the colors are set (see Wickham 2009: 47-49). This produces the desired result of a single legend with a distinct color-linetype combination for each series.
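Here is a minimal sketch of the mapping-versus-setting distinction. The toy data frame is hypothetical, and layer_data() is used only to inspect the colours ggplot2 computes for each layer. A value given outside aes() is used literally. A value given inside aes() is treated as data and passed through the colour scale.

```r
library(ggplot2)

# Hypothetical toy data: one short series
df <- data.frame(x = 1:3, y = c(2, 4, 3))

# Setting: colour is a fixed parameter outside aes(), so no scale and no legend
p_set <- ggplot(df, aes(x = x, y = y)) + geom_line(color = "blue")

# Mapping: inside aes() the string "blue" becomes a one-level category, so it
# goes through the default colour scale and gets a default hue instead
p_map <- ggplot(df, aes(x = x, y = y)) + geom_line(aes(color = "blue"))

unique(layer_data(p_set)$colour)  # "blue"
unique(layer_data(p_map)$colour)  # a default hue, not "blue"
```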
Second, even though the data are organized in "long" format, I used subset() to select the individual series. This is not the best solution. As Wickham (164-5) says:
... a better alternative is to melt the data into a long format and then visualize that. In the molten data the time series have their value stored in the value variable and we can distinguish between them with the variable variable.
So let's try that approach:
# Now try the automatic way
employment.plot <- ggplot(data = emp2, aes(x = date)) +
  scale_linetype_manual(values = economics_linetypes, labels = economics_labels)
employment.plot <- employment.plot +
  geom_line(aes(y = value, group = variable, linetype = economics_linetypes),
            color = economics_colors)
employment.plot
# Throws "Error: Aesthetics must be either length 1 or the same as the data (1148) ..."
As the comment indicates, this code throws an error about the aesthetics. Why?
Also, is there a way to accomplish all of these goals at once: using melted data with a single variable (variable) that triggers separate lines, controlling the colors and line types associated with each series, and using code to standardize these conventions across multiple plots?
References
Wickham, Hadley. 2009. ggplot2: Elegant Graphics for Data Analysis. Springer.
The aesthetics should be mapped to a dimension of the dataset.
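What you are saying with the last command is: "for each 'data point' (or group, in this case) assign a linetype equal to economics_linetypes."
But there is not (yet) any information on how to map each record (group) to a value in economics_linetypes. Because economics_linetypes has five elements, which is neither 1 nor the number of rows in the data (1148), ggplot2 rightly returns an error.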
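The length rule stated in the error message can be illustrated in base R. The sketch below uses a hypothetical helper, aes_length_ok(), which is not part of ggplot2 and does not reproduce its internal code. It only encodes the condition the message describes.

```r
# Hypothetical helper (NOT part of ggplot2) encoding the rule in the message
# "Aesthetics must be either length 1 or the same as the data (n)".
aes_length_ok <- function(aes_value, n_rows) {
  length(aes_value) == 1 || length(aes_value) == n_rows
}

economics_linetypes <- c("pce" = "solid", "pop" = "dashed", "psavert" = "dotted",
                         "uempmed" = "dotdash", "unemploy" = "longdash")
n_rows <- 1148  # rows in emp2, as reported by the error message

aes_length_ok("dashed", n_rows)             # TRUE: one constant for every row
aes_length_ok(economics_linetypes, n_rows)  # FALSE: 5 values for 1148 rows
aes_length_ok(rep("pce", n_rows), n_rows)   # TRUE: one value per row
```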
What you should map to linetype is the dimension that controls it. That is: "for each value in this dimension, use a different value of linetype", i.e.:
geom_line(aes(y = value, group = variable, linetype = variable))
Once you have defined that, you can map each value of variable to a specific linetype in the definition of the scale:
scale_linetype_manual(values = economics_linetypes, labels = economics_labels)
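The core idea behind a manual scale with named values is a lookup from each data value to the entry with the same name. This base-R sketch shows only that idea and is not ggplot2's implementation. The short variable vector is a hypothetical excerpt of the variable column.

```r
economics_linetypes <- c("pce" = "solid", "pop" = "dashed", "psavert" = "dotted",
                         "uempmed" = "dotdash", "unemploy" = "longdash")
economics_labels <- c("pce" = "personal consumption expenditures",
                      "pop" = "total population",
                      "psavert" = "personal savings rate",
                      "uempmed" = "median duration of unemployment",
                      "unemploy" = "number of unemployed")

# Hypothetical excerpt of the `variable` column of emp2
variable <- c("uempmed", "uempmed", "unemploy", "unemploy")

# Indexing a named vector by name: one line type per data value
line_types <- unname(economics_linetypes[variable])
# gives "dotdash", "dotdash", "longdash", "longdash"

# The same lookup gives each distinct series its descriptive legend label
legend_labels <- unname(economics_labels[unique(variable)])
# gives "median duration of unemployment", "number of unemployed"
```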
All of this applies to color as well, of course, so in the end you have:
employment.plot <- ggplot(data = emp2, aes(x = date)) +
  geom_line(aes(y = value, group = variable, linetype = variable, color = variable)) +
  scale_linetype_manual(values = economics_linetypes, labels = economics_labels) +
  scale_color_manual(values = economics_colors, labels = economics_labels)
Hope this is clear enough.
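The question also asks how to standardize these conventions across multiple plots. One possible approach is to wrap both manual scales in a function that returns a list, because ggplot2 accepts a list of components on the right-hand side of `+`. The helper name scale_economics() is hypothetical. The sketch also rebuilds emp2 with base R's ave() instead of plyr, which is a substitution for illustration only.

```r
library(ggplot2)

# Shared conventions, defined once (same vectors as in the question)
economics_colors <- c("pce" = "red", "pop" = "orange", "psavert" = "yellow",
                      "uempmed" = "green", "unemploy" = "blue")
economics_linetypes <- c("pce" = "solid", "pop" = "dashed", "psavert" = "dotted",
                         "uempmed" = "dotdash", "unemploy" = "longdash")
economics_labels <- c("pce" = "personal consumption expenditures",
                      "pop" = "total population",
                      "psavert" = "personal savings rate",
                      "uempmed" = "median duration of unemployment",
                      "unemploy" = "number of unemployed")

# Hypothetical helper: bundles both manual scales so any plot can reuse them
scale_economics <- function() {
  list(
    scale_linetype_manual(values = economics_linetypes, labels = economics_labels),
    scale_color_manual(values = economics_colors, labels = economics_labels)
  )
}

range01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / diff(rng)
}

# Rebuild emp2 without plyr: rescale each series to [0, 1] within its group
emp2 <- subset(economics_long, variable %in% c("uempmed", "unemploy"))
emp2$value <- ave(emp2$value, as.character(emp2$variable), FUN = range01)

employment.plot <- ggplot(data = emp2, aes(x = date)) +
  geom_line(aes(y = value, group = variable, linetype = variable, color = variable)) +
  scale_economics()
employment.plot
```

Any other plot of economics_long series can then add `+ scale_economics()` and pick up the same colors, line types, and labels.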