Grouping data in R based on specific column values
I have a set of data in a CSV file that I need to group based on the transitions of one column. I'm new to R, and I'm having trouble finding the right way to accomplish this.
Simplified version of the data:
    time  phase  pressure  speed
       1      0     0.015      0
       2     25     0.015      0
       3     25     0.234      0
       4     25     0.111      0
       5      0     0.567      0
       6      0     0.876      0
       7     75     0.234      0
       8     75     0.542      0
       9     75     0.543      0
In the real file, phase stays in each state for longer than shown above; I shortened it to make it readable, and the pattern continues on and on. I'm trying to calculate the mean of pressure and speed for each instance where phase is non-zero. For example, the output for the sample above would have two lines: one with the average of the three lines where phase is 25, and one with the average of the three lines where phase is 75. The same numeric value of phase can show up more than once, and I need to treat each occurrence separately. That is, if phase is 0, 0, 25, 25, 25, 0, 0, 0, 25, 25, 0, I need to record the first group and the second group of 25s as separate events, and likewise for any other non-zero groups.
What I've tried:
`csv <- read.csv("c:\\test.csv")`
`ins <- subset(csv,csv$phase == 25)`
`exs <- subset(csv,csv$phase == 75)`
`mean(ins$pressure)`
`mean(exs$pressure)`
This returns the average over the entire file for the rows where phase is 25 and where it is 75, but I need to somehow split those rows into groups using the leading and trailing 0s. Any help is appreciated.
Edited: based on feedback from the asker, they are seeking aggregations across runs of numbers (i.e. the first group of continuous 25s, the second group of continuous 25s, and so on). Because of that, I suggest using rle, the run-length encoding function, to get a group number that you can use in the aggregate command.
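To make the idea concrete, here is a minimal sketch of what rle returns for the phase sequence quoted in the question, and how the run index can be expanded back to one entry per row. The variable names (phase, runs, group_no) are only illustrative.

```r
# Phase sequence quoted in the question
phase <- c(0, 0, 25, 25, 25, 0, 0, 0, 25, 25, 0)

# rle() collapses consecutive equal values into (value, length) pairs
runs <- rle(phase)
runs$values   # value of each run:  0 25 0 25 0
runs$lengths  # length of each run: 2 3 3 2 1

# Repeat each run's index once per row in that run
group_no <- rep(seq_along(runs$lengths), times = runs$lengths)
group_no      # 1 1 2 2 2 3 3 3 4 4 5
```

The two runs of 25 end up with different group numbers (2 and 4), which is what lets them be aggregated as separate events.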
I've modified the original data so that it contains two runs of 25, for illustrative purposes, but the approach should work regardless. Using rle, I encoded the runs of the data and then created a group number for each row. I did that by building a vector with one index per observed run and using the rep function to repeat each index for the length of its run.
After that is done, you can use the same basic aggregation command again.
    df_example <- data.frame(time = 1:9,
                             phase = c(0,25,25,25,0,0,25,25,0),
                             pressure = c(0.015,0.015,0.234,0.111,0.567,0.876,0.234,0.542,0.543),
                             speed = rep(x = 0,times = 9))

    # Encode the runs of phase, then give every row the index of its run
    encoded_runs <- rle(x = df_example$phase)
    df_example$group_no <- rep(x = 1:length(x = encoded_runs$lengths),
                               times = encoded_runs$lengths)

    # Average pressure and speed within each non-zero run
    aggregate(x = df_example[df_example$phase != 0,c("pressure","speed")],
              by = list(group_no = df_example[df_example$phase != 0,"group_no"],
                        phase = df_example[df_example$phase != 0,"phase"]),
              FUN = mean)

      group_no phase pressure speed
    1        2    25    0.120     0
    2        4    25    0.388     0
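If you need this more than once, the same steps can be wrapped in a small function. This is only a sketch: mean_by_run and its arguments are hypothetical names, not part of base R, and it assumes that 0 is the only value marking "no phase". It is applied here to the sample data from the question, with runs of 25 and 75.

```r
# Hypothetical helper: mean of `cols` for each contiguous non-zero run of `key`
mean_by_run <- function(df, key = "phase", cols = c("pressure", "speed")) {
  runs <- rle(df[[key]])
  # One run index per row
  df$group_no <- rep(seq_along(runs$lengths), times = runs$lengths)
  # Assumption: rows where the key column is 0 are separators, not events
  keep <- df[[key]] != 0
  aggregate(x = df[keep, cols, drop = FALSE],
            by = setNames(list(df$group_no[keep], df[[key]][keep]),
                          c("group_no", key)),
            FUN = mean)
}

# Sample data from the question
df_question <- data.frame(
  time = 1:9,
  phase = c(0, 25, 25, 25, 0, 0, 75, 75, 75),
  pressure = c(0.015, 0.015, 0.234, 0.111, 0.567, 0.876, 0.234, 0.542, 0.543),
  speed = 0
)

result <- mean_by_run(df_question)
print(result)
```

For this sample the result has one row per non-zero run: group 2 (phase 25) and group 4 (phase 75), with mean pressures of about 0.120 and 0.440.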