python - Pandas: Pivot a DataFrame, columns to rows


I have a DataFrame defined like this:

    from collections import OrderedDict
    from pandas import DataFrame
    import pandas as pd
    import numpy as np

    table = OrderedDict((
        ('year', [1900, 1900, 1900, 1900, 1901, 1901, 1901, 1901]),
        ('variable', ['prcp', 'prcp', 'tavg', 'tavg', 'prcp', 'prcp', 'tavg', 'tavg']),
        ('month', [1, 2, 1, 2, 1, 2, 1, 2]),
        ('first_day', [5, 8, 7, 3, 9, 2, 4, 1]),
        ('second_day', [5, 8, 7, 3, 9, 2, 5, 8]),
        ('third_day', [1, 7, 5, 7, 3, 5, 8, 9])
    ))
    df = DataFrame(table)

The DataFrame looks like this:

       year variable  month  first_day  second_day  third_day
    0  1900     prcp      1          5           5          1
    1  1900     prcp      2          8           8          7
    2  1900     tavg      1          7           7          5
    3  1900     tavg      2          3           3          7
    4  1901     prcp      1          9           9          3
    5  1901     prcp      2          2           2          5
    6  1901     tavg      1          4           5          8
    7  1901     tavg      2          1           8          9

I want to pivot the DataFrame so that it looks like this:

       variable  year  month  day  value
    0      prcp  1900      1    1      5
    1      prcp  1900      1    2      5
    2      prcp  1900      1    3      1
    3      prcp  1900      2    1      8
    4      prcp  1900      2    2      8
    5      prcp  1900      2    3      7
    6      prcp  1901      1    1      9
    7      prcp  1901      1    2      9
    8      prcp  1901      1    3      3
    9      prcp  1901      2    1      2
    10     prcp  1901      2    2      2
    11     prcp  1901      2    3      5
    12     tavg  1900      1    1      7
    13     tavg  1900      1    2      7
    14     tavg  1900      1    3      5
    15     tavg  1900      2    1      3
    16     tavg  1900      2    2      3
    17     tavg  1900      2    3      7
    18     tavg  1901      1    1      4
    19     tavg  1901      1    2      5
    20     tavg  1901      1    3      8
    21     tavg  1901      2    1      1
    22     tavg  1901      2    2      8
    23     tavg  1901      2    3      9

I think I want to do this via pivoting, but I've not yet worked out how using the pivot() or pivot_table() functions. Can anyone suggest a way to do this, with or without using pivot? Thanks in advance for any ideas.

You can use melt; first rename the columns with a dict:

    d = {'first_day':1,'second_day':2,'third_day':3}
    df = pd.melt(df.rename(columns=d), id_vars=['variable','year','month'], var_name='day')
    df = df.sort_values(['variable','year','month', 'day']).reset_index(drop=True)
    print(df)

       variable  year  month day  value
    0      prcp  1900      1   1      5
    1      prcp  1900      1   2      5
    2      prcp  1900      1   3      1
    3      prcp  1900      2   1      8
    4      prcp  1900      2   2      8
    5      prcp  1900      2   3      7
    6      prcp  1901      1   1      9
    7      prcp  1901      1   2      9
    8      prcp  1901      1   3      3
    9      prcp  1901      2   1      2
    10     prcp  1901      2   2      2
    11     prcp  1901      2   3      5
    12     tavg  1900      1   1      7
    13     tavg  1900      1   2      7
    14     tavg  1900      1   3      5
    15     tavg  1900      2   1      3
    16     tavg  1900      2   2      3
    17     tavg  1900      2   3      7
    18     tavg  1901      1   1      4
    19     tavg  1901      1   2      5
    20     tavg  1901      1   3      8
    21     tavg  1901      2   1      1
    22     tavg  1901      2   2      8
    23     tavg  1901      2   3      9
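As a small variation on the melt approach, the sketch below names the value column explicitly through `value_name` (its default is already `'value'`) and casts `day` to an integer, on the assumption that labels taken from renamed column headers are not guaranteed to come out with an integer dtype. The names `day_numbers` and `long_df` are illustrative only and do not appear in the answer.

```python
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame({
    'year': [1900, 1900, 1900, 1900, 1901, 1901, 1901, 1901],
    'variable': ['prcp', 'prcp', 'tavg', 'tavg', 'prcp', 'prcp', 'tavg', 'tavg'],
    'month': [1, 2, 1, 2, 1, 2, 1, 2],
    'first_day': [5, 8, 7, 3, 9, 2, 4, 1],
    'second_day': [5, 8, 7, 3, 9, 2, 5, 8],
    'third_day': [1, 7, 5, 7, 3, 5, 8, 9],
})

# Hypothetical name for the column-label -> day-number mapping.
day_numbers = {'first_day': 1, 'second_day': 2, 'third_day': 3}

long_df = pd.melt(
    df.rename(columns=day_numbers),
    id_vars=['variable', 'year', 'month'],
    var_name='day',
    value_name='value',  # explicit, although 'value' is the default
)

# Make the day column an integer type regardless of how melt built it.
long_df['day'] = long_df['day'].astype(int)

long_df = (long_df
           .sort_values(['variable', 'year', 'month', 'day'])
           .reset_index(drop=True))
print(long_df.head())
```

Keeping the result in a separate variable also leaves the original `df` untouched, which helps when trying the other approaches on the same data.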

Or map the day column with a dict:

    d = {'first_day':1,'second_day':2,'third_day':3}
    df = pd.melt(df, id_vars=['variable','year','month'], var_name='day')
    df.day = df.day.map(d)
    df = df.sort_values(['variable','year','month', 'day']).reset_index(drop=True)
    print(df)

       variable  year  month  day  value
    0      prcp  1900      1    1      5
    1      prcp  1900      1    2      5
    2      prcp  1900      1    3      1
    3      prcp  1900      2    1      8
    4      prcp  1900      2    2      8
    5      prcp  1900      2    3      7
    6      prcp  1901      1    1      9
    7      prcp  1901      1    2      9
    8      prcp  1901      1    3      3
    9      prcp  1901      2    1      2
    10     prcp  1901      2    2      2
    11     prcp  1901      2    3      5
    12     tavg  1900      1    1      7
    13     tavg  1900      1    2      7
    14     tavg  1900      1    3      5
    15     tavg  1900      2    1      3
    16     tavg  1900      2    2      3
    17     tavg  1900      2    3      7
    18     tavg  1901      1    1      4
    19     tavg  1901      1    2      5
    20     tavg  1901      1    3      8
    21     tavg  1901      2    1      1
    22     tavg  1901      2    2      8
    23     tavg  1901      2    3      9
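One thing to keep in mind with the map variant: `Series.map` with a dict yields `NaN` for any label that is missing from the dict. The sketch below shows one possible guard against that. The extra column `fourth_day` and the names `long_df`, `unmapped` and `safe_df` are made up for illustration and are not part of the question's data.

```python
import pandas as pd

# Small frame with one extra day column that the dict does not cover.
# 'fourth_day' is a hypothetical column used only to trigger the issue.
df = pd.DataFrame({
    'variable': ['prcp', 'tavg'],
    'year': [1900, 1900],
    'month': [1, 1],
    'first_day': [5, 7],
    'second_day': [5, 7],
    'third_day': [1, 5],
    'fourth_day': [2, 6],
})

d = {'first_day': 1, 'second_day': 2, 'third_day': 3}

long_df = pd.melt(df, id_vars=['variable', 'year', 'month'], var_name='day')
mapped = long_df['day'].map(d)

# Labels absent from the dict become NaN, so list them before overwriting.
unmapped = sorted(long_df.loc[mapped.isna(), 'day'].unique())
print(unmapped)

# Restricting value_vars to the dict keys avoids the problem up front.
safe_df = pd.melt(df, id_vars=['variable', 'year', 'month'],
                  value_vars=list(d), var_name='day')
safe_df['day'] = safe_df['day'].map(d)
print(safe_df)
```

With the question's three day columns the dict covers every label, so the guard only matters once more columns are added.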

Another solution uses stack:

    d = {'first_day':1,'second_day':2,'third_day':3}
    df = (df.rename(columns=d)
            .set_index(['variable','year','month'])
            .stack()
            .reset_index(name='value')
            .rename(columns={'level_3':'day'}))
    print(df)

       variable  year  month  day  value
    0      prcp  1900      1    1      5
    1      prcp  1900      1    2      5
    2      prcp  1900      1    3      1
    3      prcp  1900      2    1      8
    4      prcp  1900      2    2      8
    5      prcp  1900      2    3      7
    6      tavg  1900      1    1      7
    7      tavg  1900      1    2      7
    8      tavg  1900      1    3      5
    9      tavg  1900      2    1      3
    10     tavg  1900      2    2      3
    11     tavg  1900      2    3      7
    12     prcp  1901      1    1      9
    13     prcp  1901      1    2      9
    14     prcp  1901      1    3      3
    15     prcp  1901      2    1      2
    16     prcp  1901      2    2      2
    17     prcp  1901      2    3      5
    18     tavg  1901      1    1      4
    19     tavg  1901      1    2      5
    20     tavg  1901      1    3      8
    21     tavg  1901      2    1      1
    22     tavg  1901      2    2      8
    23     tavg  1901      2    3      9
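The stack result keeps the original row order, whereas the melt results above were sorted explicitly. As a rough sanity check, the sketch below rebuilds both reshapes from the same data and compares them after sorting. The helper `normalise` and the variable names are hypothetical, introduced only for this comparison.

```python
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame({
    'year': [1900, 1900, 1900, 1900, 1901, 1901, 1901, 1901],
    'variable': ['prcp', 'prcp', 'tavg', 'tavg', 'prcp', 'prcp', 'tavg', 'tavg'],
    'month': [1, 2, 1, 2, 1, 2, 1, 2],
    'first_day': [5, 8, 7, 3, 9, 2, 4, 1],
    'second_day': [5, 8, 7, 3, 9, 2, 5, 8],
    'third_day': [1, 7, 5, 7, 3, 5, 8, 9],
})
d = {'first_day': 1, 'second_day': 2, 'third_day': 3}
keys = ['variable', 'year', 'month']

# melt-based reshape
melted = pd.melt(df.rename(columns=d), id_vars=keys, var_name='day')

# stack-based reshape
stacked = (df.rename(columns=d)
             .set_index(keys)
             .stack()
             .reset_index(name='value')
             .rename(columns={'level_3': 'day'}))


def normalise(frame):
    """Cast day to int and sort so that row order no longer matters."""
    out = frame.copy()
    out['day'] = out['day'].astype(int)
    return out.sort_values(keys + ['day']).reset_index(drop=True)


a, b = normalise(melted), normalise(stacked)

# Compare row by row as plain Python lists.
same = a.values.tolist() == b.values.tolist()
print(same)
```

If the sorted order matters for the stack version as well, the same `sort_values(...).reset_index(drop=True)` step used in the melt solutions can be appended to it.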
