dictionary - Counting occurrences from a dict and pandas
I'm still quite new to pandas and Python, and I want to count the total number of occurrences of the same combination of variables across multiple DataFrames within a single dict.
I have created a dict consisting of 6 DataFrames. The key of each DataFrame is a year (1985, 1990, etc.), and each DataFrame consists of an index and a single column of values. The index is made of 2 variables (both strings), separated by a comma, while the value represents the correlation between the 2 variables:
do-pspcp  pt-wfrto   -0.067934
          pt-wswfr   -0.067903
          pt-wtotl   -0.060489
          pt-wswto   -0.060485
do-sspop  do-pspcp   -0.050703
ps-swpop  do-sspcp   -0.048588
I would like to know the total number of times a specific index is correlated within the entire dict, along with the years (keys) and the individual correlations. Ideally, the output would look like this (values truncated for space considerations):
do-pspcp  pt-wfrto  5  1985,1990,1995,2000  -0.06,-0.068,-0.07,-0.06,-0.06
do-pspcp  pt-wswfr  2  1985,2000            -0.067,-0.07
The code used to generate the dict calls a correlation function (get_correlations) on each item of a list composed of larger DataFrames containing the above variables, passing the minimum number of observations required per pair of columns (number), and stores the calculations listed above:
highcorr = {}
for i in list:
    # get_correlations returns a Series; keep it as a one-column DataFrame under key i
    highcorr[i] = get_correlations(list[i], number).to_frame()
You can first convert the dict into one huge DataFrame:
df = pd.concat(dictionary)
This should return a multi-indexed DataFrame, with the dictionary key as the highest-level index.
Next, you can set the index back to the original format:
df = df.reset_index().set_index(['string1', 'string2'])
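To make the two steps above concrete, here is a minimal sketch on made-up data. The index level names 'string1' and 'string2', the column name 'correlation', the helper make_frame and the sample values are all assumptions for illustration; your real DataFrames may be labelled differently.

```python
import pandas as pd

def make_frame(rows):
    # Hypothetical helper: rows is a list of (string1, string2, correlation) tuples
    index = pd.MultiIndex.from_tuples(
        [(a, b) for a, b, _ in rows], names=["string1", "string2"]
    )
    return pd.DataFrame({"correlation": [c for _, _, c in rows]}, index=index)

# Made-up stand-in for the dict of per-year DataFrames
dictionary = {
    1985: make_frame([("do-pspcp", "pt-wfrto", -0.06), ("do-pspcp", "pt-wswfr", -0.067)]),
    1990: make_frame([("do-pspcp", "pt-wfrto", -0.068)]),
}

# The dict keys become the outermost (unnamed) index level
df = pd.concat(dictionary)

# reset_index() names the unnamed key level 'level_0'; rename it for readability
df = (
    df.reset_index()
      .rename(columns={"level_0": "year"})
      .set_index(["string1", "string2"])
)
print(df)
```

Renaming 'level_0' to 'year' is optional; it only makes later lookups easier to read.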
Since you need information on a specific index, use:
df.loc['specific_str1', 'specific_str2']
You can get what you need by doing various queries:
number_of_times = df.loc['specific_str1', 'specific_str2'].shape[0]
# the dictionary key ends up in a column called 'level_0' after reset_index()
dates = df.loc['specific_str1', 'specific_str2']['level_0']
corr = df.loc['specific_str1', 'specific_str2']['correlation']
I can't quite figure out the output format you need, so you'll have to take it from here yourself.
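If the goal is one row per pair with the count, the years and the correlations, a groupby is one possible way to get there. This is only a sketch on made-up data: the column names 'year', 'string1', 'string2' and 'correlation' and the three-decimal formatting are assumptions, not something taken from the question's real data.

```python
import pandas as pd

# Made-up long-format data standing in for the concatenated dict
long_df = pd.DataFrame({
    "year": [1985, 1985, 1990, 2000],
    "string1": ["do-pspcp", "do-pspcp", "do-pspcp", "do-pspcp"],
    "string2": ["pt-wfrto", "pt-wswfr", "pt-wfrto", "pt-wswfr"],
    "correlation": [-0.06, -0.067, -0.068, -0.07],
})

# One row per (string1, string2) pair: how often it occurs, in which years,
# and with which correlations (joined into comma-separated strings)
summary = long_df.groupby(["string1", "string2"]).agg(
    count=("correlation", "count"),
    years=("year", lambda s: ",".join(str(y) for y in s)),
    correlations=("correlation", lambda s: ",".join(f"{c:.3f}" for c in s)),
)
print(summary)
```

Joining into strings matches the layout asked for in the question; if you want to keep working with the numbers, aggregating with list instead of a join is probably more convenient.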