python - Pandas add variable as column and correspond to original table (but they have different length) -
what ask kind of complicated title. made example show question. here sample table:
df = pd.dataframe({'number': [1,2,3,4,5,6,7,8,9], 'col1':['a','b','c','d','e','f','g','h','i']})
next step extract df['number'] , run iteration reason. number= [i*i in df['number']]
output [1, 4, 9, 16, 25, 36, 49, 64, 81]
now have variable 'number' list.
now key step have regroup list. let's number number less 40
number1 = [i in number if < 40] number2 = [i in number if > 40]
ok, key step want add number1 , number2 df, expected final output this:
that is, add new column 'type' , these 2 new variables have match index , content 'number1' , 'number2', not '1,4,9...81'.
i think need numpy.where
boolean mask
:
print (df.number.pow(2) < 40) 0 true 1 true 2 true 3 true 4 true 5 true 6 false 7 false 8 false name: number, dtype: bool df['type'] = np.where(df.number.pow(2) < 40, 'number1', 'number2') #same #df['type'] = np.where(df.number ** 2 < 40, 'number1', 'number2') #another solution #df['type'] = np.where(df.number.pow(2).lt(40), 'number1', 'number2') print (df) col1 number type 0 1 number1 1 b 2 number1 2 c 3 number1 3 d 4 number1 4 e 5 number1 5 f 6 number1 6 g 7 number2 7 h 8 number2 8 9 number2
timings - numpy.where
fastest:
df = pd.dataframe({'number': [1,2,3,4,5,6,7,8,9], 'col1':['a','b','c','d','e','f','g','h','i']}) #[9000 rows x 5 columns] df = pd.concat([df]*1000).reset_index(drop=true) df['type'] = np.where(df.number.pow(2).lt(40), 'number1', 'number2') df['type1'] = 'number' + (1 + ((df.number**2)>40).astype(int)).astype(str) # rule produce new values def f(row): if row['number']**2 > 40: val = 'number2' else: val = 'number1' return val df['type2'] = df.apply(f, axis=1) #print (df)
in [218]: %timeit df['type'] = np.where(df.number.pow(2).lt(40), 'number1', 'number2') 1000 loops, best of 3: 1.63 ms per loop in [219]: %timeit df['type1'] = 'number' + (1 + ((df.number**2)>40).astype(int)).astype(str) 100 loops, best of 3: 13.5 ms per loop in [220]: %timeit df['type2'] = df.apply(f, axis=1) 10 loops, best of 3: 127 ms per loop
edit:
i create helper columns better understanding comparing:
df['pow'] = df.number.pow(2) df['comaping val'] = 40 df['val'] = df.number.pow(2).lt(40) print (df) col1 number pow comaping val val 0 1 1 40 true 1 b 2 4 40 true 2 c 3 9 40 true 3 d 4 16 40 true 4 e 5 25 40 true 5 f 6 36 40 true 6 g 7 49 40 false 7 h 8 64 40 false 8 9 81 40 false
Comments
Post a Comment