python - Pandas add variable as column and correspond to original table (but they have different length) -


what ask kind of complicated title. made example show question. here sample table:

df = pd.dataframe({'number': [1,2,3,4,5,6,7,8,9], 'col1':['a','b','c','d','e','f','g','h','i']}) 

next step extract df['number'] , run iteration reason. number= [i*i in df['number']] output [1, 4, 9, 16, 25, 36, 49, 64, 81]

now have variable 'number' list.

now key step have regroup list. let's number number less 40

number1 = [i in number if < 40] number2 = [i in number if > 40] 

ok, key step want add number1 , number2 df, expected final output this:

that is, add new column 'type' , these 2 new variables have match index , content 'number1' , 'number2', not '1,4,9...81'.

i think need numpy.where boolean mask:

print (df.number.pow(2) < 40) 0     true 1     true 2     true 3     true 4     true 5     true 6    false 7    false 8    false name: number, dtype: bool  df['type'] = np.where(df.number.pow(2) < 40, 'number1', 'number2') #same #df['type'] = np.where(df.number ** 2 < 40, 'number1', 'number2') #another solution #df['type'] = np.where(df.number.pow(2).lt(40), 'number1', 'number2') print (df)   col1  number     type 0          1  number1 1    b       2  number1 2    c       3  number1 3    d       4  number1 4    e       5  number1 5    f       6  number1 6    g       7  number2 7    h       8  number2 8          9  number2 

timings - numpy.where fastest:

df = pd.dataframe({'number': [1,2,3,4,5,6,7,8,9], 'col1':['a','b','c','d','e','f','g','h','i']}) #[9000 rows x 5 columns] df = pd.concat([df]*1000).reset_index(drop=true)  df['type'] = np.where(df.number.pow(2).lt(40), 'number1', 'number2')  df['type1'] = 'number' + (1 + ((df.number**2)>40).astype(int)).astype(str)  # rule produce new values def f(row):     if row['number']**2 > 40:         val = 'number2'     else:         val = 'number1'     return val  df['type2'] = df.apply(f, axis=1)  #print (df) 

in [218]: %timeit df['type'] = np.where(df.number.pow(2).lt(40), 'number1', 'number2') 1000 loops, best of 3: 1.63 ms per loop  in [219]: %timeit df['type1'] = 'number' + (1 + ((df.number**2)>40).astype(int)).astype(str) 100 loops, best of 3: 13.5 ms per loop  in [220]: %timeit df['type2'] = df.apply(f, axis=1) 10 loops, best of 3: 127 ms per loop 

edit:

i create helper columns better understanding comparing:

df['pow'] = df.number.pow(2) df['comaping val'] = 40 df['val'] = df.number.pow(2).lt(40) print (df)   col1  number  pow  comaping val    val 0          1    1            40   true 1    b       2    4            40   true 2    c       3    9            40   true 3    d       4   16            40   true 4    e       5   25            40   true 5    f       6   36            40   true 6    g       7   49            40  false 7    h       8   64            40  false 8          9   81            40  false 

Comments

Popular posts from this blog

javascript - Clear button on addentry page doesn't work -

c# - Selenium Authentication Popup preventing driver close or quit -

tensorflow when input_data MNIST_data , zlib.error: Error -3 while decompressing: invalid block type -