Spark DataFrame - converting a UDF to an inline lambda in PySpark


I have a DataFrame that consists of 4 columns; let's say they are a, b, c and d. I want to exclude rows where column b has a value of 'none' or 'nothing'. I know how to do this using a UDF, but I'm curious how to do it with an inline lambda (anonymous function) instead.

My DataFrame is df, and my UDF is as follows:

from pyspark.sql.functions import udf
from pyspark.sql.types import BooleanType

def b_field(b_field_value):
    # flag values equal to 'none' or 'nothing'
    if b_field_value == 'none' or b_field_value == 'nothing':
        return True

udf_b = udf(b_field, BooleanType())
print(df.filter(udf_b(df['b'])).count())

I'm trying to do it the lambda way, but I can't get it to work:

df.select(df['ct']).filter(lambda x: x == 'none' or x == 'nothing')

What did I do wrong?

According to the Spark programming guide, the format is:

df.filter(df['age'] > 21).show() 
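The reason the lambda version fails is that df['age'] > 21 builds a Column expression that Spark evaluates itself; DataFrame.filter expects a Column (or a SQL string), not a Python callable. If you really want to use a Python lambda, you have to drop down to the RDD API. A minimal sketch, assuming the real column is named 'ct' as in your snippets:

# keep rows whose 'ct' value is neither 'none' nor 'nothing'
df.rdd.filter(lambda row: row['ct'] not in ('none', 'nothing')).count()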

In your case you can use one of the following filter conditions:

df.filter((df['ct'] != 'nothing') & (df['ct'] != 'none'))

or a where condition (where is an alias for filter):

from pyspark.sql.functions import col

df.where(
    (col('ct') != 'nothing') &
    (col('ct') != 'none'))
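For example, to reproduce the .count() from the question after excluding the unwanted rows (a small usage sketch with the same 'ct' column):

from pyspark.sql.functions import col

filtered = df.where((col('ct') != 'nothing') & (col('ct') != 'none'))
print(filtered.count())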

Another equivalent option is to negate isin:

df.filter(col("ct").notequal("nothing") && col("ct").notequal("none")) 
