Spark DataFrame - converting a UDF to an inline lambda in PySpark
I have a DataFrame that consists of 4 columns. Let's say they're a, b, c, d. I want to exclude rows where column b has a value of 'none' or 'nothing'. I know how to do this using a UDF, but I'm curious how to do it with an inline lambda (anonymous function) instead.
My DataFrame is df, and the UDF is as follows:
from pyspark.sql.functions import udf
from pyspark.sql.types import BooleanType

def b_field(b_field_value):
    if b_field_value == 'none' or b_field_value == 'nothing':
        return True

udf_b = udf(b_field, BooleanType())
print(df.filter(udf_b(df['b'])).count())
I'm trying to do it the lambda way, but I can't get it to work:
df.select(df['ct']).filter(lambda x: x == 'none' or x == 'nothing')
What did I do wrong?
According to the Spark guide, the filter format is:
df.filter(df['age'] > 21).show()
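Note that DataFrame.filter expects a Column expression or a SQL expression string, not a Python callable, which is why passing a lambda to it directly fails. As a rough sketch (assuming the column is named ct, as in your snippets), the SQL-string form would be:
df.filter("ct != 'nothing' AND ct != 'none'").count()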
In your case you can use one of the following filter conditions:
df.filter((df['ct'] != 'nothing') & (df['ct'] != 'none'))
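Be aware that if ct can be null, those rows get dropped as well, because null != 'none' evaluates to null and filter treats null as false. If you want to keep nulls (an assumption about your intent), a small sketch with an explicit isNull check:
from pyspark.sql.functions import col
df.filter(col('ct').isNull() | ((col('ct') != 'nothing') & (col('ct') != 'none')))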
Or the same condition with where (which is an alias for filter):
from pyspark.sql.functions import col
df.where((col('ct') != 'nothing') & (col('ct') != 'none'))
Or by negating an isin check:
df.filter(~col('ct').isin('nothing', 'none'))