Spark DataFrame - converting a UDF to an inline lambda in PySpark
I have a DataFrame that consists of 4 columns. Let's say they're a, b, c, d. I want to exclude rows where column b has a value of 'none' or 'nothing'. I know how to do this using a UDF, but I'm curious how to do it with an inline lambda (anonymous function) instead.
My DataFrame is df, and the UDF is as follows:
from pyspark.sql.functions import udf
from pyspark.sql.types import BooleanType

def b_field(b_field_value):
    if b_field_value == 'none' or b_field_value == 'nothing':
        return True

udf_b = udf(b_field, BooleanType())
print(df.filter(udf_b(df['b'])).count())
I'm trying to do it the lambda way, but I can't get it to work:
df.select(df['ct']).filter(lambda x: x == 'none' or x == 'nothing')
What did I do wrong?
According to the Spark guide, the filter format is:
df.filter(df['age'] > 21).show()
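Note that DataFrame.filter expects a Column expression or a SQL expression string, not a Python callable, which is why passing a lambda to it directly fails. As a rough sketch (assuming the column is named ct, as in your snippets), the SQL-string form would be:
df.filter("ct != 'nothing' AND ct != 'none'").count()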
In your case you can use one of the following filter conditions:
df.filter((df['ct'] != 'nothing') & (df['ct'] != 'none'))
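Be aware that if ct can be null, those rows get dropped as well, because null != 'none' evaluates to null and filter treats null as false. If you want to keep nulls (an assumption about your intent), a small sketch with an explicit isNull check:
from pyspark.sql.functions import col
df.filter(col('ct').isNull() | ((col('ct') != 'nothing') & (col('ct') != 'none')))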
Or the same condition with where (which is an alias for filter):
from pyspark.sql.functions import col
df.where((col('ct') != 'nothing') & (col('ct') != 'none'))
Or by negating an isin check:
df.filter(~col('ct').isin('nothing', 'none'))