Spark DataFrame - converting a UDF to an inline lambda in PySpark
I have a DataFrame consisting of 4 columns; let's say they are a, b, c, d. I want to exclude rows where column b has a value of 'none' or 'nothing'. I know how to do this using a UDF, but I'm curious how to do it with a lambda (anonymous function) instead.
My DataFrame is df, and the UDF is as follows:
from pyspark.sql.functions import udf
from pyspark.sql.types import BooleanType

def b_field(b_field_value):
    if b_field_value == 'none' or b_field_value == 'nothing':
        return True

udf_b = udf(b_field, BooleanType())
print(df.filter(udf_b(df['b'])).count())

I'm trying to do it the lambda way, but I can't get it to work:
df.select(df['ct']).filter(lambda x: x == 'none' or x == 'nothing')

What did I do wrong?
According to the Spark guide, the format is:
df.filter(df['age'] > 21).show()

In your case you can use one of the following filter conditions:
df.filter((df['ct'] != 'nothing') & (df['ct'] != 'none'))

Or with a where condition (which is an alias for filter):
from pyspark.sql.functions import col

df.where((col('ct') != 'nothing') & (col('ct') != 'none'))

Or, using isin with negation:
df.filter(col("ct").notequal("nothing") && col("ct").notequal("none"))