Spark: subset a few columns and remove null rows
I am running Spark 2.1 on Windows 10 and have fetched data from MySQL into Spark using JDBC. The table looks like this:

x     y     z
------------------
1     d1    null
null  v     ed
5     null  null
7     s     null
null  bd    null
I want to create a new Spark Dataset with just the x and y columns from the table above, and I want to keep only the rows that do not have null in either of those 2 columns. The resultant table should look like this:

x   y
--------
1   d1
7   s
I tried the following code:

val load_df = spark.read.format("jdbc")
  .option("url", "jdbc:mysql://100.150.200.250:3306")
  .option("dbtable", "schema.table_name")
  .option("user", "uname1")
  .option("password", "pass1")
  .load()

val filter_df = load_df.select($"x".isNotNull, $"y".isNotNull).rdd

// let's print the first 5 values of filter_df
filter_df.take(5)
res0: Array[org.apache.spark.sql.Row] = Array([true,true], [false,true], [true,false], [true,true], [false,true])

As shown above, the result doesn't give me the actual values; it returns boolean values instead (true when the value is not null, false when the value is null).
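The root cause is that select projects expressions while filter keeps rows: $"x".isNotNull is a boolean Column expression, so selecting it produces a column of true/false instead of the original values. The distinction can be sketched in plain Scala, with no Spark required; the rows value below is a hypothetical in-memory stand-in for the table, using Option to model SQL NULL:

```scala
// Hypothetical in-memory stand-in for the (x, y) columns; Option models SQL NULL.
val rows: Seq[(Option[String], Option[String])] = Seq(
  (Some("1"), Some("d1")),
  (None,      Some("v")),
  (Some("5"), None),
  (Some("7"), Some("s")),
  (None,      Some("bd"))
)

// select($"x".isNotNull, $"y".isNotNull) behaves like a map: it projects
// the boolean expressions themselves, losing the original values.
val projected = rows.map { case (x, y) => (x.isDefined, y.isDefined) }

// What the question actually wants behaves like a filter: keep only the
// rows where both columns are defined, then project the values.
val filtered = rows.collect { case (Some(x), Some(y)) => (x, y) }

println(projected) // List((true,true), (false,true), (true,false), (true,true), (false,true))
println(filtered)  // List((1,d1), (7,s))
```

Note how projected reproduces the boolean output seen in the REPL above, while filtered matches the desired result table.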
Try this:

val load_df = spark.read.format("jdbc")
  .option("url", "jdbc:mysql://100.150.200.250:3306")
  .option("dbtable", "schema.table_name")
  .option("user", "uname1")
  .option("password", "pass1")
  .load()

Now:

load_df.select($"x", $"y").filter($"x".isNotNull && $"y".isNotNull)

Note that filter("x !== null") would not work here: in a SQL expression string, comparisons against null always evaluate to null, so a row is never kept. Use the Column API as above, or the SQL form .filter("x is not null and y is not null").
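Equivalently, Spark's built-in DataFrameNaFunctions can drop rows containing nulls. A minimal sketch, assuming the same load_df as above (the JDBC options are unchanged and elided here):

```scala
// Select the two columns, then drop every row that has a null in either.
// na.drop(Seq(...)) restricts the null check to the named columns;
// with no arguments it would check all columns of the projection.
val result_df = load_df
  .select($"x", $"y")
  .na.drop(Seq("x", "y"))

result_df.show()
```

Since the projection only contains x and y, a bare .na.drop() gives the same result; the explicit column list just makes the intent obvious.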