Spark: subset a few columns and remove null rows


I am running Spark 2.1 on Windows 10 and have fetched data from MySQL into Spark using JDBC. The table looks like this:

x      y      z
----------------
1             d1
null   v      ed
5      null   null
7      s      null
null   bd     null

I want to create a new Spark dataset containing only the x and y columns of the table above, and I want to keep only the rows that do not have null in either of those 2 columns. The resulting table should look like this:

x      y
--------
1
7      s

I tried the following code:

val load_df = spark.read.format("jdbc")
  .option("url", "jdbc:mysql://100.150.200.250:3306")
  .option("dbtable", "schema.table_name")
  .option("user", "uname1")
  .option("password", "pass1")
  .load()

val filter_df = load_df.select($"x".isNotNull, $"y".isNotNull).rdd

// let's print the first 5 values of filter_df
filter_df.take(5)
res0: Array[org.apache.spark.sql.Row] = Array([true,true], [false,true], [true,false], [true,true], [false,true])

As shown above, the result doesn't give me the actual values; it returns boolean values instead (true when the value is not null, false when it is null).
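The reason is that select evaluates $"x".isNotNull as a column expression and projects its boolean result, which is exactly the output seen in res0. A minimal local sketch of the distinction (the local SparkSession and inline sample data here are assumptions standing in for the JDBC source above):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("select-vs-filter")
  .getOrCreate()
import spark.implicits._

// Two sample rows: (1, "d1") and (null, "v"); None renders as null
val df = Seq((Option(1), Option("d1")), (Option.empty[Int], Option("v")))
  .toDF("x", "y")

// select() projects the boolean expressions as new columns,
// producing rows of booleans rather than the original values
val booleans = df.select($"x".isNotNull, $"y".isNotNull)
booleans.show()

// To drop rows, the same expressions belong in filter() instead
val kept = df.filter($"x".isNotNull && $"y".isNotNull)
kept.show()
```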

Try this:

val load_df = spark.read.format("jdbc")
  .option("url", "jdbc:mysql://100.150.200.250:3306")
  .option("dbtable", "schema.table_name")
  .option("user", "uname1")
  .option("password", "pass1")
  .load()

Now:

load_df.select($"x", $"y").filter($"x".isNotNull).filter($"y".isNotNull)
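An equivalent and arguably more concise option is DataFrame.na.drop, which removes rows containing null in the named columns. A hedged sketch (the local SparkSession and the inline sample data loosely mirroring the question's table are assumptions, not the JDBC setup above):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("na-drop-demo")
  .getOrCreate()
import spark.implicits._

// (x, y) pairs loosely mirroring the question's table; None renders as null
val df = Seq(
  (Option(1), Option("")),
  (Option.empty[Int], Option("v")),
  (Option(5), Option.empty[String]),
  (Option(7), Option("s")),
  (Option.empty[Int], Option("bd"))
).toDF("x", "y")

// Drop every row that has null in x or in y
val result = df.na.drop(Seq("x", "y"))
result.show()
```

na.drop(Seq("x", "y")) behaves like chaining the two isNotNull filters, and it scales more cleanly when several columns must all be non-null.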
