How to count across DStreams in Apache Spark?
I have the following question.
Imagine there is a DStream of JSON strings coming in, and I apply a few different filters in parallel on the same DStream (so these filters are not applied one after the other). Here is some pseudo-code, if it helps:
dstream.filter(x -> { check set of keys }) -> filteredstream1
dstream.filter(x -> { check set of keys }) -> filteredstream2
but not the chained form dstream.filter(x -> { check set of keys }).filter(x -> { check set of keys }).
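For concreteness, here is a minimal runnable sketch of that parallel-filter setup. It assumes a JavaStreamingContext reading from a socket source, and the "keyA"/"keyB" checks are hypothetical stand-ins for the real key predicates:

    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;

    SparkConf conf = new SparkConf().setMaster("local[2]").setAppName("CountAcrossDStreams");
    JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

    // Assumed source for the sketch; any JSON-string DStream works the same way.
    JavaDStream<String> dstream = jssc.socketTextStream("localhost", 9999);

    // Both filters branch off the same parent DStream, so each batch is
    // evaluated by both predicates independently, not one after the other.
    JavaDStream<String> filteredstream1 = dstream.filter(json -> json.contains("\"keyA\"")); // hypothetical key check
    JavaDStream<String> filteredstream2 = dstream.filter(json -> json.contains("\"keyB\"")); // hypothetical key check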
Now I want to be able to count the number of elements in filteredstream1 and filteredstream2, and combine the results into one message, as follows:
{"filteredstream1": 50, "filteredstream2": 25}
Is there an easy way to do this, for example by leveraging rdd.count() across the streams, or should I use mapToPair and reduceByKey?
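One way that fits the mapToPair/reduceByKey idea from the question: tag every element of each filtered stream with the stream's name, union the two pair streams, and reduce by key, so each batch yields at most one count per stream. A hedged sketch, building on the filteredstream1/filteredstream2 variables above (getOrDefault covers batches where a stream matched nothing):

    import java.util.Map;
    import org.apache.spark.streaming.api.java.JavaPairDStream;
    import scala.Tuple2;

    // Tag each element with its stream's name so the counts can be merged later.
    JavaPairDStream<String, Long> tagged1 =
            filteredstream1.mapToPair(x -> new Tuple2<>("filteredstream1", 1L));
    JavaPairDStream<String, Long> tagged2 =
            filteredstream2.mapToPair(x -> new Tuple2<>("filteredstream2", 1L));

    // Union the two pair streams and sum per stream name within each batch.
    JavaPairDStream<String, Long> counts = tagged1.union(tagged2).reduceByKey(Long::sum);

    counts.foreachRDD(rdd -> {
        Map<String, Long> perStream = rdd.collectAsMap();
        String message = String.format("{\"filteredstream1\": %d, \"filteredstream2\": %d}",
                perStream.getOrDefault("filteredstream1", 0L),
                perStream.getOrDefault("filteredstream2", 0L));
        System.out.println(message); // or publish the message wherever it needs to go
    });

    jssc.start();
    jssc.awaitTermination();

Alternatively, filteredstream1.count() gives a JavaDStream<Long> of per-batch counts, but merging two such streams into a single message requires transformWith or a join, so the pair-based approach above is usually the simpler route.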