Amazon S3 - How to improve query performance to S3 data from Athena


I have partitioned data stored in S3 in Hive format, like this:

bucket/year=2017/month=3/date=1/filename.json
bucket/year=2017/month=3/date=2/filename1.json
bucket/year=2017/month=3/date=3/filename2.json

Every partition has around 1,000,000 records. I have created a table and its partitions in Athena for this data.

Now I am running this query from Athena:

select count(*) from mts_data_1 where year='2017' and month='3' and date='1'

This query takes 1800 seconds to scan 1,000,000 records.

So my question is: how can I improve this query's performance?

I think the problem is that Athena has to read so many files from S3. 250 MB isn't much data, but 1,000,000 files is a lot of files. Athena query performance should improve dramatically if you reduce the number of files, and compressing the aggregated files should help even more. How many files do you need for one day's partition? Even at one-minute resolution, you would need fewer than 1,500 files. If the current query time is ~30 minutes, you might start with a lot fewer than that.
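The arithmetic behind that estimate can be checked quickly. This is a rough sketch that assumes the 250 MB figure is the size of one day's partition and that each record currently sits in its own file; both are readings of the numbers above, not measured values.

```python
# Back-of-the-envelope numbers for one day's partition.
# Assumptions (not measured): 250 MB per day, one record per file today.
MINUTES_PER_DAY = 24 * 60

partition_bytes = 250 * 1000 * 1000   # 250 MB, using decimal megabytes
current_files = 1_000_000             # one file per record (assumed)

# Average file size today versus after aggregating to one file per minute.
bytes_per_file_now = partition_bytes / current_files
bytes_per_file_per_minute = partition_bytes / MINUTES_PER_DAY

print(MINUTES_PER_DAY)                   # files needed at one-minute resolution
print(round(bytes_per_file_now))         # average bytes per file today
print(round(bytes_per_file_per_minute))  # average bytes per file after aggregation
```

Under these assumptions, each request to S3 currently fetches only a few hundred bytes, so per-file overhead is likely to dominate the query time rather than the volume of data scanned.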

There are many options for aggregating and compressing the records.
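As one possible illustration, here is a minimal sketch that groups records by minute and writes each group as a gzip-compressed, newline-delimited JSON file, a layout Athena can read with a JSON SerDe. The function name `aggregate_by_minute`, the `timestamp` field, and the sample records are hypothetical; the original data layout is not shown, so adapt the grouping key to whatever the records actually contain. Uploading the resulting files to S3 is left out.

```python
import gzip
import json
import tempfile
from collections import defaultdict
from pathlib import Path


def aggregate_by_minute(records, out_dir, timestamp_key="timestamp"):
    """Write one gzip-compressed newline-delimited JSON file per minute.

    Returns the list of written paths, sorted by minute.
    """
    groups = defaultdict(list)
    for record in records:
        # Assumes ISO-8601 strings such as "2017-03-01T00:00:12";
        # the first 16 characters identify the minute.
        groups[record[timestamp_key][:16]].append(record)

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for minute, group in sorted(groups.items()):
        # Drop characters that are awkward in file names.
        name = minute.replace(":", "").replace("-", "") + ".json.gz"
        path = out_dir / name
        with gzip.open(path, "wt", encoding="utf-8") as fh:
            for record in group:
                fh.write(json.dumps(record) + "\n")  # one JSON object per line
        written.append(path)
    return written


# Tiny demonstration with made-up records (hypothetical fields).
sample = [
    {"timestamp": "2017-03-01T00:00:12", "value": 1},
    {"timestamp": "2017-03-01T00:00:47", "value": 2},
    {"timestamp": "2017-03-01T00:01:05", "value": 3},
]
demo_dir = Path(tempfile.mkdtemp()) / "year=2017" / "month=3" / "date=1"
paths = aggregate_by_minute(sample, demo_dir)
print([p.name for p in paths])
```

Writing into a directory that mirrors the existing `year=/month=/date=` layout keeps the partition scheme unchanged, so only the files inside each partition need to be replaced.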

