OiO.lk Community platform!

Oio.lk is an excellent forum for developers, providing a wide range of resources, discussions, and support for those in the developer community. Join oio.lk today to connect with like-minded professionals, share insights, and stay updated on the latest trends and technologies in the development field.
  You need to log in or register to access the solved answers to this problem.
  • You have reached the maximum number of guest views allowed
  • Please register below to remove this limitation

How to create table for CloudFront log partitioned by date in Athena?

  • Thread starter Thread starter tom10271
  • Start date Start date
T

tom10271

Guest
Consider CloudFront Logs injected by AWS are stored in S3 with pattern:

s3://aws-cloudfront-log-[AWS Account ID]/[Any prefix I preferred]/E[CloudFront Distribution ID].[Year]-[Month]-[Day]-[Hour].[Hash].gz

Here is an example s3://aws-cloudfront-log-1290287349012/my-app/E12KDDSA1S7.2024-06-26-23.9e4a2b9e.gz



According to official doc, this is the Athena DDL for CloudFront standard log: https://docs.aws.amazon.com/athena/...gs.html#create-cloudfront-table-standard-logs

Code:
CREATE EXTERNAL TABLE IF NOT EXISTS cloudfront_standard_logs (
  `date` DATE,
  time STRING,
  x_edge_location STRING,
  sc_bytes BIGINT,
  c_ip STRING,
  cs_method STRING,
  cs_host STRING,
  cs_uri_stem STRING,
  sc_status INT,
  cs_referrer STRING,
  cs_user_agent STRING,
  cs_uri_query STRING,
  cs_cookie STRING,
  x_edge_result_type STRING,
  x_edge_request_id STRING,
  x_host_header STRING,
  cs_protocol STRING,
  cs_bytes BIGINT,
  time_taken FLOAT,
  x_forwarded_for STRING,
  ssl_protocol STRING,
  ssl_cipher STRING,
  x_edge_response_result_type STRING,
  cs_protocol_version STRING,
  fle_status STRING,
  fle_encrypted_fields INT,
  c_port INT,
  time_to_first_byte FLOAT,
  x_edge_detailed_result_type STRING,
  sc_content_type STRING,
  sc_content_len BIGINT,
  sc_range_start BIGINT,
  sc_range_end BIGINT
)
ROW FORMAT DELIMITED 
FIELDS TERMINATED BY '\t'
LOCATION 's3://DOC-EXAMPLE-BUCKET/'
TBLPROPERTIES ( 'skip.header.line.count'='2' )

I have tried to modify the DDL SQL to the following so that it is partitioned by date

Code:
CREATE EXTERNAL TABLE IF NOT EXISTS cloudfront_logs.editorial (
  -- same set of columns
)
PARTITIONED BY(
    date string
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LOCATION 's3://aws-cloudfront-log-1290287349012/my-app/'
TBLPROPERTIES (
  'skip.header.line.count'='2',
  'projection.date.format'='yyyy-MM-dd-HH',
  'projection.date.interval'='1',
  'projection.date.interval.unit'='HOURS',
  'projection.date.range'='2021-01-01-00,NOW',
  'projection.date.type'='date',
  'projection.enabled'='true',
  'storage.location.template'='s3://aws-cloudfront-log-1290287349012/my-app/E12KDDSA1S7.${date}'
);

However when I run select * FROM cloudfront_logs.editorial WHERE date_filter = '2024-06-26-23' LIMIT 1;, there is nothing returned.

What is missing or wrong?
<p>Consider CloudFront Logs injected by AWS are stored in S3 with pattern:</p>
<p><code>s3://aws-cloudfront-log-[AWS Account ID]/[Any prefix I preferred]/E[CloudFront Distribution ID].[Year]-[Month]-[Day]-[Hour].[Hash].gz</code></p>
<p>Here is an example
<code>s3://aws-cloudfront-log-1290287349012/my-app/E12KDDSA1S7.2024-06-26-23.9e4a2b9e.gz</code></p>
<hr />
<p>According to official doc, this is the Athena DDL for CloudFront standard log:
<a href="https://docs.aws.amazon.com/athena/...gs.html#create-cloudfront-table-standard-logs" rel="nofollow noreferrer">https://docs.aws.amazon.com/athena/...gs.html#create-cloudfront-table-standard-logs</a></p>
<pre class="lang-sql prettyprint-override"><code>CREATE EXTERNAL TABLE IF NOT EXISTS cloudfront_standard_logs (
`date` DATE,
time STRING,
x_edge_location STRING,
sc_bytes BIGINT,
c_ip STRING,
cs_method STRING,
cs_host STRING,
cs_uri_stem STRING,
sc_status INT,
cs_referrer STRING,
cs_user_agent STRING,
cs_uri_query STRING,
cs_cookie STRING,
x_edge_result_type STRING,
x_edge_request_id STRING,
x_host_header STRING,
cs_protocol STRING,
cs_bytes BIGINT,
time_taken FLOAT,
x_forwarded_for STRING,
ssl_protocol STRING,
ssl_cipher STRING,
x_edge_response_result_type STRING,
cs_protocol_version STRING,
fle_status STRING,
fle_encrypted_fields INT,
c_port INT,
time_to_first_byte FLOAT,
x_edge_detailed_result_type STRING,
sc_content_type STRING,
sc_content_len BIGINT,
sc_range_start BIGINT,
sc_range_end BIGINT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LOCATION 's3://DOC-EXAMPLE-BUCKET/'
TBLPROPERTIES ( 'skip.header.line.count'='2' )
</code></pre>
<p>I have tried to modify the DDL SQL to the following so that it is partitioned by date</p>
<pre class="lang-sql prettyprint-override"><code>CREATE EXTERNAL TABLE IF NOT EXISTS cloudfront_logs.editorial (
-- same set of columns
)
PARTITIONED BY(
date string
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LOCATION 's3://aws-cloudfront-log-1290287349012/my-app/'
TBLPROPERTIES (
'skip.header.line.count'='2',
'projection.date.format'='yyyy-MM-dd-HH',
'projection.date.interval'='1',
'projection.date.interval.unit'='HOURS',
'projection.date.range'='2021-01-01-00,NOW',
'projection.date.type'='date',
'projection.enabled'='true',
'storage.location.template'='s3://aws-cloudfront-log-1290287349012/my-app/E12KDDSA1S7.${date}'
);
</code></pre>
<p>However when I run <code>select * FROM cloudfront_logs.editorial WHERE date_filter = '2024-06-26-23' LIMIT 1;</code>, there is nothing returned.</p>
<p>What is missing or wrong?</p>
Continue reading...
 

Online statistics

Members online
0
Guests online
2
Total visitors
2
Top