OiO.lk Community platform!

Oio.lk is an excellent forum for developers, providing a wide range of resources, discussions, and support for those in the developer community. Join oio.lk today to connect with like-minded professionals, share insights, and stay updated on the latest trends and technologies in the development field.
  You need to log in or register to access the solved answers to this problem.
  • You have reached the maximum number of guest views allowed
  • Please register below to remove this limitation

Create Partitions in External Tables in Azure Synapse SQL Database (Serverless)

  • Thread starter Thread starter Syed
  • Start date Start date
S

Syed

Guest
I want to create an external table with partition columns based on year, month and day. I want to be able to query through this data in an optimized way. The queries could involve using a range of days, months or years. Therefore, I want to be able to use 'BETWEEN' in my query. This would help me avoid scanning the entire datalake and reduce the cost and time.

I used similar structure in AWS using Athena; however, I am having lots of trouble creating a table with partitions in Azure Synapse Analytics. I would really appreciate it if you could guide me with this problem.

Sample Query could be as follows:

Code:
SELECT * FROM my_partitioned_table
WHERE year BETWEEN 2023 AND 2024
AND month BETWEEN 1 AND 2
AND day BETWEEN 5 AND 6

Query to create an external table:

Code:
CREATE EXTERNAL TABLE [dbo].[my_partitioned_table]
(
[UTCDate] DATE,
[UTCHour] INT,
[LocalDate] DATE,
[LocalHour] INT,
[value] FLOAT
)
WITH (
LOCATION = '/path/year=*/month=*/day=*/*.parquet',
DATA_SOURCE = [datasource],
FILE_FORMAT = [fileformat]
)
PARTITION BY ( [year] INT, [month] INT, [day] INT );

The above query is wrong and I do not understand how to add partitions. I also made sure my files are stored in the format /path/year=/month=/day=/.parquet.
<p>I want to create an external table with partition columns based on year, month and day. I want to be able to query through this data in an optimized way. The queries could involve using a range of days, months or years. Therefore, I want to be able to use 'BETWEEN' in my query. This would help me avoid scanning the entire datalake and reduce the cost and time.</p>
<p>I used similar structure in AWS using Athena; however, I am having lots of trouble creating a table with partitions in Azure Synapse Analytics. I would really appreciate it if you could guide me with this problem.</p>
<p>Sample Query could be as follows:</p>
<pre><code>SELECT * FROM my_partitioned_table
WHERE year BETWEEN 2023 AND 2024
AND month BETWEEN 1 AND 2
AND day BETWEEN 5 AND 6
</code></pre>
<p>Query to create an external table:</p>
<pre><code>CREATE EXTERNAL TABLE [dbo].[my_partitioned_table]
(
[UTCDate] DATE,
[UTCHour] INT,
[LocalDate] DATE,
[LocalHour] INT,
[value] FLOAT
)
WITH (
LOCATION = '/path/year=*/month=*/day=*/*.parquet',
DATA_SOURCE = [datasource],
FILE_FORMAT = [fileformat]
)
PARTITION BY ( [year] INT, [month] INT, [day] INT );
</code></pre>
<p>The above query is wrong and I do not understand how to add partitions. I also made sure my files are stored in the format /path/year=<em>/month=</em>/day=<em>/</em>.parquet.</p>
Continue reading...
 

Latest posts

Top