I’m using Databricks with two different clusters (one from older version and one from new) to run a query that aggregates names, emails, and phone numbers from two data sources using COLLECT_SET. The goal is to combine these values into a single result per location.
However, while the actual data is the same, the order of the results differs between the two clusters, and I am trying to achieve the same output order as in the old cluster (2022). Here’s the query I’m using:
CREATE OR REPLACE TEMP VIEW TEST_COMMUNITY_MANAGER_NEW2 AS
SELECT
Location_Code,
array_JOIN(COLLECT_SET(concat(First_Name, ' ' , Last_Name)) , char(10)) AS COMMUNITY_MANAGER_NAME,
array_JOIN(COLLECT_SET(Work_Email), char(10)) AS Work_Email,
array_JOIN(COLLECT_SET(Work_Phone), char(10)) AS Work_Phone
FROM
(
SELECT
TMID,
Preferred_First_Name AS First_Name,
Last_Name,
Location_Code,
Work_Email,
Work_Phone
FROM
SuccessFactors.HRIS_Data_Daily
WHERE
Job_CATEGORY IN ('CM-Community Management', 'RVMGR-Resort Management')
AND Team_Member_Status IN ('Active', 'Unpaid Leave')
UNION ALL
SELECT
TMID,
Preferred_First_Name AS First_Name,
Last_Name,
PROPERTY_SETUP_EXTID AS Location_Code,
Work_Email,
Work_Phone
FROM
NetSuite.PROPERTY_SETUP PS
LEFT JOIN
MISC.AdditionalDistrictLocations AD ON PS.PROPERTY_SETUP_EXTID = Secondary_Properties
LEFT JOIN
SuccessFactors.HRIS_Data_Daily HR ON HR.Location_Code = AD.Primary_District_Property
AND Job_CATEGORY IN ('CM-Community Management', 'RVMGR-Resort Management')
AND Is_Primary = 'Yes'
AND Team_Member_Status="Active"
WHERE
PROPERTY_SETUP_EXTID NOT IN
(
SELECT DISTINCT
C.NAME
FROM
NetSuite.PROPERTY_SETUP P
JOIN
NetSuite.CLASSES C ON P.CLASSPROPERTY_ID = C.CLASS_ID
JOIN
SuccessFactors.HRIS_Data_Daily H ON
Job_CATEGORY IN ('CM-Community Management', 'RVMGR-Resort Management')
AND Team_Member_Status IN ('Active', 'Unpaid Leave')
AND H.Location_Code = C.NAME
)
) AB
GROUP BY Location_Code
ORDER BY Location_Code;
Problem:
In the old cluster (2022), the result looks like this:
PCB "Kathy Fulton
Christine Hildebrandt" "kfulton@test.com
childebrandt@test.com" "702.604.3
702.509.1"
However, in the new cluster (2024), the result order is different:
PCB "Christine Hildebrandt
Kathy Fulton" "childebrandt@.com
kfulton@.com" "702.604.3
702.509.1"
Desired Outcome:
I want the result order in the new cluster to match that of the old cluster, where "Kathy Fulton" appears before "Christine Hildebrandt."
You need to sign in to view this answers