OiO.lk Blog SQL Inconsistent Order of Results with COLLECT_SET Between Two Databricks Clusters old and new
SQL

Inconsistent Order of Results with COLLECT_SET Between Two Databricks Clusters old and new


I’m using Databricks with two different clusters (one from older version and one from new) to run a query that aggregates names, emails, and phone numbers from two data sources using COLLECT_SET. The goal is to combine these values into a single result per location.

However, while the actual data is the same, the order of the results differs between the two clusters, and I am trying to achieve the same output order as in the old cluster (2022). Here’s the query I’m using:

CREATE OR REPLACE TEMP VIEW TEST_COMMUNITY_MANAGER_NEW2 AS

SELECT 
    Location_Code, 
    array_JOIN(COLLECT_SET(concat(First_Name, ' ' , Last_Name)) , char(10)) AS COMMUNITY_MANAGER_NAME, 
    array_JOIN(COLLECT_SET(Work_Email), char(10)) AS Work_Email, 
    array_JOIN(COLLECT_SET(Work_Phone), char(10)) AS Work_Phone 
FROM 
(
    SELECT 
        TMID,
        Preferred_First_Name AS First_Name,
        Last_Name,
        Location_Code,
        Work_Email,
        Work_Phone
    FROM  
        SuccessFactors.HRIS_Data_Daily
    WHERE 
        Job_CATEGORY IN ('CM-Community Management', 'RVMGR-Resort Management')
        AND Team_Member_Status IN ('Active', 'Unpaid Leave')
    
    UNION ALL
    
    SELECT  
        TMID,
        Preferred_First_Name AS First_Name,
        Last_Name,
        PROPERTY_SETUP_EXTID AS Location_Code,
        Work_Email,
        Work_Phone 
    FROM  
        NetSuite.PROPERTY_SETUP PS 
    LEFT JOIN 
        MISC.AdditionalDistrictLocations AD ON PS.PROPERTY_SETUP_EXTID = Secondary_Properties 
    LEFT JOIN 
        SuccessFactors.HRIS_Data_Daily HR ON HR.Location_Code = AD.Primary_District_Property 
        AND Job_CATEGORY IN ('CM-Community Management', 'RVMGR-Resort Management') 
        AND Is_Primary = 'Yes' 
        AND Team_Member_Status="Active"
    WHERE  
        PROPERTY_SETUP_EXTID NOT IN 
        (
            SELECT DISTINCT 
                C.NAME 
            FROM 
                NetSuite.PROPERTY_SETUP P
            JOIN 
                NetSuite.CLASSES C ON P.CLASSPROPERTY_ID = C.CLASS_ID
            JOIN 
                SuccessFactors.HRIS_Data_Daily H ON 
                Job_CATEGORY IN ('CM-Community Management', 'RVMGR-Resort Management') 
                AND Team_Member_Status IN ('Active', 'Unpaid Leave') 
                AND H.Location_Code = C.NAME
        )
) AB
GROUP BY Location_Code
ORDER BY Location_Code;

Problem:
In the old cluster (2022), the result looks like this:

PCB  "Kathy Fulton
Christine Hildebrandt"  "kfulton@test.com
childebrandt@test.com"  "702.604.3 
702.509.1"

However, in the new cluster (2024), the result order is different:

PCB  "Christine Hildebrandt
Kathy Fulton"  "childebrandt@.com
kfulton@.com"  "702.604.3 
702.509.1"

Desired Outcome:
I want the result order in the new cluster to match that of the old cluster, where "Kathy Fulton" appears before "Christine Hildebrandt."



You need to sign in to view this answers

Exit mobile version