OiO.lk Community platform!

Oio.lk is an excellent forum for developers, providing a wide range of resources, discussions, and support for those in the developer community. Join oio.lk today to connect with like-minded professionals, share insights, and stay updated on the latest trends and technologies in the development field.
  You need to log in or register to access the solved answers to this problem.
  • You have reached the maximum number of guest views allowed
  • Please register below to remove this limitation

Select with large json

  • Thread starter Thread starter Oscar Hernández
  • Start date Start date
O

Oscar Hernández

Guest
I am testing an etl in python, when I retrieve a json array from one table to then map it to another, originally it is more than 60,000 characters but it returns one of more than 5,000, so when I insert it mapped into the other table it skips records. I think the library (psycopg2) or the database server (postgre) cannot process such large strings. What solution can there be? The query is this:

Code:
SELECT json as new FROM {config.JDATA_TABLE}
    ORDER by timeinstant DESC

Para insertarlos es esta:

Code:
INSERT INTO {config.AVERAGES_EST_MAGN_TABLE} (
                stationcode, year, month, day, hour, timeinstant, magnitude, valuecams, stationid, magnitudeid, hourutc, hourutcsys, entityid)

Code:
SELECT DISTINCT LPAD(D.stationcode, 8,'0'), D.year::float8, D.month::float8, D.day::float8, D.hour::float8, D.timeinstant, D.magnitude, D.valuecams::float8, D.shortcodestation, D.magnitudeid, D.hourutc, D.hourutcsys, D.entityid
                FROM (
                    SELECT DISTINCT stationcode,year, month, day, hour, 
                        to_timestamp(day || '/' || month || '/' || year || ' ' || hour, 'DD/MM/YYYY HH24:MI') timeinstant, 
                        magnitude, coalesce(valuecams::float8, 0) valuecams, 
                        E.shortcodestation, CASE magnitude
                            WHEN 'NO2' THEN '008'
                            WHEN 'PM2.5' THEN '009'
                            WHEN 'PM10' THEN '010'
                            WHEN 'O3' THEN '014'
                            ELSE NULL
                        END AS magnitudeid
                        ,
                        TO_TIMESTAMP(TO_CHAR(TO_TIMESTAMP(day || '/' || month || '/' || year || ' ' || hour, 'DD/MM/YYYY HH24:MI:SS') AT TIME ZONE 'UTC', 'YYYY-MM-DD HH24:MI:SS'), 'YYYY-MM-DD HH24:MI:SS') AS hourutc,
                        TO_TIMESTAMP(TO_CHAR(TO_TIMESTAMP(day || '/' || month || '/' || year || ' ' || hour, 'DD/MM/YYYY HH24:MI:SS') AT TIME ZONE 'UTC', 'YYYY-MM-DD HH24:MI:SS'), 'YYYY-MM-DD HH24:MI:SS') AS hourutcsys,
                        stationcode || magnitude || year || month ||day || hour as entityid
                    FROM (
                        SELECT t.*
                        FROM (SELECT DISTINCT
                            data->>'cod_estacion' AS stationcode,
                            data->>'anno' AS year,
                            data->>'mes' AS month,
                            data->>'dia' AS day,
                            data->>'hora' AS hour,
                            data->>'magnitud' AS magnitude,
                            data->>'valor_cams' AS valuecams
                        FROM jsonb_array_elements('{str(inserted_data).replace("'",'"')}'::jsonb) AS data) t
                    ) DE
                    LEFT JOIN {config.STATION_TABLE} E
                    ON LPAD(DE.stationcode, 8,'0') = E.europeancode
                ) D
                WHERE NOT EXISTS (
                    SELECT 1
                    FROM (
                        SELECT DISTINCT stationcode, year, month, day, hour, magnitude, valuecams
                        FROM (
                            SELECT DISTINCT
                                data->>'cod_estacion' AS stationcode,
                                data->>'anno' AS year,
                                data->>'mes' AS month,
                                data->>'dia' AS day,
                                data->>'hora' AS hour,
                                data->>'magnitud' AS magnitude,
                                data->>'valor_cams' AS valuecams
                            FROM jsonb_array_elements('{str(inserted_data).replace("'",'"')}'::jsonb) AS data
                            )t
                    ) D, {config.AVERAGES_EST_MAGN_TABLE} PDB
                    WHERE PDB.stationcode = LPAD(D.stationcode, 8,'0') 
                    AND PDB.year = D.year::float8 
                    AND PDB.month = D.month::float8 
                    AND PDB.day = D.day::float8 
                    AND PDB.hour = D.hour::float8 
                    AND PDB.magnitude = D.magnitude
                );

The structure of the tables to consult is that of the image

Python code:

Code:
    new_json_data = read_DB(f"""
        SELECT json as new FROM {config.JDATA_TABLE}
        ORDER by timeinstant DESC 
        """)  # Este valor debe ser reemplazado por los datos reales
    
        for new in new_json_data:
            inserted_data = new[0]
            # Procesar los datos y realizar inserciones o actualizaciones en PDE_MEdayS_EST_MAGN_CAMS        
            update_query = f"""
                    UPDATE {config.AVERAGES_EST_MAGN_TABLE} PDB
                    SET valuecams = (
                        SELECT DISTINCT coalesce(D.valuecams::float8, 0)
                        FROM (
                            SELECT DISTINCT stationcode, year, month, day, hour, magnitude, valuecams
                            FROM (
                                SELECT DISTINCT
                                    data->>'cod_estacion' AS stationcode,
                                    data->>'anno' AS year,
                                    data->>'mes' AS month,
                                    data->>'dia' AS day,
                                    data->>'hora' AS hour,
                                    data->>'magnitud' AS magnitude,
                                    data->>'valor_cams' AS valuecams
                                FROM jsonb_array_elements('{str(inserted_data).replace("'",'"')}'::jsonb) AS data
                            )t
                        ) D
                        WHERE PDB.stationcode = LPAD(D.stationcode, 8,'0')
                        AND PDB.year = D.year::float8 
                        AND PDB.month = D.month::float8 
                        AND PDB.day = D.day::float8 
                        AND PDB.hour = D.hour::float8 
                        AND PDB.magnitude = D.magnitude
                    )
                    WHERE EXISTS (
                        SELECT 1
                        FROM (
                            SELECT DISTINCT stationcode, year, month, day, hour, magnitude, valuecams
                            FROM (
                                SELECT DISTINCT
                                    data->>'cod_estacion' AS stationcode,
                                    data->>'anno' AS year,
                                    data->>'mes' AS month,
                                    data->>'dia' AS day,
                                    data->>'hora' AS hour,
                                    data->>'magnitud' AS magnitude,
                                    data->>'valor_cams' AS valuecams
                                FROM jsonb_array_elements('{str(inserted_data).replace("'",'"')}'::jsonb) AS data
                                )t
                        ) D
                        WHERE PDB.stationcode = LPAD(D.stationcode, 8,'0') 
                        AND PDB.year = D.year::float8 
                        AND PDB.month = D.month::float8 
                        AND PDB.day = D.day::float8 
                        AND PDB.hour = D.hour::float8 
                        AND PDB.magnitude = D.magnitude
                    )
                """
            read_DB(update_query)
            read_DB(f"""INSERT INTO {config.AVERAGES_EST_MAGN_TABLE} (
                    stationcode, year, month, day, hour, timeinstant, magnitude, valuecams, stationid, magnitudeid, hourutc, hourutcsys, entityid)
                    SELECT DISTINCT LPAD(D.stationcode, 8,'0'), D.year::float8, D.month::float8, D.day::float8, D.hour::float8, D.timeinstant, D.magnitude, D.valuecams::float8, D.shortcodestation, D.magnitudeid, D.hourutc, D.hourutcsys, D.entityid
                    FROM (
                        SELECT DISTINCT stationcode,year, month, day, hour, 
                            to_timestamp(day || '/' || month || '/' || year || ' ' || hour, 'DD/MM/YYYY HH24:MI') timeinstant, 
                            magnitude, coalesce(valuecams::float8, 0) valuecams, 
                            E.shortcodestation, CASE magnitude
                                WHEN 'NO2' THEN '008'
                                WHEN 'PM2.5' THEN '009'
                                WHEN 'PM10' THEN '010'
                                WHEN 'O3' THEN '014'
                                ELSE NULL
                            END AS magnitudeid
                            ,
                            TO_TIMESTAMP(TO_CHAR(TO_TIMESTAMP(day || '/' || month || '/' || year || ' ' || hour, 'DD/MM/YYYY HH24:MI:SS') AT TIME ZONE 'UTC', 'YYYY-MM-DD HH24:MI:SS'), 'YYYY-MM-DD HH24:MI:SS') AS hourutc,
                            TO_TIMESTAMP(TO_CHAR(TO_TIMESTAMP(day || '/' || month || '/' || year || ' ' || hour, 'DD/MM/YYYY HH24:MI:SS') AT TIME ZONE 'UTC', 'YYYY-MM-DD HH24:MI:SS'), 'YYYY-MM-DD HH24:MI:SS') AS hourutcsys,
                            stationcode || magnitude || year || month ||day || hour as entityid
                        FROM (
                            SELECT t.*
                            FROM (SELECT DISTINCT
                                data->>'cod_estacion' AS stationcode,
                                data->>'anno' AS year,
                                data->>'mes' AS month,
                                data->>'dia' AS day,
                                data->>'hora' AS hour,
                                data->>'magnitud' AS magnitude,
                                data->>'valor_cams' AS valuecams
                            FROM jsonb_array_elements('{str(inserted_data).replace("'",'"')}'::jsonb) AS data) t
                        ) DE
                        LEFT JOIN {config.STATION_TABLE} E
                        ON LPAD(DE.stationcode, 8,'0') = E.europeancode
                    ) D
                    WHERE NOT EXISTS (
                        SELECT 1
                        FROM (
                            SELECT DISTINCT stationcode, year, month, day, hour, magnitude, valuecams
                            FROM (
                                SELECT DISTINCT
                                    data->>'cod_estacion' AS stationcode,
                                    data->>'anno' AS year,
                                    data->>'mes' AS month,
                                    data->>'dia' AS day,
                                    data->>'hora' AS hour,
                                    data->>'magnitud' AS magnitude,
                                    data->>'valor_cams' AS valuecams
                                FROM jsonb_array_elements('{str(inserted_data).replace("'",'"')}'::jsonb) AS data
                                )t
                        ) D, {config.AVERAGES_EST_MAGN_TABLE} PDB
                        WHERE PDB.stationcode = LPAD(D.stationcode, 8,'0') 
                        AND PDB.year = D.year::float8 
                        AND PDB.month = D.month::float8 
                        AND PDB.day = D.day::float8 
                        AND PDB.hour = D.hour::float8 
                        AND PDB.magnitude = D.magnitude
                    );
                    """)´´´


I want to retrieve the entire json and then map it to the other table
<p>I am testing an etl in python, when I retrieve a json array from one table to then map it to another, originally it is more than 60,000 characters but it returns one of more than 5,000, so when I insert it mapped into the other table it skips records. I think the library (psycopg2) or the database server (postgre) cannot process such large strings. What solution can there be?
The query is this:</p>
<pre><code>SELECT json as new FROM {config.JDATA_TABLE}
ORDER by timeinstant DESC
</code></pre>
<p>Para insertarlos es esta:</p>
<pre><code>INSERT INTO {config.AVERAGES_EST_MAGN_TABLE} (
stationcode, year, month, day, hour, timeinstant, magnitude, valuecams, stationid, magnitudeid, hourutc, hourutcsys, entityid)

</code></pre>
<pre><code>SELECT DISTINCT LPAD(D.stationcode, 8,'0'), D.year::float8, D.month::float8, D.day::float8, D.hour::float8, D.timeinstant, D.magnitude, D.valuecams::float8, D.shortcodestation, D.magnitudeid, D.hourutc, D.hourutcsys, D.entityid
FROM (
SELECT DISTINCT stationcode,year, month, day, hour,
to_timestamp(day || '/' || month || '/' || year || ' ' || hour, 'DD/MM/YYYY HH24:MI') timeinstant,
magnitude, coalesce(valuecams::float8, 0) valuecams,
E.shortcodestation, CASE magnitude
WHEN 'NO2' THEN '008'
WHEN 'PM2.5' THEN '009'
WHEN 'PM10' THEN '010'
WHEN 'O3' THEN '014'
ELSE NULL
END AS magnitudeid
,
TO_TIMESTAMP(TO_CHAR(TO_TIMESTAMP(day || '/' || month || '/' || year || ' ' || hour, 'DD/MM/YYYY HH24:MI:SS') AT TIME ZONE 'UTC', 'YYYY-MM-DD HH24:MI:SS'), 'YYYY-MM-DD HH24:MI:SS') AS hourutc,
TO_TIMESTAMP(TO_CHAR(TO_TIMESTAMP(day || '/' || month || '/' || year || ' ' || hour, 'DD/MM/YYYY HH24:MI:SS') AT TIME ZONE 'UTC', 'YYYY-MM-DD HH24:MI:SS'), 'YYYY-MM-DD HH24:MI:SS') AS hourutcsys,
stationcode || magnitude || year || month ||day || hour as entityid
FROM (
SELECT t.*
FROM (SELECT DISTINCT
data->>'cod_estacion' AS stationcode,
data->>'anno' AS year,
data->>'mes' AS month,
data->>'dia' AS day,
data->>'hora' AS hour,
data->>'magnitud' AS magnitude,
data->>'valor_cams' AS valuecams
FROM jsonb_array_elements('{str(inserted_data).replace("'",'"')}'::jsonb) AS data) t
) DE
LEFT JOIN {config.STATION_TABLE} E
ON LPAD(DE.stationcode, 8,'0') = E.europeancode
) D
WHERE NOT EXISTS (
SELECT 1
FROM (
SELECT DISTINCT stationcode, year, month, day, hour, magnitude, valuecams
FROM (
SELECT DISTINCT
data->>'cod_estacion' AS stationcode,
data->>'anno' AS year,
data->>'mes' AS month,
data->>'dia' AS day,
data->>'hora' AS hour,
data->>'magnitud' AS magnitude,
data->>'valor_cams' AS valuecams
FROM jsonb_array_elements('{str(inserted_data).replace("'",'"')}'::jsonb) AS data
)t
) D, {config.AVERAGES_EST_MAGN_TABLE} PDB
WHERE PDB.stationcode = LPAD(D.stationcode, 8,'0')
AND PDB.year = D.year::float8
AND PDB.month = D.month::float8
AND PDB.day = D.day::float8
AND PDB.hour = D.hour::float8
AND PDB.magnitude = D.magnitude
);
</code></pre>
<p>The structure of the tables to consult is that of the image</p>
<p>Python code:</p>
<pre><code> new_json_data = read_DB(f"""
SELECT json as new FROM {config.JDATA_TABLE}
ORDER by timeinstant DESC
""") # Este valor debe ser reemplazado por los datos reales

for new in new_json_data:
inserted_data = new[0]
# Procesar los datos y realizar inserciones o actualizaciones en PDE_MEdayS_EST_MAGN_CAMS
update_query = f"""
UPDATE {config.AVERAGES_EST_MAGN_TABLE} PDB
SET valuecams = (
SELECT DISTINCT coalesce(D.valuecams::float8, 0)
FROM (
SELECT DISTINCT stationcode, year, month, day, hour, magnitude, valuecams
FROM (
SELECT DISTINCT
data->>'cod_estacion' AS stationcode,
data->>'anno' AS year,
data->>'mes' AS month,
data->>'dia' AS day,
data->>'hora' AS hour,
data->>'magnitud' AS magnitude,
data->>'valor_cams' AS valuecams
FROM jsonb_array_elements('{str(inserted_data).replace("'",'"')}'::jsonb) AS data
)t
) D
WHERE PDB.stationcode = LPAD(D.stationcode, 8,'0')
AND PDB.year = D.year::float8
AND PDB.month = D.month::float8
AND PDB.day = D.day::float8
AND PDB.hour = D.hour::float8
AND PDB.magnitude = D.magnitude
)
WHERE EXISTS (
SELECT 1
FROM (
SELECT DISTINCT stationcode, year, month, day, hour, magnitude, valuecams
FROM (
SELECT DISTINCT
data->>'cod_estacion' AS stationcode,
data->>'anno' AS year,
data->>'mes' AS month,
data->>'dia' AS day,
data->>'hora' AS hour,
data->>'magnitud' AS magnitude,
data->>'valor_cams' AS valuecams
FROM jsonb_array_elements('{str(inserted_data).replace("'",'"')}'::jsonb) AS data
)t
) D
WHERE PDB.stationcode = LPAD(D.stationcode, 8,'0')
AND PDB.year = D.year::float8
AND PDB.month = D.month::float8
AND PDB.day = D.day::float8
AND PDB.hour = D.hour::float8
AND PDB.magnitude = D.magnitude
)
"""
read_DB(update_query)
read_DB(f"""INSERT INTO {config.AVERAGES_EST_MAGN_TABLE} (
stationcode, year, month, day, hour, timeinstant, magnitude, valuecams, stationid, magnitudeid, hourutc, hourutcsys, entityid)
SELECT DISTINCT LPAD(D.stationcode, 8,'0'), D.year::float8, D.month::float8, D.day::float8, D.hour::float8, D.timeinstant, D.magnitude, D.valuecams::float8, D.shortcodestation, D.magnitudeid, D.hourutc, D.hourutcsys, D.entityid
FROM (
SELECT DISTINCT stationcode,year, month, day, hour,
to_timestamp(day || '/' || month || '/' || year || ' ' || hour, 'DD/MM/YYYY HH24:MI') timeinstant,
magnitude, coalesce(valuecams::float8, 0) valuecams,
E.shortcodestation, CASE magnitude
WHEN 'NO2' THEN '008'
WHEN 'PM2.5' THEN '009'
WHEN 'PM10' THEN '010'
WHEN 'O3' THEN '014'
ELSE NULL
END AS magnitudeid
,
TO_TIMESTAMP(TO_CHAR(TO_TIMESTAMP(day || '/' || month || '/' || year || ' ' || hour, 'DD/MM/YYYY HH24:MI:SS') AT TIME ZONE 'UTC', 'YYYY-MM-DD HH24:MI:SS'), 'YYYY-MM-DD HH24:MI:SS') AS hourutc,
TO_TIMESTAMP(TO_CHAR(TO_TIMESTAMP(day || '/' || month || '/' || year || ' ' || hour, 'DD/MM/YYYY HH24:MI:SS') AT TIME ZONE 'UTC', 'YYYY-MM-DD HH24:MI:SS'), 'YYYY-MM-DD HH24:MI:SS') AS hourutcsys,
stationcode || magnitude || year || month ||day || hour as entityid
FROM (
SELECT t.*
FROM (SELECT DISTINCT
data->>'cod_estacion' AS stationcode,
data->>'anno' AS year,
data->>'mes' AS month,
data->>'dia' AS day,
data->>'hora' AS hour,
data->>'magnitud' AS magnitude,
data->>'valor_cams' AS valuecams
FROM jsonb_array_elements('{str(inserted_data).replace("'",'"')}'::jsonb) AS data) t
) DE
LEFT JOIN {config.STATION_TABLE} E
ON LPAD(DE.stationcode, 8,'0') = E.europeancode
) D
WHERE NOT EXISTS (
SELECT 1
FROM (
SELECT DISTINCT stationcode, year, month, day, hour, magnitude, valuecams
FROM (
SELECT DISTINCT
data->>'cod_estacion' AS stationcode,
data->>'anno' AS year,
data->>'mes' AS month,
data->>'dia' AS day,
data->>'hora' AS hour,
data->>'magnitud' AS magnitude,
data->>'valor_cams' AS valuecams
FROM jsonb_array_elements('{str(inserted_data).replace("'",'"')}'::jsonb) AS data
)t
) D, {config.AVERAGES_EST_MAGN_TABLE} PDB
WHERE PDB.stationcode = LPAD(D.stationcode, 8,'0')
AND PDB.year = D.year::float8
AND PDB.month = D.month::float8
AND PDB.day = D.day::float8
AND PDB.hour = D.hour::float8
AND PDB.magnitude = D.magnitude
);
""")´´´


I want to retrieve the entire json and then map it to the other table
</code></pre>
 

Latest posts

Top