PySpark Row object does not get updated

  • Thread starter: akalanka (Guest)

Code:
@udf(allPageContentSchemaOutput)
def modifyContent(input_data: List[Row]) -> List[Row]:
    modified_data = []
    for row in input_data:
        item_dict = row.asDict()  # Convert Row to dictionary
        components_list = item_dict.get('components', [])
        for component in components_list:
            component = component.asDict()
            if isinstance(component, dict) and 'locale' in component:
                component['locale'] = {
                    'es-us': 'Test',  # Replace with your logic for es-us
                    'en-us': 'Test2'   # Replace with your logic for en-us
                }
        modified_data.append(Row(**item_dict))
    return modified_data

What I am trying to do is update the locale attribute. My input data looks like this:

Code:
[{"uid":"blt0bbd250f8c97ce90","components":[{"locale":"es-us"}]},{"uid":"blt0bbd250f8c97ce91","components":[{"locale":"es-us"}]}]

This is my output schema format

Code:
from pyspark.sql.types import ArrayType, MapType, StringType, StructField, StructType

output_schema = StructType([
    StructField("uid", StringType(), True),
    StructField("components", ArrayType(StructType([
        StructField("locale", MapType(StringType(), StringType()), True)
    ])), True)
])

# Define the schema for the output data
allPageContentSchemaOutput = ArrayType(output_schema)

In the output, "locale" is null. It seems input_data does not get modified. Can you help me restructure the modifyContent method so that it modifies input_data and the change is reflected in the output defined by allPageContentSchemaOutput?
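
One likely cause: `component = component.asDict()` only rebinds the loop variable to a new dict. The list stored in `item_dict['components']` still holds the original Row objects, whose `locale` is the string `"es-us"`, and a string does not fit the `MapType` declared in the schema, which is probably why it comes back as null. A minimal sketch of a fix is to collect the modified components into a new list and assign that list back. The helper names `_as_dict` and `build_locale_map` are hypothetical, and the sketch is written as plain Python over dicts (or anything with `asDict()`) so it can be tried without a Spark session:

```python
from typing import Any, Dict, List


def _as_dict(obj: Any) -> Dict[str, Any]:
    """Hypothetical helper: accept a pyspark Row (has asDict) or a plain dict."""
    return obj.asDict() if hasattr(obj, "asDict") else dict(obj)


def build_locale_map(locale: Any) -> Dict[str, str]:
    """Hypothetical helper: placeholder values taken from the question."""
    return {"es-us": "Test", "en-us": "Test2"}


def modify_content(input_data: List[Any]) -> List[Dict[str, Any]]:
    modified_data = []
    for row in input_data or []:
        item = _as_dict(row)  # shallow copy of the outer record
        new_components = []
        for component in item.get("components") or []:
            comp = _as_dict(component)  # new dict, not linked to the Row
            if "locale" in comp:
                comp["locale"] = build_locale_map(comp["locale"])
            new_components.append(comp)
        # The step missing from the original code: write the changes back.
        item["components"] = new_components
        modified_data.append(item)
    return modified_data


sample = [
    {"uid": "blt0bbd250f8c97ce90", "components": [{"locale": "es-us"}]},
    {"uid": "blt0bbd250f8c97ce91", "components": [{"locale": "es-us"}]},
]
result = modify_content(sample)
print(result[0])
```

To use it in Spark, the function could be wrapped as `udf(modify_content, allPageContentSchemaOutput)`. This assumes your PySpark version accepts dicts for struct-typed return values; if it does not, converting each dict with `Row(**item)` before appending should behave the same way.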