
How do I split a string into an array of strings of equal length in pure SQL on Databricks?


I have a column in a table with strings of variable length:

|value        |
|-------------|
|abcdefgh     |
|1234567891011|

I need to split each string into an array of strings, where each element has length 2 (except the last one when the character count is odd), like so:

|value        |split_value                |
|-------------|---------------------------|
|abcdefgh     |[ab, cd, ef, gh, ]         |
|1234567891011|[12, 34, 56, 78, 91, 01, 1]|
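The transformation I have in mind (insert a delimiter after every pair of characters, then split on that delimiter) can be sketched in plain Python; `split_pairs` here is just an illustrative helper name, but it mirrors the SQL expression used below, including the trailing empty element that appears when the length is even:

```python
import re

def split_pairs(s: str) -> list[str]:
    """Insert a comma after every 2 characters, then split on commas.

    Mirrors the SQL expression
        split(regexp_replace(value, '(.{2})', '$1,'), ',')
    Note: for even-length input the delimiter is also appended after
    the final pair, so the result ends with an empty string.
    """
    return re.sub(r'(.{2})', r'\1,', s).split(',')

print(split_pairs("abcdefgh"))       # ['ab', 'cd', 'ef', 'gh', '']
print(split_pairs("1234567891011"))  # ['12', '34', '56', '78', '91', '01', '1']
```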

This works in PySpark:

# Sample data
data = [("abcdefgh",), ("1234567891011",)]
df = spark.createDataFrame(data, ["value"])
# Register the DataFrame as a temporary view
df.createOrReplaceTempView("strings")
# Use Spark SQL to add a delimiter every 2 characters and then split the string
result = spark.sql("""
SELECT 
    value,
    split(regexp_replace(value, '(.{2})', '$1,'), ',') AS split_value
FROM strings
""")
# Show the result
result.show(truncate=False)

… giving the resulting table above as expected.

However, when I execute the exact same SQL statement in a SQL cell in a Databricks notebook, I get an array of empty strings:

%sql
SELECT 
    value,
    split(regexp_replace(value, '(.{2})', '$1,'), ',') AS split_value
FROM strings

|value        |split_value                 |
|-------------|----------------------------|
|abcdefgh     |["", "", "", "", ]          |
|1234567891011|["", "", "", "", "", "", ""]|

It also gives me a warning.

How can I achieve the desired result in SQL on Databricks?


