I'm reading a file in which a column is a struct when it has a value, but a string when there is no data. In the example below, assigned_to and group are structs and contain data.
root
 |-- number: string (nullable = true)
 |-- assigned_to: struct (nullable = true)
 |    |-- display_value: string (nullable = true)
 |    |-- link: string (nullable = true)
 |-- group: struct (nullable = true)
 |    |-- display_value: string (nullable = true)
 |    |-- link: string (nullable = true)
To flatten the JSON I'm doing the following:
from pyspark.sql.functions import lit
df23 = spark.read.parquet("dbfs:***/test1.parquet")
val_cols4 = []
# Idea: when a column's data type is struct, extract its display_value dynamically; otherwise create a new column that defaults to None.
for name, cols in df23.dtypes:
    if 'struct' in cols:
        val_cols4.append(name+".display_value")
    else:
        df23 = df23.withColumn(name+"_value", lit(None))
Now, if I use val_cols4 to select from the dataframe df23, all the columns extracted from structs end up with the same name, "display_value":
root
|-- display_value: string (nullable = true)
|-- display_value: string (nullable = true)
How do I rename the columns to appropriate values? I tried the following:
for name, cols in df23.dtypes:
    if 'struct' in cols:
        val_cols4.append("col('"+name+".display_value').alias('"+name+"_value')")
    else:
        df23 = df23.withColumn(name+"_value", lit(None))
This doesn't work: it raises an error when I do a select on the dataframe with val_cols4.
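A likely cause is that the second attempt appends plain strings, so select receives the text "col('assigned_to.display_value').alias(...)" and tries to resolve it as a column name instead of evaluating it as Python. Below is a minimal sketch of one way around that, building Column objects with col(...).alias(...) instead of strings. The helper names plan_flatten and flatten are hypothetical, as are the field and suffix parameters. Checking with startswith("struct") rather than 'struct' in cols is my own assumption, made so that types such as array<struct<...>> are not treated as structs. The cast("string") on the null literal is likewise an assumption, added to give the placeholder column a concrete type.

```python
def plan_flatten(dtypes, field="display_value", suffix="_value"):
    """Map each (name, dtype) pair from df.dtypes to (source_path_or_None, alias)."""
    plan = []
    for name, dtype in dtypes:
        if dtype.startswith("struct"):
            # Struct column: read the nested field and rename it after the parent.
            plan.append((f"{name}.{field}", name + suffix))
        else:
            # Non-struct column: no nested field to read, so mark it for a null placeholder.
            plan.append((None, name + suffix))
    return plan


def flatten(df):
    """Select one aliased column per input column, following plan_flatten."""
    # Imported here so plan_flatten can be tried without a Spark installation.
    from pyspark.sql.functions import col, lit

    exprs = [
        col(src).alias(alias) if src is not None
        else lit(None).cast("string").alias(alias)
        for src, alias in plan_flatten(df.dtypes)
    ]
    return df.select(*exprs)


# These dtypes mirror the schema shown in the question.
example_dtypes = [
    ("number", "string"),
    ("assigned_to", "struct<display_value:string,link:string>"),
    ("group", "struct<display_value:string,link:string>"),
]
print(plan_flatten(example_dtypes))
```

With the dataframe from the question, flatten(df23) should then return columns named number_value, assigned_to_value and group_value. As in the original loop, every non-struct column (including number) gets a null "_value" column; if the original columns should be kept as well, they can be added to the select list.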