Dynamically renaming dataframe columns using Pyspark

Question

I'm reading a file in which a column is a struct when it has a value and a string when there is no data. In the example below, assigned_to and group are structs and have data.

root
 |-- number: string (nullable = true)
 |-- assigned_to: struct (nullable = true)
 |    |-- display_value: string (nullable = true)
 |    |-- link: string (nullable = true)
 |-- group: struct (nullable = true)
 |    |-- display_value: string (nullable = true)
 |    |-- link: string (nullable = true)

To flatten the JSON, I'm doing the following:

from pyspark.sql.functions import lit

df23 = spark.read.parquet("dbfs:***/test1.parquet")
val_cols4 = []

# When a column is a struct, extract its display_value dynamically;
# otherwise create a new column that defaults to None.
for name, cols in df23.dtypes:
  if 'struct' in cols:
    val_cols4.append(name+".display_value")
  else:
    df23 = df23.withColumn(name+"_value", lit(None))

Now, if I use val_cols4 to select from df23, every column extracted from a struct gets the same name, "display_value":

root
 |-- display_value: string (nullable = true)
 |-- display_value: string (nullable = true)

How do I rename the columns to appropriate values? I tried the following,

for name, cols in df23.dtypes:
  if 'struct' in cols:
    val_cols4.append("col('"+name+".display_value').alias('"+name+"_value')") 
  else:
    df23 = df23.withColumn(name+"_value", lit(None))

This doesn't work and errors out when I do a select on the dataframe.

mck · Accepted Answer · 2021-04-26 19:52:13Z


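select() treats a plain string as a column name, so a string that contains Python code such as "col(...).alias(...)" is never evaluated. Append an aliased Column object to val_cols4 rather than a string, e.g.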

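from pyspark.sql.functions import col, lit

val_cols4 = []

for name, cols in df23.dtypes:
  if 'struct' in cols:
    # Column object: extract the nested field and rename it to <name>_value
    val_cols4.append(col(name+".display_value").alias(name+"_value"))
  else:
    # Non-struct column: add a <name>_value column that is always null
    df23 = df23.withColumn(name+"_value", lit(None))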

Then you can pass the list to select; each extracted column comes out under its alias (here assigned_to_value and group_value), e.g.

newdf = df23.select(val_cols4)
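
If you would rather keep building strings, as in the question's attempt, one possible alternative is to build SQL expression strings and hand them to selectExpr, which does parse strings as expressions. The sketch below is only an illustration under some assumptions: build_select_exprs is a hypothetical helper name, the placeholder columns are cast to string (the question does not say which type they should have), names are backtick-quoted in case one of them clashes with a SQL keyword, and the type check uses startswith("struct") instead of 'struct' in cols so that something like array<struct<...>> is not matched.

```python
def build_select_exprs(dtypes, field="display_value", suffix="_value"):
    """Build SQL expression strings from (name, dtype) pairs such as df.dtypes.

    Hypothetical helper, not part of PySpark. Struct columns yield the
    nested field under an alias; every other column yields a null
    placeholder cast to string (the string type is an assumption).
    """
    exprs = []
    for name, dtype in dtypes:
        if dtype.startswith("struct"):
            # Extract the nested field and rename it to <name><suffix>
            exprs.append(f"`{name}`.`{field}` AS `{name}{suffix}`")
        else:
            # No struct present: emit a typed null under the same alias pattern
            exprs.append(f"CAST(NULL AS STRING) AS `{name}{suffix}`")
    return exprs


# Sample (name, dtype) pairs shaped like the output of df23.dtypes
sample_dtypes = [
    ("assigned_to", "struct<display_value:string,link:string>"),
    ("group", "string"),
]
exprs = build_select_exprs(sample_dtypes)
```

With a live Spark session, the result would be used as newdf = df23.selectExpr(*build_select_exprs(df23.dtypes)). Because both branches add to the same list, the placeholder columns also appear in newdf, whereas columns added with withColumn stay on df23 unless they are selected as well.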

