I have a NumPy array returned by np.select and I want to store it as a new column in a PySpark DataFrame. How can I do that?
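import numpy as np
import pandas as pd
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
# Small pandas frame converted to a Spark DataFrame; both columns are declared as strings
pdf = pd.DataFrame({'a': [1, 2, 3], 'b': ['abc', 'cde', 'edf']})
df_data = spark.createDataFrame(pdf, schema='a string, b string')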
I have a few conditions and choices that I pass to np.select, like this:
np.select(conditions, choices, default='Other')
This returns the following ndarray:
[['val1'], ['val2'], ['val3']]
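For context, here is a minimal sketch of how an array of that shape can come out of np.select. The conditions and choices below are hypothetical stand-ins, not my real ones, and I am assuming the conditions are built from a 2-D slice of the pandas frame, which would explain the nested (n, 1) result.

```python
import numpy as np
import pandas as pd

# Hypothetical data and rules, for illustration only.
pdf = pd.DataFrame({'a': [1, 2, 3], 'b': ['abc', 'cde', 'edf']})

# Selecting with [['a']] keeps a 2-D (n, 1) array, so every condition is 2-D as well.
a = pdf[['a']].to_numpy()
conditions = [a == 1, a == 2, a == 3]
choices = ['val1', 'val2', 'val3']

# The result takes the broadcast shape of the conditions, i.e. (3, 1).
result = np.select(conditions, choices, default='Other')
print(result.tolist())  # [['val1'], ['val2'], ['val3']]
print(result.shape)     # (3, 1)

# Flattening gives one value per row, which is the form a single column would need.
flat = result.ravel()
print(flat.tolist())    # ['val1', 'val2', 'val3']
```

So the array has one row per row of the original data, with each value wrapped in its own single-element list.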
Now I want to save this ndarray as a new column in df_data.