I am trying the following code, which should replace an empty list with an array of the unique values of a column ("apples_set") wherever "apples_logic_string" equals "all".
The column "apples_set" is of type Array[String].
The DataFrame looks like this:
apples_patterns.show()
+---------------------+-------------+
| apples_logic_string | apples_set  |
+---------------------+-------------+
| "234"               | ["43","54"] |
| "65"                | ["95"]      |
| "all"               | []          |
| "76"                | ["84","67"] |
+---------------------+-------------+
The code:
unique_all_apples = set(apples_patterns.agg(F.flatten(F.collect_set('apples_set'))).head()[0]) # noqa
error_patterns = apples_patterns.withColumn('apples_set', F.when(F.col('apples_logic_string') == 'all',
                                            unique_all_apples).otherwise(F.col('apples_set')))
The Error:
Traceback (most recent call last):
File "/myproject/datasets/apples_matching.py", line 24, in compute
apples_patterns = apples_patterns.withColumn('apples_set', F.when(F.col('apples_logic_string') == 'all',
File "/scratch/asset-install/1c9821b4f6adc95ac4d5f15ff109001b/miniconda38/lib/python3.8/site-packages/pyspark/sql/functions.py", line 1518, in when
jc = sc._jvm.functions.when(condition._jc, v)
File "/scratch/asset-install/1c9821b4f6adc95ac4d5f15ff109001b/miniconda38/lib/python3.8/site-packages/py4j/java_gateway.py", line 1321, in __call__
return_value = get_return_value(
File "/scratch/asset-install/1c9821b4f6adc95ac4d5f15ff109001b/miniconda38/lib/python3.8/site-packages/pyspark/sql/utils.py", line 111, in deco
return f(*a, **kw)
File "/scratch/asset-install/1c9821b4f6adc95ac4d5f15ff109001b/miniconda38/lib/python3.8/site-packages/py4j/protocol.py", line 326, in get_return_value
raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.sql.functions.when.
: java.lang.RuntimeException: Unsupported literal type class java.util.ArrayList [43,54,95,84,67]