I would like to transform the tags column in the match_event DataFrame below

+-------+------------+------------------+--------+-------+-----------+--------+--------------------+----------+--------------------+--------------------+------+
|eventId|   eventName|          eventSec|      id|matchId|matchPeriod|playerId|           positions|subEventId|        subEventName|                tags|teamId|
+-------+------------+------------------+--------+-------+-----------+--------+--------------------+----------+--------------------+--------------------+------+
|      8|        Pass| 1.255989999999997|88178642|1694390|         1H|   26010|[[50, 48], [47, 50]]|        85|         Simple pass|            [[1801]]|  4418|
|      8|        Pass|2.3519079999999803|88178643|1694390|         1H|    3682|[[47, 50], [41, 48]]|        85|         Simple pass|            [[1801]]|  4418|
|      8|        Pass|3.2410280000000284|88178644|1694390|         1H|   31528|[[41, 48], [32, 35]]|        85|         Simple pass|            [[1801]]|  4418|
|      8|        Pass| 6.033681000000001|88178645|1694390|         1H|    7855| [[32, 35], [89, 6]]|        83|           High pass|            [[1802]]|  4418|
|      1|        Duel|13.143591000000015|88178646|1694390|         1H|   25437|  [[89, 6], [85, 0]]|        12|Ground defending ...|     [[702], [1801]]|  4418|
|      1|        Duel|14.138041000000044|88178663|1694390|         1H|   83575|[[11, 94], [15, 1...|        11|Ground attacking ...|     [[702], [1801]]| 11944|
|      3|   Free Kick|27.053005999999982|88178648|1694390|         1H|    7915| [[85, 0], [93, 16]]|        36|            Throw in|            [[1802]]|  4418|
|      8|        Pass| 28.97515999999996|88178667|1694390|         1H|   70090|  [[7, 84], [9, 71]]|        82|           Head pass|    [[1401], [1802]]| 11944|
|     10|        Shot| 31.22621700000002|88178649|1694390|         1H|   25437|  [[91, 29], [0, 0]]|       100|                Shot|[[402], [1401], [...|  4418|
|      9|Save attempt| 32.66416000000004|88178674|1694390|         1H|   83574|[[100, 100], [15,...|        91|        Save attempt|    [[1203], [1801]]| 11944|
+-------+------------+------------------+--------+-------+-----------+--------+--------------------+----------+--------------------+--------------------+------+

into something like the following, that is, extract the last item of each list into a column:

+----+
|tags|
+----+
|1801|
|1801|
|1801|
|1802|
|1801|
|1801|
+----+

The new column would then be re-attached to the match_event DataFrame, perhaps using withColumn.

I tried the code below:


u = match_event[['tags']].rdd
t=u.map(lambda xs: [n for x in xs[-1:] for n in x[-1:]])
tag = spark.createDataFrame(t, ['tag'])

I got the result below. Each value is still a one-element list rather than a scalar, and it was difficult to take this further with withColumn.

+------+
|   tag|
+------+
|[1801]|
|[1801]|
|[1801]|
|[1802]|
|[1801]|
|[1801]|
|[1802]|
|[1802]|
|[1801]|
|[1801]|
|[1801]|
|[1801]|
|[1302]|
|[1802]|
|[1801]|
|[1802]|
|[1801]|
|[1801]|
|[1801]|
|[1801]|
+------+

Please help. Thanks in advance

2 Answers

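For Spark 2.4+, use element_at. A negative index counts from the end of the array, so -1 selects the last inner array, and [0] then takes its first element.

from pyspark.sql import functions as F

df.withColumn("lastItem", F.element_at("tags", -1)[0]).show()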

#+---------------+--------+
#|           tags|lastItem|
#+---------------+--------+
#|[[1], [2], [3]]|       3|
#|[[1], [2], [3]]|       3|
#+---------------+--------+
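For intuition, the expression above behaves like plain Python indexing on each row's nested list. The sketch below is only an analogy, not Spark code: the helper names first_of_last and question_attempt are hypothetical, and None stands in for a missing value. It also shows why the comprehension from the question leaves each value wrapped in a list.

```python
def first_of_last(tags):
    """Plain-Python analogue of element_at(tags, -1)[0] for one row."""
    if not tags or not tags[-1]:
        return None  # stand-in for a missing value (assumption)
    return tags[-1][0]


def question_attempt(row):
    """The comprehension from the question, applied to one single-column row."""
    # row[-1:] is a one-element slice holding the tags list; x[-1:] is again a
    # slice, so each n is still the inner list, not the integer inside it.
    return [n for x in row[-1:] for n in x[-1:]]


row = ([[702], [1801]],)          # one row with a single "tags" column
print(first_of_last(row[0]))      # 1801
print(question_attempt(row))      # [[1801]]
```

Slicing with [-1:] keeps the list wrapper, while indexing with [-1] or [0] unwraps it, which is the difference between the question's output and the desired scalar.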

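Try this, using a UDF that returns the last tag as a scalar (it assumes an active SparkSession named spark):

from pyspark.sql.functions import udf
from pyspark.sql.types import LongType

columns = ['eventId', 'eventName', 'eventSec', 'id', 'matchId', 'matchPeriod', 'playerId', 'positions', 'subEventId', 'subEventName', 'tags', 'teamId']
vals = [(8, "Pass", 1.255989999999997, 88178642, 1694390, "1H", 26010, [[50, 48], [47, 50]], 85, "Simple pass", [[1801]], 4418),
        (1, "Duel", 13.143591000000015, 88178646, 1694390, "1H", 25437, [[89, 6], [85, 0]], 12, "Ground defending", [[702], [1801]], 4418)]

# Return the last element of the last inner list as a scalar (None for empty input).
# Declaring LongType keeps the result numeric; without a return type the UDF yields a string.
last_tag = udf(lambda xs: xs[-1][-1] if xs and xs[-1] else None, LongType())

df = spark.createDataFrame(vals, columns)
# show() returns None, so keep the DataFrame and display it separately.
df2 = df.withColumn('created_col', last_tag('tags'))
df2.show()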
