I have the below pyspark dataframe
a = ['480s','480s','499s','499s','650s','650s','702s','702s','736s','736s','736s','737s','737s']
b = ['North','West','East','North','East','North','North','West','North','South','West','North','West']
df = pd.DataFrame(dict(dcode=a, zone=b))
dcode zone
0 480s North
1 480s West
2 499s East
3 499s North
4 650s East
5 650s North
6 702s North
7 702s West
8 736s North
9 736s South
10 736s West
11 737s North
12 737s West
I want my data frame looks like -
dcode zone output
0 480s North NW
1 480s West NW
2 499s East
3 499s North NW
4 650s East
5 650s North NW
6 702s North
7 702s West
8 736s North
9 736s South
10 736s West
11 737s North
12 737s West
For the same I am using this logic, but it's not giving the desired results.
df_ = df.withColumn("output", F.when((F.col("Zone") == "North") | (F.col("Zone") == "West") & (F.col("dcode") != "702s") | (F.col("dcode") != "736s") | (F.col("dcode") != "737s"), "NW"))
I want NW in output column only when zone is north or west and decode is not in 736,737s,702s.