I have a pyspark dataframe

I want to check the address column of each row: if it contains the substring "india", I need to add another column with the value true, otherwise false.

I also want to check whether a substring is present in the column value and print yes if it is, otherwise no. This has to run for every row in the DataFrame, like:

if "india" or "karnataka" is in sparkDF["address"]:
 print("yes")
else:
 print("no")

I'm getting the wrong results as it's checking for each character instead of the substring. How to achieve this?

I wasn't able to achieve this

1 Answer

You can use contains or like for this. Both return a boolean Column that Spark evaluates for every row, so no explicit loop is needed.

Data Preparation

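from io import StringIO

import pandas as pd
from pyspark.sql import SparkSession, functions as F

# Reuses the active session if there is one, otherwise starts a new one
spark = SparkSession.builder.getOrCreate()

s = StringIO("""
user,address
rishi,XYZ Bangalore Karnataka
kirthi,ABC Pune India
tushar,ASD Orissa India
"""
)

df = pd.read_csv(s,delimiter=',')

sparkDF = spark.createDataFrame(df)

sparkDF.show(truncate=False)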

+------+-----------------------+
|user  |address                |
+------+-----------------------+
|rishi |XYZ Bangalore Karnataka|
|kirthi|ABC Pune India         |
|tushar|ASD Orissa India       |
+------+-----------------------+

Contains

sparkDF = sparkDF.withColumn('result',F.lower(F.col('address')).contains("india"))

sparkDF.show(truncate=False)

+------+-----------------------+------+
|user  |address                |result|
+------+-----------------------+------+
|rishi |XYZ Bangalore Karnataka|false |
|kirthi|ABC Pune India         |true  |
|tushar|ASD Orissa India       |true  |
+------+-----------------------+------+

Like - Multiple Search Patterns

sparkDF = sparkDF.withColumn('result',F.lower(F.col('address')).like("%india%") 
                             | F.lower(F.col('address')).like("%karnataka%") 
                            )

sparkDF.show(truncate=False)

+------+-----------------------+------+
|user  |address                |result|
+------+-----------------------+------+
|rishi |XYZ Bangalore Karnataka|true  |
|kirthi|ABC Pune India         |true  |
|tushar|ASD Orissa India       |true  |
+------+-----------------------+------+
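On why the if statement from the question always prints yes: in plain Python, "india" or "karnataka" in x is parsed as "india" or ("karnataka" in x). A non-empty string is truthy, so the in check is never even evaluated. The sketch below shows this with an ordinary string; the sample address and the helper contains_any are made up for illustration.

```python
# Hypothetical address that contains neither keyword.
address = "XYZ Bangalore Tamil Nadu"

# Parsed as: "india" or ("karnataka" in address).
# "india" is a non-empty string, hence truthy, so `or` returns it
# immediately and the `in` check on the right is skipped.
naive = "india" or "karnataka" in address

keywords = ["india", "karnataka"]


def contains_any(text, words):
    """Return True if any of the words occurs in text, ignoring case."""
    lowered = text.lower()
    return any(word in lowered for word in words)


# Each keyword is tested against the string separately.
fixed_miss = contains_any(address, keywords)
fixed_hit = contains_any("ABC Pune India", keywords)
```

This only works on Python strings. On a Spark DataFrame, keep the test as a column expression, such as the two like conditions combined with | above.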
11 Comments

Vaebhav, can you help me with this too: how do I check whether the address column contains "india" or "karnataka", and perform certain operations when it does, for each row in the sparkDF DataFrame?
Updated the answer to demonstrate multiple patterns matches as well
I mean I wanted something like this: if "india" or "karnataka" in sparkDF['address']: print("yes") else: print("no"). It needs to run for all rows. How can I achieve this?
When I tried it, it gave wrong results.