
I'm using the code below in PySpark on Azure Databricks to read data from an API whose payload is in JSON format. All the fields are defined as strings, but I keep running into the error "json_tuple requires that all arguments are strings".

Schema:

root
 |-- Payload: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- ActiveDate: string (nullable = true)
 |    |    |-- BusinessId: string (nullable = true)
 |    |    |-- BusinessName: string (nullable = true)

JSON:

 {
    "Payload": 
    [
        {
            "ActiveDate": "2008-11-25",
            "BusinessId": "5678",
            "BusinessName": "ACL"
        },
        {
            "ActiveDate": "2009-03-22",
            "BusinessId": "6789",
            "BusinessName": "BCL"
        }
    ]
}

PySpark:

from pyspark.sql import functions as F
df = df.select(
    F.col('Payload'),
    F.json_tuple(F.col('Payload'), 'ActiveDate', 'BusinessId', 'BusinessName')
     .alias('ActiveDate', 'BusinessId', 'BusinessName')
)
df.write.format("delta").mode("overwrite").saveAsTable("delta_payload")

Error:

AnalysisException: cannot resolve 'json_tuple(`Payload`, 'ActiveDate', 'BusinessId', 'BusinessName')' due to data type mismatch: json_tuple requires that all arguments are strings;

1 Answer

From your schema it looks like the JSON is already parsed, so Payload is of ArrayType rather than StringType containing JSON, hence the error.
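For contrast, here is a minimal sketch (using a made-up one-row DataFrame, not your actual data) showing that json_tuple resolves fine when the column really is a raw JSON string:

from pyspark.sql import functions as F

# Illustration only: Payload here is a raw JSON string (StringType),
# which is what json_tuple requires.
raw = spark.createDataFrame(
    [('{"ActiveDate": "2008-11-25", "BusinessId": "5678", "BusinessName": "ACL"}',)],
    ['Payload'],
)
raw.select(
    F.json_tuple('Payload', 'ActiveDate', 'BusinessId', 'BusinessName')
     .alias('ActiveDate', 'BusinessId', 'BusinessName')
).show()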

You probably need explode instead of json_tuple:

>>> from pyspark.sql.functions import explode
>>> df = spark.createDataFrame([{
...     "Payload":
...     [
...         {
...             "ActiveDate": "2008-11-25",
...             "BusinessId": "5678",
...             "BusinessName": "ACL"
...         },
...         {
...             "ActiveDate": "2009-03-22",
...             "BusinessId": "6789",
...             "BusinessName": "BCL"
...         }
...     ]
... }])
>>> df.schema
StructType(List(StructField(Payload,ArrayType(MapType(StringType,StringType,true),true),true)))
>>> df.select(explode("Payload").alias("x")).select("x.ActiveDate", "x.BusinessName", "x.BusinessId").show()
+----------+------------+----------+
|ActiveDate|BusinessName|BusinessId|
+----------+------------+----------+
|2008-11-25|         ACL|      5678|
|2009-03-22|         BCL|      6789|
+----------+------------+----------+

5 Comments

Hi @Czaporka, I ran into the error NameError: name 'StructType' is not defined while trying to run this line of code: StructType(List(StructField(Report_Entry,ArrayType(MapType(StringType,StringType,true),true),true)))
Hi @paone, that line is just the output I got after typing df.schema in an interactive interpreter session. I included it just to show the schema of my DataFrame. The lines that you should execute are prefixed with >>> . Most importantly, you probably need the first one (import explode) and the last one with the select.
Hi @Czaporka, thank you, that works. But the issue I now encounter is that df.show(truncate=False) displays the data in tabular format (ActiveDate, BusinessName, BusinessId rows as expected), yet after df.write.format("delta").mode("overwrite").saveAsTable("delta_tbl"), spark.sql("SELECT * FROM delta_tbl LIMIT 1") still returns the results as an array.
@paone did you assign the result of the select with the explode back to df? In my code sample, I did df.select(...).show() again just to show what that select is going to return; but in your actual code you'd need to assign its result to df before writing it, i.e. df = df.select(...); df.write.format(...)... (like in your original code), or just do df.select(...).write.format(...)....
Hi @Czaporka, Yes I did, works now. Thank you.
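For reference, a minimal sketch of the corrected pipeline as worked out in the comments above, assuming df already holds the parsed Payload column and reusing the delta_payload table name from the question:

from pyspark.sql.functions import explode

# Assign the flattened result back to df before writing, as discussed above
df = df.select(explode("Payload").alias("x")) \
       .select("x.ActiveDate", "x.BusinessId", "x.BusinessName")
df.write.format("delta").mode("overwrite").saveAsTable("delta_payload")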
