How to construct a schema to use with a UDF if the function returns an array of dictionaries

Question

I am trying to create a schema for the type of data shown below (a list of dictionaries) so that I can use it with a UDF, but I am getting this error:

 Unexpected tuple %r with StructType

 [{'cumulativeDefaultbalance': 0, 'loanId': 13131, 'cumulativeEndingBalance': 4877.9918745262694, 'cumulativeContractpaymentw': 263.67479214039736, 'month': 1, 'cumulativeInterestpayment': 141.66666666666666, 'cumulativePrincipalpayment': 122.00812547373067, 'cumulativeAdjbeginingbal': 5000, 'cumulativePrepaymentamt': 40.315417142065087}]

Below is the schema object I am building:

from pyspark.sql.types import StructType, StructField, FloatType

schema = StructType([
    StructField('cumulativeAdjbeginingbal', FloatType(), False),
    StructField('cumulativeEndingBalance', FloatType(), False),
    StructField('cumulativeContractpaymentw', FloatType(), False),
    StructField('cumulativeInterestpayment', FloatType(), False),
    StructField('cumulativePrincipalpayment', FloatType(), False),
    StructField('cumulativePrepaymentamt', FloatType(), False),
    StructField('cumulativeDefaultbalance', FloatType(), False)
])

Can anyone tell me what's making my code fail?

Can you post an example line from your CSV plus the code you use to read it? – Laurens Koppenol, commented Sep 27, 2017 at 14:57
Here is a gist with everything you need: gist.github.com/smitthakkar96/26345d52f75ff4777e837606f7bec7d5 – Smit, commented Sep 27, 2017 at 15:20

ags29 · Accepted Answer · 2017-09-27 15:48:03Z


The issue, as far as I can see, is that the schema you are defining expects the RDD elements to be lists rather than dictionaries. So you can convert them before creating the DataFrame (assuming your base RDD of dicts is called df):

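# Take values in the schema's field order; dict.values() order (and extra keys like loanId) would not line up
df.map(lambda x: [x[name] for name in schema.names])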
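Why the field order matters can be checked in plain Python, without Spark. Since Python 3.7, `dict.values()` follows insertion order, which for the question's record starts with `cumulativeDefaultbalance` and also includes `loanId` and `month`, so it does not line up with the seven positional fields of the schema. A minimal sketch, assuming the question's field order; `to_schema_order` is a hypothetical helper name, and the `float()` cast is an added choice because schema verification for `FloatType` accepts only Python floats, not ints such as `0` and `5000`:

```python
# The record from the question, keys in the order they appear there.
sample = {
    "cumulativeDefaultbalance": 0,
    "loanId": 13131,
    "cumulativeEndingBalance": 4877.9918745262694,
    "cumulativeContractpaymentw": 263.67479214039736,
    "month": 1,
    "cumulativeInterestpayment": 141.66666666666666,
    "cumulativePrincipalpayment": 122.00812547373067,
    "cumulativeAdjbeginingbal": 5000,
    "cumulativePrepaymentamt": 40.315417142065087,
}

# Field order of the question's StructType; list elements are matched to
# StructFields by position, so this order is what counts.
SCHEMA_NAMES = [
    "cumulativeAdjbeginingbal",
    "cumulativeEndingBalance",
    "cumulativeContractpaymentw",
    "cumulativeInterestpayment",
    "cumulativePrincipalpayment",
    "cumulativePrepaymentamt",
    "cumulativeDefaultbalance",
]


def to_schema_order(record, names=SCHEMA_NAMES):
    """Return the record's values as floats, in the given field order.

    Keys outside the schema (loanId, month) are dropped; ints are cast to
    float so they fit a FloatType/DoubleType field.
    """
    return [float(record[name]) for name in names]


naive = list(sample.values())      # 9 values, dict insertion order
ordered = to_schema_order(sample)  # 7 floats, schema order
```

`naive` has nine elements starting with the default balance, while `ordered` has exactly one float per schema field, starting with `cumulativeAdjbeginingbal`.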

Alternatively, you could do the following and skip the explicit schema definition:

from pyspark.sql import Row
# Row(**x) turns each dict into a Row; toDF() then infers column names and types
df.map(lambda x: Row(**x)).toDF()

EDIT: Actually, it looks like the schema is for the return type of a UDF. I think the following should work:

from pyspark.sql.types import ArrayType, StructType, StructField, FloatType

schema = ArrayType(StructType([
        StructField('cumulativeAdjbeginingbal', FloatType(), False),
        StructField('cumulativeEndingBalance', FloatType(), False),
        StructField('cumulativeContractpaymentw', FloatType(), False),
        StructField('cumulativeInterestpayment', FloatType(), False),
        StructField('cumulativePrincipalpayment', FloatType(), False),
        StructField('cumulativePrepaymentamt', FloatType(), False),
        StructField('cumulativeDefaultbalance', FloatType(), False)
    ]), False)
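A minimal sketch of how an array-of-structs return type is wired into a UDF, under stated assumptions: `monthly_rows` and `make_monthly_rows_udf` are hypothetical names, the arithmetic is a placeholder rather than the asker's amortization logic, only three of the question's fields are used to keep it short, and `DoubleType` is swapped in for `FloatType` because the sample values carry double precision. The Spark imports sit inside the factory so the plain-Python part runs without a Spark installation.

```python
def monthly_rows(n_months, interest, principal):
    """Hypothetical UDF body: returns a list of dicts, one per month.

    Placeholder arithmetic; what matters is the shape (a list of dicts whose
    values already have the declared Python types: int for IntegerType,
    float for DoubleType).
    """
    rows = []
    for month in range(1, n_months + 1):
        rows.append({
            "month": month,
            "cumulativeInterestpayment": float(interest * month),
            "cumulativePrincipalpayment": float(principal * month),
        })
    return rows


def make_monthly_rows_udf():
    """Wrap monthly_rows as a Spark UDF returning an array of structs."""
    # Imported lazily so monthly_rows can be used without Spark installed.
    from pyspark.sql.functions import udf
    from pyspark.sql.types import (ArrayType, DoubleType, IntegerType,
                                   StructField, StructType)

    row_type = StructType([
        StructField("month", IntegerType(), False),
        StructField("cumulativeInterestpayment", DoubleType(), False),
        StructField("cumulativePrincipalpayment", DoubleType(), False),
    ])
    # StructType alone describes ONE dict; the function returns a LIST of
    # dicts, so the struct is wrapped in ArrayType.
    return udf(monthly_rows, ArrayType(row_type, containsNull=False))
```

Given a DataFrame `df` with hypothetical columns `n_months`, `interest` and `principal`, it would be applied as `df.withColumn("schedule", make_monthly_rows_udf()("n_months", "interest", "principal"))`. Returning a Python int where a `DoubleType` is declared may come back as null rather than raising, which is why the sketch casts with `float()`.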

edited Sep 27, 2017 at 15:48

answered Sep 27, 2017 at 14:38

ags29



5 Comments

Smit Over a year ago

Thanks for the quick answer, but there are a few things: 1) I am creating my df with spark.read.csv. 2) Even if I weren't, I would still have to calculate 10 columns, fix data types, etc. before I get to generating these fields. 3) In this case I would have to do df.rdd.flatMap, convert the result back to a DataFrame, and then perform a join. Do you think that is advisable?

ags29 Over a year ago

Ah, OK. Is this actually JSON data or something?

Smit Over a year ago

What I am saying is: I have a CSV, I load it with spark.read.csv, then I calculate JPScore with df.withColumn, and then I run df.withColumn with the function that returns the value above. My DataFrame fails to accept it with the error "Unexpected tuple %r with StructType".

ags29 Over a year ago

Maybe post some of the code for those intermediate steps?

Smit Over a year ago

Let us continue this discussion in chat.
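On the df.rdd.flatMap / convert back / join question raised in the comments: once a UDF returns `ArrayType(StructType(...))`, the array column can usually be flattened inside the DataFrame API with `pyspark.sql.functions.explode`, which repeats the other selected columns on every output row, so an RDD round trip and join should not be needed. A sketch under assumptions: `explode_to_rows` is a hypothetical helper, and the caller passes the array column name and the key columns to keep (e.g. a loan id). Spark is imported lazily inside the function.

```python
def explode_to_rows(df, array_col, keep_cols):
    """Return one row per element of the array<struct> column `array_col`.

    Columns named in keep_cols are carried onto every output row, which is
    what the flatMap -> toDF -> join round trip would otherwise rebuild.
    """
    # Imported lazily so this helper can be defined without Spark installed.
    from pyspark.sql.functions import col, explode

    exploded = df.select(*keep_cols, explode(col(array_col)).alias("entry"))
    # "entry.*" expands each struct field into its own top-level column.
    return exploded.select(*keep_cols, "entry.*")
```

For example, `explode_to_rows(df_with_schedule, "schedule", ["loanId"])` would give one row per month with `loanId` alongside the struct fields (`df_with_schedule` and `schedule` are hypothetical names). Note that `explode` drops rows whose array is null or empty; `explode_outer` keeps them.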
