
When I run only the select it works and returns data, but when I try to save it to the lake, this message appears:

Error: 'DataFrame' object has no attribute '_get_object_id'

try:

    dfNovo = spark.read.format('parquet').load(dfNovo)
    histCZ = spark.read.format("parquet").load(histCZ)


    dfNovo = dfNovo.fillna('')
    histCZ = histCZ.fillna('')
    dfNovo.createOrReplaceTempView('hist_hz')
    histCZ.createOrReplaceTempView('hist_cz')

    spark.catalog.refreshTable("hist_hz")
    spark.catalog.refreshTable("hist_cz")

    c = spark.sql("""select distinct a.* from hist_hz a 
                    left join (select * from hist_cz) b
                    on  
                    a.fornecimento = b.fornecimento 
                    and a.centro = b.centro 
                    and a.atribuicao = b.atribuicao 
                    and a.ped_pca = b.ped_pca 
                    and a.transporte = b.transporte 
                    and a.codigo_material = b.codigo_material 
                    and a.descr_produto = b.descr_produto 
                    and a.descr_status_pedido = b.descr_status_pedido 
                    and a.hora_puxada = b.hora_puxada 
                    and a.cliente = b.cliente 
                    and a.cliente_sap = b.cliente_sap 
                    and a.numero_nota_fiscal = b.numero_nota_fiscal 
                    and a.data_inicio_carregamento = b.data_inicio_carregamento 
                    and a.hora_inicio_carregamento = b.hora_inicio_carregamento 
                    and a.dt_termino_carregamento = b.dt_termino_carregamento 
                    and a.hora_termino_carregamento = b.hora_termino_carregamento 
                    and a.numeroov_pedtransf = b.numeroov_pedtransf 
                    and a.can_distrib = b.can_distrib 
                    and a.tipo_operacao = b.tipo_operacao
                    and a.flagAtivo = b.flagAtivo

                    where a.createdDate = '02-01-2020'
                    and b.cliente_sap is null
                       """)
    print(c.count())

    if (c.count() >0 ):

        c.write.mode('overwrite').format('parquet').option("encoding", 'UTF-8').partitionBy('data_puxada').save(histCZ)

        print("Finalizado")


    #print(PickingAutomatico.count())
except Exception as e:
    print('Erro ', e)
  • Can you please add the error code as well as the relevant data being loaded? Commented Jun 10, 2020 at 13:15
  • Just a heads up: big SQL in a string is almost non-debuggable. I recommend you rewrite it in a more "object" way, e.g. with the DataFrame API (see the sketch after these comments). Commented Jun 10, 2020 at 13:37
  • Hi, the code is exactly what is shown above. The strange thing is that it shows the data perfectly, but when it goes to save it gives this error. Commented Jun 10, 2020 at 13:37
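
A rough sketch of that suggestion, assuming the schema from the query above (the join_keys list simply mirrors the join columns, and dfNovo/histCZ are the DataFrames from the question): the left join plus the b.cliente_sap is null filter can be expressed as a left_anti join, which keeps only the hist_hz rows with no match in hist_cz.

join_keys = [
    "fornecimento", "centro", "atribuicao", "ped_pca", "transporte",
    "codigo_material", "descr_produto", "descr_status_pedido", "hora_puxada",
    "cliente", "cliente_sap", "numero_nota_fiscal", "data_inicio_carregamento",
    "hora_inicio_carregamento", "dt_termino_carregamento",
    "hora_termino_carregamento", "numeroov_pedtransf", "can_distrib",
    "tipo_operacao", "flagAtivo",
]

# left_anti keeps the rows of dfNovo (hist_hz) with no matching row in histCZ
# (hist_cz); after fillna('') this is equivalent to the left join followed by
# the "b.cliente_sap is null" filter.
c = (dfNovo
     .filter(dfNovo.createdDate == "02-01-2020")
     .join(histCZ, on=join_keys, how="left_anti")
     .distinct())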

1 Answer


You are overwriting your own variables.

histCZ = spark.read.format("parquet").load(histCZ)

and then you use the histCZ variable as the location where the parquet should be saved. But at this point it is a DataFrame, not a path string.

c.write.mode('overwrite').format('parquet').option("encoding", 'UTF-8').partitionBy('data_puxada').save(histCZ)

At this point histCZ is no longer the location. save() expects a path string, and passing a DataFrame instead is what raises the '_get_object_id' error.
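
A minimal sketch of the fix, assuming the lake location is kept in its own variable (histCZ_path is a hypothetical name and the path is just a placeholder; spark is the existing session from the question):

histCZ_path = "/mnt/lake/hist_cz"   # placeholder; use the real lake location
histCZ = spark.read.format("parquet").load(histCZ_path)   # DataFrame read from that path

# ... build the result DataFrame `c` as in the question ...

# Pass the path string, not the DataFrame, to save()
c.write.mode("overwrite").format("parquet") \
    .option("encoding", "UTF-8") \
    .partitionBy("data_puxada") \
    .save(histCZ_path)

The same pattern applies to dfNovo, which is overwritten with a DataFrame in the same way on the first line of the try block.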


1 Comment

I created a variable just for the file path and it worked. Thanks
