
When I run only the select it works and returns data, but when I try to save it to the lake, this message appears:

Error: 'DataFrame' object has no attribute '_get_object_id'

try:

    dfNovo = spark.read.format('parquet').load(dfNovo)
    histCZ = spark.read.format("parquet").load(histCZ)


    dfNovo = dfNovo.fillna('')
    histCZ = histCZ.fillna('')
    dfNovo.createOrReplaceTempView('hist_hz')
    histCZ.createOrReplaceTempView('hist_cz')

    spark.catalog.refreshTable("hist_hz")
    spark.catalog.refreshTable("hist_cz")

    c = spark.sql("""select distinct a.* from hist_hz a 
                    left join (select * from hist_cz) b
                    on  
                    a.fornecimento = b.fornecimento 
                    and a.centro = b.centro 
                    and a.atribuicao = b.atribuicao 
                    and a.ped_pca = b.ped_pca 
                    and a.transporte = b.transporte 
                    and a.codigo_material = b.codigo_material 
                    and a.descr_produto = b.descr_produto 
                    and a.descr_status_pedido = b.descr_status_pedido 
                    and a.hora_puxada = b.hora_puxada 
                    and a.cliente = b.cliente 
                    and a.cliente_sap = b.cliente_sap 
                    and a.numero_nota_fiscal = b.numero_nota_fiscal 
                    and a.data_inicio_carregamento = b.data_inicio_carregamento 
                    and a.hora_inicio_carregamento = b.hora_inicio_carregamento 
                    and a.dt_termino_carregamento = b.dt_termino_carregamento 
                    and a.hora_termino_carregamento = b.hora_termino_carregamento 
                    and a.numeroov_pedtransf = b.numeroov_pedtransf 
                    and a.can_distrib = b.can_distrib 
                    and a.tipo_operacao = b.tipo_operacao
                    and a.flagAtivo = b.flagAtivo

                    where a.createdDate = '02-01-2020'
                    and b.cliente_sap is null
                       """)
    print(c.count())

    if (c.count() >0 ):

        c.write.mode('overwrite').format('parquet').option("encoding", 'UTF-8').partitionBy('data_puxada').save(histCZ)

        print("Finalizado")


    #print(PickingAutomatico.count())
except Exception as e:
    print('Erro ', e)
  • Can you please add the error code as well as the relevant data being loaded? Commented Jun 10, 2020 at 13:15
  • Just a heads up: big SQL in a string is almost non-debuggable. I recommend you rewrite it in a more "object" way, e.g. with the DataFrame API (see the sketch after these comments). Commented Jun 10, 2020 at 13:37
  • Hi, the code is exactly what is shown above. The strange thing is that it shows the data perfectly, but when it goes to save it gives this error. Commented Jun 10, 2020 at 13:37
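
A rough sketch of that suggestion, assuming the schema from the query above (the join_keys list simply mirrors the join columns, and dfNovo/histCZ are the DataFrames from the question): the left join plus the b.cliente_sap is null filter can be expressed as a left_anti join, which keeps only the hist_hz rows with no match in hist_cz.

join_keys = [
    "fornecimento", "centro", "atribuicao", "ped_pca", "transporte",
    "codigo_material", "descr_produto", "descr_status_pedido", "hora_puxada",
    "cliente", "cliente_sap", "numero_nota_fiscal", "data_inicio_carregamento",
    "hora_inicio_carregamento", "dt_termino_carregamento",
    "hora_termino_carregamento", "numeroov_pedtransf", "can_distrib",
    "tipo_operacao", "flagAtivo",
]

# left_anti keeps the rows of dfNovo (hist_hz) with no matching row in histCZ
# (hist_cz); after fillna('') this is equivalent to the left join followed by
# the "b.cliente_sap is null" filter.
c = (dfNovo
     .filter(dfNovo.createdDate == "02-01-2020")
     .join(histCZ, on=join_keys, how="left_anti")
     .distinct())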

1 Answer


You are overwriting your own variables.

histCZ = spark.read.format("parquet").load(histCZ)

and then you use the histCZ variable as the location where the parquet should be saved. But at this point it is a DataFrame, not a path string.

c.write.mode('overwrite').format('parquet').option("encoding", 'UTF-8').partitionBy('data_puxada').save(histCZ)

At this point histCZ is no longer the location. save() expects a path string, and passing a DataFrame instead is what raises the '_get_object_id' error.
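
A minimal sketch of the fix, assuming the lake location is kept in its own variable (histCZ_path is a hypothetical name and the path is just a placeholder; spark is the existing session from the question):

histCZ_path = "/mnt/lake/hist_cz"   # placeholder; use the real lake location
histCZ = spark.read.format("parquet").load(histCZ_path)   # DataFrame read from that path

# ... build the result DataFrame `c` as in the question ...

# Pass the path string, not the DataFrame, to save()
c.write.mode("overwrite").format("parquet") \
    .option("encoding", "UTF-8") \
    .partitionBy("data_puxada") \
    .save(histCZ_path)

The same pattern applies to dfNovo, which is overwritten with a DataFrame in the same way on the first line of the try block.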


1 Comment

I created a variable just for the file path and it worked. Thanks
