0

I'm working in a Microsoft Fabric notebook, which uses PySpark. The API response has header 'Content-Type' = 'text/csv'. From what I see, the response.text seems to be similar to '"ColA","ColB","ColC"\r"Row1A","Row1B","Row1C", ...'. In other words, from how it's formatted, I can copy the raw Postman output, paste it in Notepad, save as .csv and it looks fine when opened up in Excel.

The response.text is about 45MB and 16K-17K rows. I would like to put the data in a DataFrame (so I can use the DF to overwrite my Fabric Lakehouse table).

I've attempted df = spark.read.csv(responseText, header=False, schema=mySchema). The data and my schema has 64 fields. This took over 15 minutes before I cancelled the run, since I wasn't sure if I did something wrong. I plan to try it again with header=True and inferSchema=True tomorrow, but firstly, is there preprocessing I have to do to response.text before using spark.read.csv() on it? Is it supposed to take this long? Is there a more efficient way to convert it to a DataFrame? (Example: can I save the response as a .csv file within the notebook, then read.csv() from that?)

(For the Fabric-adept, is it possible to save the text as a file.csv and upload it to the Lakehouse 'Files' section, and turn it into a table there? In a way where I can do this programmatically by running the notebook?)

Thank you!

1 Answer 1

3

spark.read.csv expects a filename, not the contents of the file.

So write the data to a lakehouse file first, like this:

text= """
"ColA","ColB","ColC"
"Row1A","Row1B","Row1C"
"Row2A","Row2B","Row2C"
"""

with open('/lakehouse/default/Files/foo.csv', 'w') as f:
    f.write(text)

df = spark.read.csv("Files/foo.csv",header=True);
display(df)
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.