Pyspark: Read csv file with multiple sheets

Question

The .csv file I am using has multiple sheets, and the sheet names are dynamic.

I have to create DataFrames for all of the sheets.

The syntax I am using:

df = (
    self.spark.read
    .option("sheetName", None)
    .option('header', 'true')
    .csv(file_path)
)

sheet_names = df.keys()
print(sheet_names)

Error:

'DataFrame' object has no attribute 'keys'

Does this answer your question? Reading Excel (.xlsx) file in pyspark
– notNull, Commented Apr 4, 2023 at 13:08
@notNull I don't know the sheet names. If I could hardcode them, it would be no problem.
– Adrita Sharma, Commented Apr 4, 2023 at 13:10
@SarahMesser I need to use Apache Spark. The answer is in C#. I could solve it in other languages (C#, Python, etc.), but I need to use PySpark.
– Adrita Sharma, Commented Apr 4, 2023 at 13:11
@AdritaSharma A CSV has no sheets. It's just a plain text file where the delimiter between columns is supposed to be a comma.
– Itération 122442, Commented Apr 4, 2023 at 14:14

Itération 122442 · Accepted Answer · 2023-04-04 14:19:42Z

1

You are reading a CSV file, which is a plain text file, so first of all, trying to get Excel sheet names from it does not make sense.
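To make that concrete, here is a minimal sketch using only Python's standard library. The file name and rows are made up for illustration; the point is that the raw file contents are nothing more than delimited lines of text, with no place to store sheet names.

```python
import csv
import os
import tempfile

# Write a tiny CSV (hypothetical data), then read it back two ways.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.csv")
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows([["id", "name"], ["1", "alice"], ["2", "bob"]])

    # The raw contents: plain text, one line per row, commas between columns.
    with open(path, newline="") as f:
        raw_text = f.read()

    # Parsed contents: a flat list of rows. There is no sheet layer anywhere.
    with open(path, newline="") as f:
        rows = list(csv.reader(f))

print(repr(raw_text))
print(rows)
```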

Second, reading the CSV file returns a Spark DataFrame. This DataFrame, as you can see in this documentation, has no method named "keys".
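If the file is in fact an Excel workbook (.xlsx) rather than a CSV, one possible route is to let pandas discover the sheet names and then hand each sheet to Spark. The sketch below assumes pandas and an Excel engine such as openpyxl are installed and that the workbook fits in driver memory. `read_all_sheets`, `to_spark_frames`, and `workbook_path` are hypothetical names, not part of Spark. `pd.read_excel(..., sheet_name=None)` returns a dict keyed by sheet name, which is the kind of object that does have `keys()`.

```python
def to_spark_frames(spark, sheets):
    """Convert {sheet_name: pandas DataFrame} into {sheet_name: Spark DataFrame}.

    `spark` is expected to be an existing SparkSession.
    """
    return {name: spark.createDataFrame(pdf) for name, pdf in sheets.items()}


def read_all_sheets(spark, workbook_path):
    """Read every sheet of an Excel workbook without knowing the sheet names.

    Assumption: `workbook_path` points at a real .xlsx file, not a CSV.
    """
    # Imported lazily so this module loads even where pandas is not installed.
    import pandas as pd

    # sheet_name=None asks pandas for all sheets, returned as a dict
    # mapping each sheet name to a pandas DataFrame.
    sheets = pd.read_excel(workbook_path, sheet_name=None)
    return to_spark_frames(spark, sheets)
```

With a SparkSession in hand, `frames = read_all_sheets(spark, workbook_path)` would give a plain dict, so `frames.keys()` lists the sheet names and `frames[name]` is the Spark DataFrame for that sheet. Sheets with empty or mixed-type columns may need cleaning before `createDataFrame` can infer a schema.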

