How do I get Python to write a CSV file from the output of my code?

Question

I am incredibly new to python, so I might not have the right terminology...

I've extracted text from a PDF using pdfplumber. That's been saved as an object. The code I used for that is:

import pdfplumber

with pdfplumber.open('Bell_2014.pdf') as pdf:
    page = pdf.pages[0]
    bell = page.extract_text()
    print(bell)

So "bell" is all of the text from the first page of the imported PDF. (Screenshot: what bell looks like.) I need to write all of that text as a string to a CSV. I tried using:

with open('Bell_2014_ex.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows(bell)

and

bell_ex = 'bell_2014_ex.csv'

with open(bell_ex, 'w', newline='') as csvfile:
   file_writer = csv.writer(csvfile,delimiter=',')
   file_writer.writerow(bell)

All I keep finding when I search for this is how to create a CSV from specific characters or numbers typed into the code, but nothing about writing the output of code that has already run. For instance, I can get the above code:

bell_ex = 'bell_2014_ex.csv'

with open(bell_ex, 'w', newline='') as csvfile:
   file_writer = csv.writer(csvfile,delimiter=',')
   file_writer.writerow(['bell'])

to create a csv that has "bell" in one cell of the csv, but that's as close as I can get. I feel like this should be super easy, but I just can't seem to get it to work. Any thoughts? Please and thank you for helping my inexperienced self.

We don't know what bell looks like. Can you post what print(bell) outputs? Or, since it's likely longer than we need, a trimmed-down version?
– tdelaney, Commented Jun 19, 2020 at 23:34
That looks like a single multiline string, not a "dataframe" (you have to clarify what this is: the popular pandas.DataFrame or something else). CSV is for columnar data and I'm not seeing anything columnar.
– tdelaney, Commented Jun 20, 2020 at 0:03
Thank you for that clarification, I corrected my post to say object instead of dataframe.
– DMM, Commented Jun 20, 2020 at 0:15
@DMM. Off-topic, but you should actually accept the working answer to your question. It's simple courtesy and just how this site works.
– JvdV, Commented Jul 15, 2020 at 9:47

Chase · Accepted Answer · 2020-06-19 23:45:59Z

1

page.extract_text() is defined as: "Collates all of the page's character objects into a single string." which would make bell just a very long string.

The csv writer's writerow() expects an iterable of fields, typically a list of strings, with each item in the list corresponding to a single column.

Your main issue is a type mismatch: you're trying to write a single string where a list of strings is expected. Because a string is itself iterable, writerow(bell) treats every character as a separate column. You will need to operate further on your bell object to convert it into a format that can be written to a CSV.
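To make the mismatch concrete, here is a minimal sketch that writes the same value both ways. The short string assigned to bell is a hypothetical stand-in for the real extracted text, and io.StringIO is used only so that nothing is written to disk.

```python
import csv
import io

# Hypothetical stand-in for the text returned by page.extract_text()
bell = "Bell 2014"

buffer = io.StringIO()  # in-memory file, so nothing is written to disk
writer = csv.writer(buffer)

writer.writerow(bell)    # a bare string is iterated character by character
writer.writerow([bell])  # a one-item list produces a single cell

lines = buffer.getvalue().splitlines()
print(lines[0])  # B,e,l,l, ,2,0,1,4
print(lines[1])  # Bell 2014
```

The first row has one column per character; the second has the whole string in one column.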

Without having any knowledge of what bell contains or what you intend to write, I can't get any more specific, but the documentation on Python's csv module is very comprehensive in terms of setting delimiters, dialects, column definitions, etc. Once you have converted bell into a proper iterable of lists of strings, you can then write it to a CSV.
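As one possible shape for that conversion, the sketch below assumes you want one CSV row per line of text, each with a single column. That layout is only an assumption for illustration, and the sample string is hypothetical; the real structure depends on what the CSV is for.

```python
import csv
import io

# Hypothetical sample standing in for the extracted page text
bell = "Title of the paper\nAuthor One, Author Two\nAbstract text"

# One inner list per row, one list item per column (here: one column per line)
rows = [[line] for line in bell.splitlines()]

buffer = io.StringIO()  # in-memory file; swap in open(..., 'w', newline='') for a real file
csv.writer(buffer).writerows(rows)
output = buffer.getvalue()
print(output)
```

Note that the line containing a comma is quoted automatically by the writer, so it still lands in a single cell.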

4 Comments

DMM Over a year ago

I went back and added a screencap of what "bell" looks like. It's very long since it's all the text of the first page, so I cropped it.

Chase Over a year ago

And that screenshot just reinforces that bell is a giant string. What is also missing is what you expect. Is your CSV intended to just have one single row with one single column that contains the value of bell? If that's the case then file_writer.writerow([bell]) with the quotes removed and you're done. If your intended final CSV structure is more complex then you will need to define that structure and manipulate bell into a corresponding Python iterable before writing to CSV. One list per row, one list item per column.
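For the single-cell case described in this comment, a minimal sketch might look like the following. The sample text and the temporary directory are assumptions added so the example runs on its own; the file name follows the question.

```python
import csv
import os
import tempfile

# Hypothetical sample standing in for the extracted page text
bell = "First line of the page\nSecond line of the page"

# Written to a temporary directory here; use your own path in practice
bell_ex = os.path.join(tempfile.mkdtemp(), 'bell_2014_ex.csv')

with open(bell_ex, 'w', newline='', encoding='utf-8') as csvfile:
    file_writer = csv.writer(csvfile, delimiter=',')
    file_writer.writerow([bell])  # [bell], not ['bell'] and not bell

# Read the file back to confirm it holds one row with one cell
with open(bell_ex, newline='', encoding='utf-8') as csvfile:
    rows = list(csv.reader(csvfile))

print(len(rows), len(rows[0]))
```

The embedded line break is kept inside the quoted cell, so the whole page text survives as one value.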

DMM Over a year ago

Okay, that makes a lot more sense. I'll have to get clarification on what I need to do with the csv's. (I'm learning this technique for my thesis, but without any actual training, so this is a "learn as ya go" situation for me.) Thank you for helping me, truly.

Chase Over a year ago

FWIW you're probably better off understanding this by experimenting with the operation in reverse. Take any spreadsheet and export it as csv. Use Python's CSV reader to read the file into the default output (a list of lists) and inspect it, both at the row level and overall. When you see what Python outputs each row as in relation to the original csv, you'll be able to model what the writer expects when you want to write your data.
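The reverse experiment could be sketched like this. The exported string is a hypothetical stand-in for a spreadsheet saved as CSV, read from memory so the example is self-contained.

```python
import csv
import io

# Hypothetical contents of a spreadsheet exported as CSV
exported = "name,year\r\nBell,2014\r\n"

# csv.reader yields one list of strings per row
rows = list(csv.reader(io.StringIO(exported, newline='')))
for row in rows:
    print(row)

# Writing the same list of lists back reproduces the original text
out = io.StringIO(newline='')
csv.writer(out).writerows(rows)
round_trip = out.getvalue()
```

Each printed row is a list with one string per column, which is the same shape the writer expects.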

FisheyJay · Accepted Answer · 2020-06-19 23:46:35Z

0

Some similar code I wrote recently converts a tab-separated file to CSV for insertion into a sqlite3 database:

Maybe this is helpful:

    import csv
    import os

    # Convert tab-delimited listfile.txt to a comma-separated values (.csv) file
    in_file = 'listfile.txt'
    out_file = os.path.join('input', 'listfile.csv')

    # newline='' lets the csv module handle line endings itself
    with open(in_file, 'r', newline='') as in_text, \
            open(out_file, 'w', newline='') as out_csv:
        in_reader = csv.reader(in_text, delimiter='\t')
        out_writer = csv.writer(out_csv, dialect=csv.excel)

        for _line in in_reader:
            out_writer.writerow(_line)

... and that's it, not too tough


1 Comment

tdelaney Over a year ago

But OP isn't reading from a CSV so this likely doesn't apply. As an aside, you could use out_writer.writerows(in_reader) and avoid the for loop.
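A minimal sketch of that writerows() variant, using a hypothetical in-memory tab-separated sample instead of real files:

```python
import csv
import io

# Hypothetical tab-separated input
tab_text = "a\tb\tc\n1\t2\t3\n"

in_reader = csv.reader(io.StringIO(tab_text), delimiter='\t')
out_csv = io.StringIO(newline='')

# writerows() consumes the reader directly, one row at a time
csv.writer(out_csv, dialect=csv.excel).writerows(in_reader)
result = out_csv.getvalue()
print(result)
```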

DMM · Accepted Answer · 2020-07-15 10:05:30Z

So my problem was that I was missing the "encoding = 'utf-8'" for special characters and my delimiter needed to be a space instead of a comma. What ended up working was:

import csv
from pdfminer.high_level import extract_text

# Named "text" rather than "object" to avoid shadowing the built-in
text = extract_text('filepath.pdf')
print(text)

new_csv = 'filename.csv'

with open(new_csv, 'w', newline='', encoding='utf-8') as csvfile:
    file_writer = csv.writer(csvfile, delimiter=' ')
    file_writer.writerow(text)

However, since a lot of my PDFs weren't true PDFs but scans, the CSV ended up having a lot of weird symbols. This worked for about half of the PDFs I have. If you have true PDFs, this will be great. If not, I'm currently trying to figure out how to extract all the text into a pandas dataframe separated by headers within the PDFs, since pdfminer extracted all text perfectly. Thank you to everyone who helped!
