Extracting Rows of Data from a CSV-like File Using Python

Question

I have a large file from a proprietary archive format. Unzipping this archive gives a file that has no extension, but the data inside is comma-delimited. Adding a .csv extension or simply opening the file with Excel will work.

I have about 375-400 of these files, and I'm trying to extract a chunk of rows (about 13,500 out of 1.2M+ rows) between a keyword "Point A" and another keyword "Point B".

I found some code on this site that I think is extracting the data correctly, but I'm getting an error:

AttributeError: 'list' object has no attribute 'rows' when trying to save out the file. Can somebody help me get this data to save into a csv?

import re
import csv
import time

print(time.ctime())

file = open('C:/Users/User/Desktop/File with No Extension That\'s Very Similar to CSV', 'r')
data = file.read()
x = re.findall(r'Point A(.*?)Point B', data,re.DOTALL)

name = "C:/Users/User/Desktop/testoutput.csv"
with open(name, 'w', newline='') as file2:
    savefile = csv.writer(file2)
    for i in x.rows:
        savefile.writerow([cell.value for cell in i])

print(time.ctime())

Thanks in advance, any help would be much appreciated.

re.findall() returns a list. The error tells you that list objects have no rows attribute. You should look at what x contains and see what you can do from there — selimb
– selimb, Commented Jul 7, 2015 at 19:20
Can you please edit your question and fix the quotation marks? I'm sorry, I'm nitpicking, but man having the coloring wrong is confusing. — NightShadeQueen
– NightShadeQueen, Commented Jul 7, 2015 at 19:21

Martin Evans · Accepted Answer · 2015-07-07 20:16:43Z

3

The following should work nicely. As mentioned, your regex usage was almost correct. It is possible to still use the Python CSV library to do the CSV processing by converting the found text into a StringIO object and passing that to the CSV reader:

import re
import csv
import time
import StringIO

print(time.ctime())

input_name = "C:/Users/User/Desktop/File with No Extension That's Very Similar to CSV"
output_name = "C:/Users/User/Desktop/testoutput.csv"

with open(input_name, 'r') as f_input, open(output_name, 'wb') as f_output:
    # Read whole file in
    all_input = f_input.read()  

    # Extract interesting lines
    ab_input = re.findall(r'Point A(.*?)Point B', all_input, re.DOTALL)[0]

    # Convert into a file object and parse using the CSV reader
    fab_input = StringIO.StringIO(ab_input)
    csv_input = csv.reader(fab_input)
    csv_output = csv.writer(f_output)

    # Iterate a row at a time from the input
    for input_row in csv_input:
        # Skip any empty rows
        if input_row:
            # Write row at a time to the output
            csv_output.writerow(input_row)

print(time.ctime())

You have not given us an example from your CSV file, so if there are problems, you might need to configure the CSV 'dialect' to process it better.

Tested using Python 2.7

answered Jul 7, 2015 at 20:16

Martin Evans

46.9k17 gold badges88 silver badges104 bronze badges

Sign up to request clarification or add additional context in comments.

3 Comments

David Over a year ago

Hi Martin, I had a quick follow-up if you don't mind me asking. If I were to modify the re.findall command to go from "Point A" to the end of the file, how would I go about doing that? I've been using this quite a bit, but the file formats have changed and there no longer is a "Point B" for the end position. Thanks!

Martin Evans Over a year ago

You could try changing it to r'Point A(.*)'

David Over a year ago

Thanks again! Tried (.*?) but not (.*).

Fabio Menegazzo · Accepted Answer · 2015-07-07 19:38:59Z

You have 2 problems here: the first is related to the regular expression and the other is about the list syntax.

Getting what you want

The way you are using the regular expression will return to you a list with a single value (all lines into an unique string).

Probably there is a better way of doing this but I would go now with something like this:
```
with open('bla', 'r') as input:
    data = input.read()
    x = re.findall(r'Point A(.*?)Point B', data, re.DOTALL)[0]
    x = x.splitlines(False)[1:]
```
That's not pretty but will return a list with all values between those two points.
Working with lists

There is no rows attribute inside lists. You just have to iterate over it:
```
for i in x:
    do what you have to do
```
See, I'm not familiar to the csv library but it looks that you will have to perform some manipulations to the i value before adding it to the library.

IMHO, I would avoid using CSV format since it is kind of "locale dependent" so it may not work as expected depending the settings your end-users may have on OS.

Ashwaq · Accepted Answer · 2018-04-17 09:35:31Z

Updating the code so that @Martin Evans answer works on the latest Python version.

import re
import csv
import time
import io

print(time.ctime())

input_name = "C:/Users/User/Desktop/File with No Extension That's Very Similar to CSV"
output_name = "C:/Users/User/Desktop/testoutput.csv"

with open(input_name, 'r') as f_input, open(output_name, 'wt') as f_output:
    # Read whole file in
    all_input = f_input.read()  

    # Extract interesting lines
    ab_input = re.findall(r'Point A(.*?)Point B', all_input, re.DOTALL)[0]

    # Convert into a file object and parse using the CSV reader
    fab_input = io.StringIO(ab_input)
    csv_input = csv.reader(fab_input)
    csv_output = csv.writer(f_output)

    # Iterate a row at a time from the input
    for input_row in csv_input:
        # Skip any empty rows
        if input_row:
            # Write row at a time to the output
            csv_output.writerow(input_row)

print(time.ctime())

Also, by using 'wt' instead of 'wb' one can avoid

"TypeError: a bytes-like object is required, not 'str'"

Collectives™ on Stack Overflow

Extracting Rows of Data from a CSV-like File Using Python

3 Answers 3

3 Comments

Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

3 Comments

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related