
First time poster, long-time lurker. Have searched high and low for an answer to this but it's got to that stage...!

I am having some trouble implementing the answer given by John Machin to this past question:

How to efficiently parse fixed width files?

At a very high level, I am using this code to split up fixed-format text files and import them into a PostgreSQL database. I have successfully used it for one text file. However, I am now trying to extend my program to work with other text files that have different fixed formats, and I keep running into the same error:

struct.error: unpack_from requires a buffer of at least [x] bytes
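For context, the [x] in that message is the number of bytes that `struct` computes from the format string, and the error is raised when the buffer passed in is shorter than that. A minimal sketch, assuming a made-up two-field format (`fmt` and `short_line` below are illustrative, not taken from the real specs):

```python
import struct

# Hypothetical layout: skip 5 bytes, read a 14-byte field, then a 255-byte field.
fmt = "5x14s255s"
record = struct.Struct(fmt)

# The format describes a fixed number of bytes.
required = record.size  # 5 + 14 + 255 = 274

# Illustrative input that is far shorter than the format requires.
short_line = b"abc\n"
try:
    record.unpack_from(short_line)
    failed = False
except struct.error:
    # Raised because len(short_line) < required.
    failed = True
```

`struct.calcsize(fmt)` returns the same number as `record.size`, so either can be compared against `len(line)` before unpacking.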

Of course, I get a different value for x depending on the format string I am feeding to the function - my problem is that it continues to work for one and only one format, and not any others. The only thing I am changing is the variable used to calculate the format string, and the variable names in the script which relate to the format.

So for example this works fine:

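import datetime
import struct

import psycopg2

# Field converters: each takes the raw fixed-width string for one field.
cnv_text = lambda s: str(s.strip())
cnv_int = lambda s: int(s) if s.isspace() is False else s.strip()
cnv_date_ymd = lambda s: datetime.datetime.strptime(s, '%Y%m%d') if s.isspace() is False else s.strip() # YYYYMMDD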

unpack_len = 0
unpack_fmt = ""
splitData = []

conn = psycopg2.connect("[connection info]")
cur = conn.cursor()

Table1specs = [
    ('A', 6, 14, cnv_text),
    ('B', 20, 255, cnv_text),
    ('C', 275, 1, cnv_text),
    ('D', 276, 1, cnv_text),
    ('E', 277, 1, cnv_text),
    ('F', 278, 1, cnv_text),
    ('G', 279, 1, cnv_text),
    ('H', 280, 1, cnv_text),
    ('I', 281, 8, cnv_date_ymd),
    ('J', 289, 8, cnv_date_ymd),
    ('K', 297, 8, cnv_date_ymd),
    ('L', 305, 8, cnv_date_ymd),
    ('M', 313, 8, cnv_date_ymd),
    ('N', 321, 1, cnv_text),
    ('O', 335, 2, cnv_text),
    ('P', 337, 2, cnv_int),
    ('Q', 339, 5, cnv_int),
    ('R', 344, 255, cnv_text),
    ('S', 599, 1, cnv_int),
    ('T', 600, 1, cnv_int),
    ('U', 601, 5, cnv_int),
    ('V', 606, 10, cnv_text)
    ]

# For each column in the spec, append skip ("x") and field ("s") codes to the format.
for column in Table1specs:
    start = column[1] - 1
    end = start + column[2]
    if start > unpack_len:
        unpack_fmt += str(start - unpack_len) + "x"
    unpack_fmt += str(end - start) + "s"
    unpack_len = end
field_indices = range(len(Table1specs))
print unpack_len, unpack_fmt
# Set the unpacker.
unpacker = struct.Struct(unpack_fmt).unpack_from

class Record(object):
    pass

filename = "Table1Data.txt"

f = open(filename, 'r')
for line in f:
    raw_fields = unpacker(line)
    r = Record()
    for x in field_indices:
        setattr(r, Table1specs[x][0], Table1specs[x][3](raw_fields[x]))
    splitData.append(r.__dict__)

All the data is appended to splitData, which I then loop through and turn into SQL statements for input into the database via psycopg2. When I change the specs to a different format (and update the other variables to match), I receive the error. It is thrown from the 'raw_fields = unpacker(line)' line.
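One way to narrow down an error like this is to compare each line's length with the size the format requires before calling the unpacker. The following is only a sketch: `build_format` and `find_short_lines` are hypothetical helper names, the spec tuples are assumed to have the same (name, 1-based start, width, converter) shape as above, and the lines are assumed to be byte strings.

```python
import struct

def build_format(specs):
    """Build a struct format string from (name, start, width, converter) specs.

    start is 1-based, as in the spec list above.
    """
    parts = []
    pos = 0
    for _name, start, width, _conv in specs:
        offset = start - 1
        if offset > pos:
            # Skip the gap between the previous field and this one.
            parts.append("%dx" % (offset - pos))
        parts.append("%ds" % width)
        pos = offset + width
    return "".join(parts)

def find_short_lines(lines, specs):
    """Return (line_number, length) for every line shorter than the format needs."""
    needed = struct.calcsize(build_format(specs))
    return [(n, len(line)) for n, line in enumerate(lines, 1) if len(line) < needed]

# Illustrative data only: one line of the right length and one that is too short.
example_specs = [("A", 6, 14, None), ("B", 20, 255, None)]
example_lines = [b" " * 274, b"short"]
short = find_short_lines(example_lines, example_specs)
```

If the returned list is non-empty, the input file rather than the format string is the likely cause.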

I have exhausted all the resources I can find and am at a loss... any thoughts or ideas are welcome.

(Could it be to do with the text file I am importing from?)

Best regards.

  • Can you give us some minimal-working example code to reproduce this error? Commented Mar 21, 2014 at 16:02
  • @alKid added in code example - similar to the code in the answer from linked question which is why I did not include originally :). Commented Mar 21, 2014 at 16:31
  • @user3446927 - Your example is labeled "This works fine." Please provide an example of code that fails. Commented Mar 26, 2014 at 2:20
  • @Rob, have now solved this issue. Problem was with the text files I was parsing - the lines were not long enough so I have written a function that writes spaces to the end of each line to make them the correct length... seems to be working ok so far. Commented Mar 26, 2014 at 13:05
  • Excellent. Please delete this question so that others don't spend time on it unnecessarily. Commented Mar 26, 2014 at 13:46

1 Answer


I have since solved this issue. The problem was caused by the text files I was parsing: some lines were shorter than the format required, so I wrote a function that pads each short line with trailing spaces to the correct length:

import os

def checkLineLength(checkFile, minLength):
    print('Checking length of lines in file ' + checkFile + ', where minimum line length is ' + str(minLength))
    counter = 0
    fixedFile = 'fixed' + checkFile
    src = open(checkFile, 'r')
    dest = open(fixedFile, 'w')
    for line in src:
        if len(line) < minLength:
            # Strip the line ending, pad with spaces up to minLength, then restore the ending.
            x = line.rstrip('\r\n').ljust(minLength) + '\r\n'
            dest.write(x)
            counter += 1
        else:
            dest.write(line)
    # Close both files before removing or renaming them.
    src.close()
    dest.close()
    if counter > 0:
        # Replace the original file with the padded copy.
        os.remove(checkFile)
        os.rename(fixedFile, checkFile)
        print(str(counter) + " lines fixed in " + checkFile)
    else:
        print('Line length in ' + checkFile + ' is fine.')
        os.remove(fixedFile)
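As an alternative to rewriting the file on disk, the padding could be applied to each line in memory just before unpacking. This is a sketch rather than the approach used above: `unpack_padded` is a hypothetical helper, and it assumes the lines are byte strings and that trailing spaces are an acceptable fill for missing fields.

```python
import struct

def unpack_padded(record, line):
    """Unpack one fixed-width line, padding it with spaces if it is too short.

    record is a struct.Struct built from the format string.
    """
    body = line.rstrip(b"\r\n")
    # ljust pads with spaces up to record.size and leaves longer lines unchanged.
    return record.unpack_from(body.ljust(record.size))

# Illustrative format: skip 2 bytes, then a 3-byte field and a 4-byte field.
example = struct.Struct("2x3s4s")
fields = unpack_padded(example, b"..abc\r\n")
```

Here the second field comes back as four spaces, which the existing converters would then strip.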