
I have a text file like this:

ID = 31
Ne = 5122
============
List of 104 four tuples:
1    2    12    40
2    3    4     21
.
.
51   21   41    42   

ID = 34
Ne = 5122
============
List of 104 four tuples:
3    2    12    40
4    3    4     21
.
.

The four-tuples are tab delimited.

For each ID, I'm trying to build a dictionary entry with the ID as the key and that ID's four-tuples (as lists or tuples) as the value:

 data = {31: [(1,2,12,40), (2,3,4,21), ...], 34: [(3,2,12,40), (4,3,4,21), ...]}

My string-parsing knowledge is limited to reading the lines with file.readlines() and using str.replace() and str.split() on 'ID = '. There has to be a better way. Here is the beginning of what I have:

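with open('text.txt', 'r') as file:  # closes the file automatically
    fp = file.readlines()
B = []
for x in fp:
    # str.replace returns a new string, so the result must be assigned back
    x = x.replace('\t', ',')
    x = x.replace('\n', ')')
    B.append(x)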
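Because Python strings are immutable, str.replace() returns a new string instead of modifying the original, so its result has to be assigned back (x = x.replace(...)). A minimal illustration, using one sample row from the file above:

```python
row = "1\t2\t12\t40\n"  # one tab-delimited line as readlines() returns it

row.replace("\t", ",")  # returns a new string, which is discarded here
print(repr(row))  # '1\t2\t12\t40\n' (unchanged)

fixed = row.replace("\t", ",").replace("\n", "")  # assign the result instead
print(repr(fixed))  # '1,2,12,40'
```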
  • You could try writing a grammar using a library such as pyparsing or PLY. Commented Jul 22, 2015 at 20:37
  • If you're still around, could you mark one of the answers to this question as correct? Commented May 24, 2017 at 16:45
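The first comment suggests a grammar library. As a lighter-weight stand-in (the standard re module, swapped in here; this is not pyparsing or PLY), each "ID = <number>" header and the integer rows after it can be matched line by line. A sketch, assuming data rows contain only whitespace-separated integers; the names ID_LINE, ROW_LINE and parse_text are made up for illustration:

```python
import re

# header lines such as "ID = 31"
ID_LINE = re.compile(r"^ID\s*=\s*(\d+)\s*$")
# rows made only of whitespace-separated integers
ROW_LINE = re.compile(r"^\s*\d+(?:\s+\d+)*\s*$")


def parse_text(text):
    data = {}
    current = None  # list of tuples for the ID currently being read
    for line in text.splitlines():
        id_match = ID_LINE.match(line)
        if id_match:
            current = data.setdefault(int(id_match.group(1)), [])
        elif current is not None and ROW_LINE.match(line):
            current.append(tuple(int(n) for n in line.split()))
    return data


sample = (
    "ID = 31\nNe = 5122\n============\nList of 104 four tuples:\n"
    "1\t2\t12\t40\n2\t3\t4\t21\n\n"
    "ID = 34\nNe = 5122\n============\nList of 104 four tuples:\n"
    "3\t2\t12\t40\n4\t3\t4\t21\n"
)
print(parse_text(sample))
```

Header lines like "Ne = 5122" and "List of 104 four tuples:" match neither pattern, so they are skipped without any try/except.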

3 Answers


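Something like this:

ll = []
for line in fp:
    fields = line.split()
    # keep only lines made entirely of integers (the four-tuple rows)
    if fields and all(f.isdigit() for f in fields):
        ll.append(tuple(int(x) for x in fields))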

That produces a list of tuples that you can assign to the corresponding ID key in your dictionary.
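To get one list per ID rather than one list for the whole file, the same tuple-building step can be combined with a check for the "ID = " header lines. A sketch, assuming lines is what readlines() returned (like fp in the question) and that data rows contain only digits; the name group_rows is made up for illustration:

```python
def group_rows(lines):
    """Map each ID to the list of integer tuples that follow its 'ID = ' line."""
    data = {}
    current_id = None
    for line in lines:
        if line.startswith("ID = "):
            current_id = int(line.split("=")[1])
            data[current_id] = []  # entry exists even if no rows follow
            continue
        fields = line.split()
        # only rows made entirely of digits are four-tuples
        if current_id is not None and fields and all(f.isdigit() for f in fields):
            data[current_id].append(tuple(int(x) for x in fields))
    return data


lines = [
    "ID = 31\n", "Ne = 5122\n", "============\n", "List of 104 four tuples:\n",
    "1\t2\t12\t40\n", "2\t3\t4\t21\n", "\n",
    "ID = 34\n", "Ne = 5122\n", "============\n", "List of 104 four tuples:\n",
    "3\t2\t12\t40\n", "4\t3\t4\t21\n",
]
print(group_rows(lines))
```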


Python is great for this kind of thing, so why not write a 5-10 line script for it? It's exactly the kind of task the language excels at.

$ cat test
ID = 31
Ne = 5122
============
List of 104 four tuples:
1       2       12      40
2       3       4       21

ID = 34
Ne = 5122
============
List of 104 four tuples:
3       2       12      40
4       3       4       21


data = {}
with open('test') as f:
    text = f.read()
for block in text.split('ID = '):
    if not block:
        continue  # empty string before the first 'ID = '
    lines = block.split('\n')
    ID = int(lines[0])
    # lines[1:4] are the "Ne", "====" and "List of" lines; drop empty fields, convert to int
    tups = [list(map(int, filter(None, line.split('\t')))) for line in lines[4:]]
    data[ID] = tuple(filter(None, tups))  # drop empty lists left by blank lines
print(data)

# {31: ([1, 2, 12, 40], [2, 3, 4, 21]), 34: ([3, 2, 12, 40], [4, 3, 4, 21])}

The only annoying thing is all the filters; they're there to drop the empty strings and empty lists left over from extra tabs and newlines. For a one-off little script, it's no big deal.
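Most of those filters can go away: str.split() called with no argument splits on runs of any whitespace and never returns empty strings, so a blank line just becomes an empty list. A sketch of the same block-splitting idea with that change, assuming the layout of the test file above (the ID line plus three header lines per block); parse_blocks is a made-up name:

```python
def parse_blocks(text):
    data = {}
    for block in text.split("ID = "):
        lines = block.splitlines()
        if not lines:
            continue  # nothing before the first "ID = "
        rows = (line.split() for line in lines[4:])  # skip ID, Ne, ====, List lines
        data[int(lines[0])] = tuple([int(x) for x in row] for row in rows if row)
    return data


print("\t\t1\t2\t".split("\t"))  # ['', '', '1', '2', '']
print("\t\t1\t2\t".split())      # ['1', '2']

sample = (
    "ID = 31\nNe = 5122\n============\nList of 104 four tuples:\n"
    "1\t2\t12\t40\n2\t3\t4\t21\n\n"
    "ID = 34\nNe = 5122\n============\nList of 104 four tuples:\n"
    "3\t2\t12\t40\n4\t3\t4\t21\n"
)
print(parse_blocks(sample))
```

One check for empty rows remains (if row), but the per-field filtering is gone.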

2 Comments

Hey, this worked excellently. I needed the tuples as ints, so I made a quick lambda function, lambda A: [int(x) for x in A], and it looked great. Thank you for this!
Cool, forgot about that detail. I added the map in there.

I think this will do the trick for you:

def parse_file(filename):
    """
    Parses an input data file containing tags of the form "ID = ##" (where ## is a
    number) followed by rows of data. Returns a dictionary where the ID numbers
    are the keys and all of the rows of data are stored as a list of tuples
    associated with the key.

    Args:
        filename (str): name of the file you want to parse

    Returns:
        my_dict (dict): dictionary of data with ID numbers as keys
    """
    my_dict = {}
    my_list = []  # rows that appear before the first ID are discarded
    with open(filename, "r") as my_file:  # handles opening and closing the file
        for row in my_file:
            if "ID = " in row:
                my_key = int(row.split("ID = ")[1])  # grab the ID number
                my_list = []  # initialize a new data list for a new ID
                my_dict[my_key] = my_list  # store it now; later appends show up here
            elif row != "\n":  # skip rows that only have a newline char
                try:  # header rows such as "Ne = 5122" are not valid data lines
                    my_list.append(tuple(int(x) for x in row.split()))
                except ValueError:
                    continue  # not a data row; move on to the next one
    return my_dict

I made it a function so that you can call it from anywhere just by passing the filename. It makes assumptions about the file format, but if the format is always what you showed us here, it should work for you. You would call it on your data.txt file like this:

a_dictionary = parse_file("data.txt")

I tested it on the data you gave us, and it seems to work fine after deleting the "..." rows.

Edit: I noticed one small bug. As originally written, it added an empty tuple wherever a newline character ("\n") appeared alone on a line. To fix this, put the try: and except: clauses inside this branch:

elif row != "\n":  # skips rows that only contain newline char

I added this to the full code above as well.
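The bug happens because splitting a line that holds only a newline yields an empty list, and tuple() of an empty sequence is simply (), not an error, so the try block succeeds and appends it. A quick check of that behavior; to_row is a hypothetical helper that mirrors the conversion in parse_file, and the strip() comparison is an alternative guard, not the answer's code, that would also skip lines holding only spaces or tabs:

```python
def to_row(line):
    # same conversion as in parse_file
    return tuple([int(x) for x in line.split()])


print(to_row("\n"))           # () : no exception, so nothing stops the append
print(to_row("1 2 12 40\n"))  # (1, 2, 12, 40)

for blank in ["\n", "   \n", "\t\n"]:
    # compare the answer's guard with a strip()-based guard
    print(repr(blank), blank != "\n", bool(blank.strip()))
```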
