1

I have formatted all my university notes as follows:

CourseName: {
    Part 1: {
        I.I - Intro: {
            Topic1: {
                descr1;
                descr2: {
                    2.a;
                    2.b;
                    2.c.
                };
                descr3.
            };
            Topic2: {
                descr: {
                    example.
                }.
            }.
        };
        I.II - NextChapter: {
            Topic3: {
                whatever.
            }.
        }.
    };
    Part 2: {
        II.I - FinalChapter: {
            content.
        }.
    }.
}

I'd like to structure them into a Tree data structure and I've tried doing so, both recursively and iteratively, in the past hours, doing many researches online, but none of my attempts at it is working.

I've already implemented a Node class (with self.__value and a list self.__children and all the useful methods you would expect from it) as well as a Tree class (with self.__nodes as a dictionary and other utility methods), so feel free to use methods such as add_node or add_child in any form of your liking in your answers.

What I'm struggling with is to understand how to structure the function def parseTree(s, l) - that ideally takes as inputs a string s (my notes) and a list l establishing the delimiters i.e. [":{", ";", "}."] or ["{","}"] or similar - and returns a tree object, with each node having as value the text preceding :{ and a list of children (if any) separated by ; in the text.

Any suggestion?

7
  • 2
    Show your code! Commented Mar 25, 2018 at 17:44
  • 1
    You should reformat them into a parseable form (JSON, yaml, etc) first. Commented Mar 25, 2018 at 17:49
  • chepner and Daniel Roseman are right—it's always easier to use an existing parser. But if you do want to do this yourself, I think what you may be missing is that you're 75% of the way to designing a custom recursive descent parser but haven't hit on the idea of implementing it recursively yet. That may help you search. But, on the other hand, your language is more that simple enough that you could write a simple grammar to feed to a parser generator or parser framework instead of writing code, so you may want to search on that instead. Commented Mar 25, 2018 at 17:57
  • @ekhumoro I know I had to show it but I kinda couldn't because it was a not-running mess, result of trial and error and more of a general idea on how to approach the problem rather than actual code Commented Mar 26, 2018 at 14:44
  • @abarnert Thank you so much, this is exactly what I was trying to do and looking for. A whole new world just opened up to me! I'm also willing to reformat my "grammar" to make it as simple and parsable as possible for this to work. But 75% of the way it's a long shot... I'm still quite clueless on how to make progress. Commented Mar 26, 2018 at 14:49

2 Answers 2

4

This is actually almost syntactically valid YAML. A simple substitution will make it valid:

data = data.replace(';', ',').replace('.', '')
parsed = yaml.load(data)
Sign up to request clarification or add additional context in comments.

1 Comment

You probably want some more specific replacements, at the very least just ; and . that end a line.
2

Assuming your data is stored in a file, you can build a simple class to parse the structure into a dictionary. You can recursively traverse the data by creating a new Notes object for each key found:

file_data = filter(None, [i.strip('\n') for i in open('filename.txt')])
import re
class Notes:
   def __init__(self, token_data):
     self.token_data = token_data
     self.current_dict = {}
     self.current_vals = []
     self.parse()
   def parse(self):
     while True:
       start = next(self.token_data, None)
       if not start or "}" in start:
         break
       if start.endswith('{'):
          note = Notes(self.token_data)
          final_result = filter(lambda x:x, note.current_vals + [note.current_dict]) if note.current_vals else note.current_dict
          self.current_dict[re.findall('[\w\s\-\.]+', re.sub('^\s+', '', start))[0]] = final_result[0] if isinstance(final_result, list) and len(final_result) == 1 else final_result
          self.token_data = note.token_data
       else:
          self.current_vals.append(re.sub('^\s+', '', start))


course_notes = Notes(iter(file_data)).current_dict

Output:

{'CourseName': 
    {'Part 1': 
      {'I.I - Intro': 
         {'Topic1': ['descr1;',
                    'descr3.',
             {'descr2': ['2.a;',
                         '2.b;',
                          '2.c.']
                }
               ],
         'Topic2': {'descr': 'example.'}
                 },
                  'I.II - NextChapter': 
               {'Topic3': 'whatever.'}
             },
       'Part 2':{'II.I - FinalChapter': 'content.'}
     } 
   }

2 Comments

This is cool and effective and yes, data is stored in a text file indeed. I just wish I could understand half of the lines of code you've written, haha
@FlynnRhodes Glad to help! I added more of an explanation in the post, however, all the code does is create a new class for every occurrence of a data key. The file data is store as an iterator, that way, it can be more easily determined when the data source is exhausted.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.