Thanks all very much for your suggestions and help. I tied it all together into the following two Regex Patterns:
This one parses the CSS selector string (e.g. div#myid.myclass[attr=1,fred=3]) http://www.rubular.com/r/2L0N5iWPEJ
import re
cssSelector = re.compile(r'^(?P<type>[*\w-]+)?(?P<id>#[\w-]+)?(?P<classes>\.[\w.-]+)*(?P<data>\[.+\])*$')
>>> cssSelector.match("table#john.test.test2[hello]").groups()
('table', '#john', '.test.test2', '[hello]')
>>> cssSelector.match("table").groups()
('table', None, None, None)
>>> cssSelector.match("table#john").groups()
('table', '#john', None, None)
>>> cssSelector.match("table.test.test2[hello]").groups()
('table', None, '.test.test2', '[hello]')
>>> cssSelector.match("table#john.test.test2").groups()
('table', '#john', '.test.test2', None)
>>> cssSelector.match("*#john.test.test2[hello]").groups()
('*', '#john', '.test.test2', '[hello]')
>>> cssSelector.match("*").groups()
('*', None, None, None)
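If you would rather work with names than tuple positions, the named groups can be read back with groupdict(). The following is only a rough sketch: parse_token is a hypothetical helper name, and stripping the leading '#' and splitting the classes on '.' are my own assumptions about what is convenient downstream.

```python
import re

# A version of the selector pattern above. The character classes are written
# without '|' because inside [...] a '|' is a literal pipe, not alternation.
cssSelector = re.compile(
    r'^(?P<type>[*\w-]+)?(?P<id>#[\w-]+)?(?P<classes>\.[\w.-]+)*(?P<data>\[.+\])*$'
)

def parse_token(token):
    """Hypothetical helper: split one selector token into named parts."""
    m = cssSelector.match(token)
    if m is None:
        return None  # token does not follow tag, id, classes, attributes
    parts = m.groupdict()
    return {
        "tag": parts["type"],
        # drop the leading '#' (assumption: callers want the bare id)
        "id": parts["id"][1:] if parts["id"] else None,
        # '.test.test2' -> ['test', 'test2']
        "classes": parts["classes"].lstrip(".").split(".") if parts["classes"] else [],
        # the attribute block is left as a raw string, e.g. '[hello]'
        "data": parts["data"],
    }

print(parse_token("table#john.test.test2[hello]"))
# {'tag': 'table', 'id': 'john', 'classes': ['test', 'test2'], 'data': '[hello]'}
```

The attribute block is deliberately left unparsed here, since it is handled by the second pattern.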
And this one does the attributes (e.g. [link,key~=value]) http://www.rubular.com/r/2L0N5iWPEJ:
attribSelector = re.compile(r'(?P<word>\w+)\s*(?P<operator>[^\w,]{0,2})\s*(?P<value>\w+)?\s*[,\]]')
>>> a = attribSelector.findall("[link, ds9 != test, bsdfsdf]")
>>> for x in a: print(x)
('link', '', '')
('ds9', '!=', 'test')
('bsdfsdf', '', '')
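Since the operator is captured as well, each attribute test can be stored as a small record instead of a bare tuple. This is a sketch under my own assumptions: parse_attributes is a hypothetical helper, and turning an empty operator into None is just one possible convention. Note that finditer (unlike findall) gives None for an optional group that did not take part in the match.

```python
import re

# A version of the attribute pattern above, with the character classes
# written without '|' (a literal pipe inside [...]).
attribSelector = re.compile(
    r'(?P<word>\w+)\s*(?P<operator>[^\w,]{0,2})\s*(?P<value>\w+)?\s*[,\]]'
)

def parse_attributes(text):
    """Hypothetical helper: return one dict per comma-separated attribute test."""
    result = []
    for m in attribSelector.finditer(text):
        result.append({
            "key": m.group("word"),
            # the operator group can match the empty string; store that as None
            "operator": m.group("operator") or None,
            # None when the attribute is only tested for presence
            "value": m.group("value"),
        })
    return result

for attr in parse_attributes("[link, ds9 != test, bsdfsdf]"):
    print(attr)
# {'key': 'link', 'operator': None, 'value': None}
# {'key': 'ds9', 'operator': '!=', 'value': 'test'}
# {'key': 'bsdfsdf', 'operator': None, 'value': None}
```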
A couple of things to note:
1) This parses attributes using comma delimitation (since I am not using strict CSS).
2) This requires that the parts of each pattern appear in this order: tag, id, classes, attributes
The first regex works on a single token, i.e. one of the whitespace- or '>'-separated parts of a selector string, not on the whole string. I did it this way because I wanted to check each token against my own object graph :)
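For completeness, splitting a full selector string into tokens might look roughly like the sketch below. It is not the code I actually use: tokenize and combinatorSplit are hypothetical names, and it assumes there is no whitespace or '>' inside an attribute block (so "[a,b]" works but "[a, b]" would be split in the wrong place).

```python
import re

# A version of the selector pattern above, written without '|' in the
# character classes.
cssSelector = re.compile(
    r'^(?P<type>[*\w-]+)?(?P<id>#[\w-]+)?(?P<classes>\.[\w.-]+)*(?P<data>\[.+\])*$'
)

# Either a '>' with optional surrounding whitespace, or plain whitespace.
# The capturing group makes re.split keep the separators in its result.
combinatorSplit = re.compile(r'(\s*>\s*|\s+)')

def tokenize(selector):
    """Hypothetical helper: return a list of (combinator, groups) pairs.

    combinator is None for the first token, '>' for a child step and
    ' ' for a descendant step.
    """
    pieces = combinatorSplit.split(selector.strip())
    result = []
    combinator = None
    for i, piece in enumerate(pieces):
        if i % 2 == 1:
            # odd positions hold the separators captured by the split
            combinator = '>' if '>' in piece else ' '
            continue
        m = cssSelector.match(piece)
        if m is None:
            raise ValueError("cannot parse token: %r" % piece)
        result.append((combinator, m.groups()))
    return result

for step in tokenize("div#main > table.test td"):
    print(step)
# (None, ('div', '#main', None, None))
# ('>', ('table', None, '.test', None))
# (' ', ('td', None, None, None))
```

Each step can then be compared against a node in the object graph, using the combinator to decide whether to look at direct children only or at all descendants.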
Thanks again!
[key=value], either using separate lists for key and value, or using an attribute list that contains key-value pairs. And "tag" might be more appropriate than "type".
[type],[type^=value],[type$=value], etc, if that matters, such that it may be necessary to store the attribute operator as well.