
I have a database schema in Postgres that looks like this (in pseudo code):

users (table):
    pk (field, unique)
    name (field)

permissions (table):
    pk (field, unique)
    permission (field, unique)

addresses (table):
    pk (field, unique)
    address (field, unique)

association1 (table):
    user_pk (field, foreign_key)
    permission_pk (field, foreign_key)

association2 (table):
    user_pk (field, foreign_key)
    address_pk (field, foreign_key)

Hopefully this makes intuitive sense. It's a users table that has a many-to-many relationship with a permissions table as well as a many-to-many relationship with an addresses table.

In Python, when I perform the correct SQLAlchemy query incantations, I get back results that look something like this (after converting them to a list of dictionaries in Python):

results = [
    {'pk': 1, 'name': 'Joe', 'permission': 'user', 'address': 'home'},
    {'pk': 1, 'name': 'Joe', 'permission': 'user', 'address': 'work'},
    {'pk': 1, 'name': 'Joe', 'permission': 'admin', 'address': 'home'},
    {'pk': 1, 'name': 'Joe', 'permission': 'admin', 'address': 'work'},
    {'pk': 2, 'name': 'John', 'permission': 'user', 'address': 'home'},
]

So in this contrived example, Joe is both a user and an admin, while John is only a user. Both Joe's home and work addresses exist in the database, but only John's home address does.
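For anyone who wants to reproduce rows of this shape, here is a minimal sketch that uses the standard-library sqlite3 module in place of Postgres and SQLAlchemy Core. That substitution is an assumption made only to keep the example self-contained, and the inserted data and the join are illustrative rather than my actual query. It shows why each user yields one row per permission/address combination.

```python
import sqlite3

# In-memory SQLite stands in for Postgres here (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets each row be converted to a dict

conn.executescript("""
    CREATE TABLE users (pk INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE permissions (pk INTEGER PRIMARY KEY, permission TEXT UNIQUE);
    CREATE TABLE addresses (pk INTEGER PRIMARY KEY, address TEXT UNIQUE);
    CREATE TABLE association1 (
        user_pk INTEGER REFERENCES users(pk),
        permission_pk INTEGER REFERENCES permissions(pk)
    );
    CREATE TABLE association2 (
        user_pk INTEGER REFERENCES users(pk),
        address_pk INTEGER REFERENCES addresses(pk)
    );
    INSERT INTO users VALUES (1, 'Joe'), (2, 'John');
    INSERT INTO permissions VALUES (1, 'user'), (2, 'admin');
    INSERT INTO addresses VALUES (1, 'home'), (2, 'work');
    INSERT INTO association1 VALUES (1, 1), (1, 2), (2, 1);
    INSERT INTO association2 VALUES (1, 1), (1, 2), (2, 1);
""")

# Joining through both association tables multiplies the rows:
# Joe has 2 permissions x 2 addresses = 4 rows, John has 1 x 1 = 1 row.
query = """
    SELECT u.pk AS pk, u.name AS name,
           p.permission AS permission, a.address AS address
    FROM users AS u
    JOIN association1 AS a1 ON a1.user_pk = u.pk
    JOIN permissions AS p ON p.pk = a1.permission_pk
    JOIN association2 AS a2 ON a2.user_pk = u.pk
    JOIN addresses AS a ON a.pk = a2.address_pk
    ORDER BY u.pk, p.pk, a.pk
"""
rows = [dict(row) for row in conn.execute(query)]
```

With the ORDER BY above, rows has the same five entries as the results list shown earlier.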

So the question is, does anybody know the best way to go from these SQL query 'results' to the more compact 'desired_results' below?

desired_results = [
    {
        'pk': 1,
        'name': 'Joe',
        'permissions': ['user', 'admin'],
        'addresses': ['home', 'work']
    },
    {
        'pk': 2,
        'name': 'John',
        'permissions': ['user'],
        'addresses': ['home']
    },
]

One additional input is required: a small list of dictionaries giving the 'labels' I would like to use in desired_results for each field that comes from a many-to-many relationship.

relationships = [
    {'label': 'permissions', 'back_populates': 'permission'},
    {'label': 'addresses', 'back_populates': 'address'},
]

A final consideration: I've put together a concrete example for the purposes of this question, but more broadly I'm trying to solve the problem of querying SQL databases with an arbitrary number of relationships. SQLAlchemy ORM solves this problem well, but I'm limited to SQLAlchemy Core, so I am trying to build my own solution.

Update

Here's an answer, but I'm not sure it's the best / most efficient solution. Can anyone come up with something better?

# step 1: generate set of keys that will be replaced by new keys in desired_result
back_populates = set(rel['back_populates'] for rel in relationships)

# step 2: delete from results keys generated in step 1
intermediate_results = [
    {k: v for k, v in res.items() if k not in back_populates}
    for res in results]

# step 3: eliminate duplicates
intermediate_results = [
    dict(t)
    for t in set([tuple(ires.items())
    for ires in intermediate_results])]

# step 4: add back information from deleted fields but in desired form
for ires in intermediate_results:
    for rel in relationships:
        ires[rel['label']] = set([
            res[rel['back_populates']]
            for res in results
            if res['pk'] == ires['pk']])

# done
desired_results = intermediate_results
  • What is the use of the 'relationships' list in getting the desired result? Commented Aug 31, 2016 at 0:18
  • Do you have any code or any attempts at all to show us? What can we assume about the relationships and results lists, and can we assume they are totally regular, that all fields are present as shown? Commented Aug 31, 2016 at 0:20
  • They are totally regular. All fields are present as shown. Commented Aug 31, 2016 at 0:29
  • 'label' in the relationships list gives the name of the key in the desired result. 'back_populates' gives the key in results whose values end up in that key's list in the desired result. Commented Aug 31, 2016 at 0:30
  • There are too many assumptions someone would need to make about your data to post a working solution. Are all of the result rows grouped by name? What happens if one of the fields is missing, and is that allowed? What are the rules for keys that are not defined in relationships? Commented Aug 31, 2016 at 1:01

1 Answer

Iterating over the groups of partial entries looks like a job for itertools.groupby.

But first, let's put relationships into a format that is easier to use, perhaps a back_populates:label dictionary:

conversions = {d["back_populates"]:d['label'] for d in relationships}

Next, because we will be using itertools.groupby, we need a key function to distinguish between the different groups of entries. Given one entry from the initial results, this function returns a dictionary with only the pairs that will not be condensed or converted:

def grouper(entry):
    #each group is identified by all key:values that are not identified in conversions
    return {k:v for k,v in entry.items() if k not in conversions}

Now we will be able to traverse the results in groups something like this:

for base_info, group in itertools.groupby(old_results, grouper):
    # base_info is a dict with the key:value pairs shared by every entry in the group
    for partial in group:
        # partial is one entry from results that will contribute to the final result
        # but wait, what do we add it to?
        pass
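One caveat worth spelling out: itertools.groupby only merges consecutive items that share a key, so this approach assumes all rows for a given user are adjacent, as they are in the example data. The sketch below uses a deliberately shuffled, hypothetical rows list and a simplified key function (just pk) to show the difference that sorting first makes.

```python
import itertools

# Hypothetical rows where Joe's two entries are NOT adjacent.
rows = [
    {'pk': 1, 'name': 'Joe', 'permission': 'user'},
    {'pk': 2, 'name': 'John', 'permission': 'user'},
    {'pk': 1, 'name': 'Joe', 'permission': 'admin'},
]

def by_pk(row):
    # Simplified key for illustration; the answer groups on a dict instead.
    return row['pk']

# Without sorting, Joe is split into two separate groups.
unsorted_groups = [
    (key, len(list(group))) for key, group in itertools.groupby(rows, by_pk)
]

# Sorting by the same key first makes each user's rows contiguous.
sorted_groups = [
    (key, len(list(group)))
    for key, group in itertools.groupby(sorted(rows, key=by_pk), by_pk)
]

print(unsorted_groups)  # [(1, 1), (2, 1), (1, 1)]
print(sorted_groups)    # [(1, 2), (2, 1)]
```

If the query cannot guarantee ordering, an ORDER BY on the primary key in SQL or a sorted() call in Python before grouping would cover this.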

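The only issue is that if we build our entry by modifying base_info directly, it will confuse groupby, because base_info is the key that groupby compares the following rows against. So we need to make a separate entry to work with: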

entry = {new_field:set() for new_field in conversions.values()}
entry.update(base_info)

Note that I am using sets here because they are the natural container when all contents are unique. However, because sets are not JSON-compatible, we will need to change them into lists at the end.
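As a small illustration of that point, the sketch below uses a hypothetical entry to show that json.dumps rejects a set and accepts the same values once they are converted to a list. Using sorted() rather than list() is my own choice here; it makes the resulting order deterministic.

```python
import json

# Hypothetical half-built entry whose condensed field is still a set.
entry = {'pk': 1, 'name': 'Joe', 'permissions': {'user', 'admin'}}

try:
    json.dumps(entry)
    set_is_serializable = True
except TypeError:
    # The json module has no encoding for set objects.
    set_is_serializable = False

# Convert the set to a sorted list so it can be serialized predictably.
entry['permissions'] = sorted(entry['permissions'])
encoded = json.dumps(entry)

print(set_is_serializable)  # False
print(encoded)  # {"pk": 1, "name": "Joe", "permissions": ["admin", "user"]}
```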

Now that we have an entry to build, we can just iterate through the group and add each original field's value to the corresponding new field:

for partial in group:
    for original, new in conversions.items():
        entry[new].add(partial[original])

Then, once the final entry is constructed, all that is left is to convert the sets back into lists:

for new in conversions.values():
    entry[new] = list(entry[new])

That entry is now done. We could append it to a list called new_results, but since this process is essentially generating results, it makes more sense to put it into a generator, making the final code look something like this:

import itertools

results = [
    {'pk': 1, 'name': 'Joe', 'permission': 'user', 'address': 'home'},
    {'pk': 1, 'name': 'Joe', 'permission': 'user', 'address': 'work'},
    {'pk': 1, 'name': 'Joe', 'permission': 'admin', 'address': 'home'},
    {'pk': 1, 'name': 'Joe', 'permission': 'admin', 'address': 'work'},
    {'pk': 2, 'name': 'John', 'permission': 'user', 'address': 'home'},
]

relationships = [
    {'label': 'permissions', 'back_populates': 'permission'},
    {'label': 'addresses', 'back_populates': 'address'},
]
#first we put the "relationships" in a format that is much easier to use.
conversions = {d["back_populates"]:d['label'] for d in relationships}

def grouper(entry):
    #each group is identified by all key:values that are not identified in conversions
    return {k:v for k,v in entry.items() if k not in conversions}

def parse_results(old_results, conversions=conversions):
    for base_info, group in itertools.groupby(old_results, grouper):
        entry = {new_field:set() for new_field in conversions.values()}
        entry.update(base_info)
        for partial in group: #for each entry in the original results set
            for original, new in conversions.items(): #for each field that will be condensed
                entry[new].add(partial[original])

        # convert sets back to lists so the entry can be serialized as JSON
        for new in conversions.values():
            entry[new] = list(entry[new])

        yield entry

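new_results can then be obtained like this (the order of items within each list may differ from run to run, since the lists are built from sets):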

>>> new_results = list(parse_results(results))
>>> from pprint import pprint #for demo purpose
>>> pprint(new_results,width=50)
[{'addresses': ['home', 'work'],
  'name': 'Joe',
  'permissions': ['admin', 'user'],
  'pk': 1},
 {'addresses': ['home'],
  'name': 'John',
  'permissions': ['user'],
  'pk': 2}]
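If the order of values matters, or the rows cannot be assumed to arrive grouped, a single pass with a dictionary keyed on the primary key is one possible alternative. This is a sketch rather than a drop-in replacement: the function name collapse and its key parameter are hypothetical, and it assumes every row contains the key field and every back_populates field.

```python
results = [
    {'pk': 1, 'name': 'Joe', 'permission': 'user', 'address': 'home'},
    {'pk': 1, 'name': 'Joe', 'permission': 'user', 'address': 'work'},
    {'pk': 1, 'name': 'Joe', 'permission': 'admin', 'address': 'home'},
    {'pk': 1, 'name': 'Joe', 'permission': 'admin', 'address': 'work'},
    {'pk': 2, 'name': 'John', 'permission': 'user', 'address': 'home'},
]

relationships = [
    {'label': 'permissions', 'back_populates': 'permission'},
    {'label': 'addresses', 'back_populates': 'address'},
]

def collapse(rows, relationships, key='pk'):
    """Merge rows sharing the same key, collecting related values into lists."""
    conversions = {rel['back_populates']: rel['label'] for rel in relationships}
    merged = {}  # key value -> entry being built; dicts keep insertion order
    for row in rows:
        entry = merged.get(row[key])
        if entry is None:
            # First row for this key: copy the fields that are not condensed.
            entry = {k: v for k, v in row.items() if k not in conversions}
            for label in conversions.values():
                entry[label] = []
            merged[row[key]] = entry
        for original, label in conversions.items():
            # Keep first-seen order and skip duplicates.
            if row[original] not in entry[label]:
                entry[label].append(row[original])
    return list(merged.values())

new_results = collapse(results, relationships)
```

The list membership test is linear in the length of each list, which is fine for a handful of values per user. For large lists, pairing each list with a set of already-seen values would avoid the repeated scans.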