
I have three large lists and I want to create a nested dictionary like so:

dic = {"gene": {"isoform1": positions1, "isoform2": positions2}, "gene2": {"isoform1": positions1, "isoform2": positions2}, ...}

I was able to get the isoforms and positions into one dictionary like so:

Dictionary = dict(zip(Isoforms, ExonPos))

However, I don't know how to add the gene name as the outer key, with the dictionary of Isoforms and ExonPos as its value.

Also, is there a way to use a list as the values to a key?

Kind of like this:

Dictionary = {key:[1,2,3,4,5], key2:[3,5,4]}

Here are my sample lists:

Genes = ['A2M', 'ACADM', 'ACADS', 'ACADVL', 'ACAT1', 'ACVRL1', 'PSEN1', 'ADA', 'SGCA', 'ADRB2', 'ADSL', 'AGA', 'AGT', 'AGXT', 'ALAD', 'ALAS2', 'ABCD1', 'ALDOA', 'ALDOB']

Isoforms = ['NM_000014', 'NM_000016', 'NM_000017', 'NM_000018', 'NM_000019', 'NM_000020', 'NM_000021', 'NM_000022', 'NM_000023', 'NM_000024', 'NM_000026', 'NM_000027', 'NM_000029', 'NM_000030', 'NM_000031', 'NM_000032', 'NM_000033', 'NM_000034']

ExonPos = ['9220303,9220778,9221335,9222340,9223083,9224954,9225248,9227155,9229351,9229941,9230296,9231839,9232234,9232689,9241795,9242497,9242951,9243796,9246060,9247568,9248134,9251202,9251976,9253739,9254042,9256834,9258831,9259086,9260119,9261916,9262462,9262909,9264754,9264972,9265955,9268359,', '76190031,76194085,76198328,76198537,76199212,76200475,76205664,76211490,76215103,76216135,76226806,76228376,', '121163570,121164828,121174788,121175158,121175639,121176082,121176335,121176622,121176942,121177098,', '7123149,7123440,7123782,7123922,7124084,7124242,7124856,7125270,7125495,7125985,7126451,7126962,7127131,7127286,7127464,7127639,7127798,7127960,7128127,7128275,', '107992257,108002633,108004546,108004947,108005868,108009624,108010791,108012331,108013163,108014709,108016928,108017996,', '52301201,52306253,52306882,52307342,52307757,52308222,52309008,52309819,52312768,52314542,', '73603142,73614502,73614674,73637504,73640273,73653560,73659351,73664738,73673093,73678476,73683833,73685841,', '43248162,43248939,43249658,43251228,43251469,43251647,43252842,43254209,43255096,43257687,43264867,43280215,', '48243365,48244728,48244942,48245307,48245734,48246452,48247503,48248000,48252617,48253072,', '148206155,', '40742503,40745835,40749076,40750251,40754867,40755263,40756405,40757276,40757491,40758984,40760279,40760883,40762439,']
  • Put your sample list here Commented Apr 16, 2015 at 14:26
  • Added my sample lists Commented Apr 16, 2015 at 14:30
  • The lists are of different lengths, I'm not sure how isoforms match to genes, genes match to exonpos and isoforms to exonpos. Commented Apr 16, 2015 at 14:35
  • How are you determining which Gene pairs with which Isoform or ExonPos? Commented Apr 16, 2015 at 14:37
  • ILostMySpoon, your comment made me realize I did something incorrectly. Thank you Commented Apr 16, 2015 at 14:43

4 Answers

You can use a dict comprehension:

>>> dic={gene:{iso:exon.split(',')} for gene, iso, exon in zip(Genes, Isoforms, ExonPos)}
>>> dic
{'ACADVL': {'NM_000018': ['7123149', '7123440', '7123782', '7123922', '7124084', '7124242', '7124856', '7125270', '7125495', '7125985', '7126451', '7126962', '7127131', '7127286', '7127464', '7127639', '7127798', '7127960', '7128127', '7128275', '']}, 'PSEN1': {'NM_000021': ['73603142', '73614502', '73614674', '73637504', '73640273', '73653560', '73659351', '73664738', '73673093', '73678476', '73683833', '73685841', '']}, 'SGCA': {'NM_000023': ['48243365', '48244728', '48244942', '48245307', '48245734', '48246452', '48247503', '48248000', '48252617', '48253072', '']}, 'ACADM': {'NM_000016': ['76190031', '76194085', '76198328', '76198537', '76199212', '76200475', '76205664', '76211490', '76215103', '76216135', '76226806', '76228376', '']}, 'ACAT1': {'NM_000019': ['107992257', '108002633', '108004546', '108004947', '108005868', '108009624', '108010791', '108012331', '108013163', '108014709', '108016928', '108017996', '']}, 'ADRB2': {'NM_000024': ['148206155', '']}, 'ACADS': {'NM_000017': ['121163570', '121164828', '121174788', '121175158', '121175639', '121176082', '121176335', '121176622', '121176942', '121177098', '']}, 'ACVRL1': {'NM_000020': ['52301201', '52306253', '52306882', '52307342', '52307757', '52308222', '52309008', '52309819', '52312768', '52314542', '']}, 'ADA': {'NM_000022': ['43248162', '43248939', '43249658', '43251228', '43251469', '43251647', '43252842', '43254209', '43255096', '43257687', '43264867', '43280215', '']}, 'ADSL': {'NM_000026': ['40742503', '40745835', '40749076', '40750251', '40754867', '40755263', '40756405', '40757276', '40757491', '40758984', '40760279', '40760883', '40762439', '']}, 'A2M': {'NM_000014': ['9220303', '9220778', '9221335', '9222340', '9223083', '9224954', '9225248', '9227155', '9229351', '9229941', '9230296', '9231839', '9232234', '9232689', '9241795', '9242497', '9242951', '9243796', '9246060', '9247568', '9248134', '9251202', '9251976', '9253739', '9254042', '9256834', '9258831', '9259086', '9260119', '9261916', 
'9262462', '9262909', '9264754', '9264972', '9265955', '9268359', '']}}

Or, if you want to keep each value as a string rather than a list:

>>> dic={gene:{iso:exon} for gene, iso, exon in zip(Genes, Isoforms, ExonPos)}
>>> dic
{'ACADVL': {'NM_000018': '7123149,7123440,7123782,7123922,7124084,7124242,7124856,7125270,7125495,7125985,7126451,7126962,7127131,7127286,7127464,7127639,7127798,7127960,7128127,7128275,'}, 'PSEN1': {'NM_000021': '73603142,73614502,73614674,73637504,73640273,73653560,73659351,73664738,73673093,73678476,73683833,73685841,'}, 'SGCA': {'NM_000023': '48243365,48244728,48244942,48245307,48245734,48246452,48247503,48248000,48252617,48253072,'}, 'ACADM': {'NM_000016': '76190031,76194085,76198328,76198537,76199212,76200475,76205664,76211490,76215103,76216135,76226806,76228376,'}, 'ACAT1': {'NM_000019': '107992257,108002633,108004546,108004947,108005868,108009624,108010791,108012331,108013163,108014709,108016928,108017996,'}, 'ADRB2': {'NM_000024': '148206155,'}, 'ACADS': {'NM_000017': '121163570,121164828,121174788,121175158,121175639,121176082,121176335,121176622,121176942,121177098,'}, 'ACVRL1': {'NM_000020': '52301201,52306253,52306882,52307342,52307757,52308222,52309008,52309819,52312768,52314542,'}, 'ADA': {'NM_000022': '43248162,43248939,43249658,43251228,43251469,43251647,43252842,43254209,43255096,43257687,43264867,43280215,'}, 'ADSL': {'NM_000026': '40742503,40745835,40749076,40750251,40754867,40755263,40756405,40757276,40757491,40758984,40760279,40760883,40762439,'}, 'A2M': {'NM_000014': '9220303,9220778,9221335,9222340,9223083,9224954,9225248,9227155,9229351,9229941,9230296,9231839,9232234,9232689,9241795,9242497,9242951,9243796,9246060,9247568,9248134,9251202,9251976,9253739,9254042,9256834,9258831,9259086,9260119,9261916,9262462,9262909,9264754,9264972,9265955,9268359,'}}
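
One thing to watch in the list version above: every ExonPos entry ends with a comma, so exon.split(',') leaves an empty string as the last element. Here is a minimal sketch of one way to drop it and get integers, assuming the positions are always plain digits. The two-gene sample is shortened from the question's lists, and the name parse_positions is only for illustration:

```python
# Shortened sample data for illustration; position lists are truncated.
Genes = ['SGCA', 'ADRB2']
Isoforms = ['NM_000023', 'NM_000024']
ExonPos = ['48243365,48244728,', '148206155,']

def parse_positions(exon):
    # Hypothetical helper: skip the empty piece left by the trailing comma
    # and convert the remaining pieces to integers.
    return [int(p) for p in exon.split(',') if p]

dic = {gene: {iso: parse_positions(exon)}
       for gene, iso, exon in zip(Genes, Isoforms, ExonPos)}
print(dic)
```

If you would rather keep the positions as strings, replacing int(p) with p should be enough.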

2 Comments

I just realized that this won't work if there are Isoforms that have the same gene name. Is there any way to overcome this?
If the isoforms are the same but with more than one list entry, what is the result you are looking for?
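
For the case raised in the comment above, where one gene name appears several times with different isoforms, the dict comprehension keeps only the last isoform because each gene key is overwritten. A rough sketch of one way around that is to use dict.setdefault so repeated genes accumulate their isoforms in the same inner dictionary. The sample rows here are made up for illustration, not real annotations:

```python
# Made-up sample: 'GENE1' appears twice with two different isoforms.
Genes = ['GENE1', 'GENE1', 'GENE2']
Isoforms = ['iso_a', 'iso_b', 'iso_c']
ExonPos = ['100,200,', '150,250,', '300,']

dic = {}
for gene, iso, exon in zip(Genes, Isoforms, ExonPos):
    # setdefault returns the existing inner dict for this gene,
    # or inserts an empty one first if the gene has not been seen yet.
    dic.setdefault(gene, {})[iso] = exon.split(',')

print(dic)
```

collections.defaultdict(dict) would work the same way if you prefer it over setdefault.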

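Just zip twice: once to pair each isoform with its positions, and once to pair each gene with the resulting inner dictionary.

Dictionary = dict(zip(Genes, [{iso: exon} for iso, exon in zip(Isoforms, ExonPos)]))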


print(Dictionary)
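
Keep in mind that zip stops at the shortest input, so if the three lists have different lengths (as the comments on the question point out for the sample data), the extra entries are dropped without any error. Here is a small sketch of a sanity check to run before building the dictionary. The tiny lists are placeholders cut down from the question:

```python
Genes = ['A2M', 'ACADM', 'ACADS']
Isoforms = ['NM_000014', 'NM_000016', 'NM_000017']
ExonPos = ['9220303,', '76190031,']  # one entry short on purpose

# If all three lists are aligned, this set contains exactly one length.
lengths = {len(Genes), len(Isoforms), len(ExonPos)}
if len(lengths) != 1:
    print("Lists differ in length; zip will silently drop the extras")

Dictionary = dict(zip(Genes, [{iso: exon} for iso, exon in zip(Isoforms, ExonPos)]))
print(len(Dictionary))  # only 2 genes survive, 'ACADS' is dropped
```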


To answer your second question: yes, a dictionary can have lists as values. For example:

>>> dic = {}
>>> dic = {"key1":[1,2,3]}
>>> dic.update({"key2":[4,5,6]})
>>> dic['key3'] = [7,8,9]
>>> dic
{'key3': [7, 8, 9], 'key2': [4, 5, 6], 'key1': [1, 2, 3]}

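To answer your first question: since you already have the other two lists zipped into a dictionary the way you want (called zippeddictionary here, i.e. dict(zip(Isoforms, ExonPos))), a crude approach is to walk the genes and isoforms together and wrap each isoform's entry under its gene:

newdictionary = {}
for gene, iso in zip(Genes, Isoforms):  # pairs genes and isoforms by position
    newdictionary.update({gene: {iso: zippeddictionary.get(iso)}})  # None if the isoform has no positions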

result = {}
for gene, iso, exon in zip(Genes, Isoforms, ExonPos):
    result[gene] = {iso: exon.split(',')}

If you don't need to convert the comma-separated values from a string to a list, use exon instead of exon.split(',').
