How to convert string representation of list to a list

Question

I was wondering what the simplest way is to convert a string representation of a list like the following to a list:

x = '[ "A","B","C" , " D"]'

Even in cases where the user puts spaces in between the commas, and spaces inside of the quotes, I need to handle that as well and convert it to:

x = ["A", "B", "C", "D"]

I know I can strip spaces with strip() and split() and check for non-letter characters. But the code was getting very kludgy. Is there a quick function that I'm not aware of?

score 1239 · Accepted Answer · 2023-04-26 03:14:04Z

1239

>>> import ast
>>> x = '[ "A","B","C" , " D"]'
>>> x = ast.literal_eval(x)
>>> x
['A', 'B', 'C', ' D']
>>> x = [n.strip() for n in x]
>>> x
['A', 'B', 'C', 'D']

ast.literal_eval:

Evaluate an expression node or a string containing only a Python literal or container display. The string or node provided may only consist of the following Python literal structures: strings, bytes, numbers, tuples, lists, dicts, sets, booleans, None and Ellipsis.

This can be used for evaluating strings containing Python values without the need to parse the values oneself. It is not capable of evaluating arbitrarily complex expressions, for example involving operators or indexing.
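A minimal sketch of how the two steps above might be wrapped into one helper. The name `parse_string_list` and the choice to reject anything that is not a list are assumptions made for illustration; they are not part of the answer.

```python
import ast


def parse_string_list(text):
    """Parse a string such as '[ "A","B" , " C"]' into a list of stripped strings."""
    try:
        value = ast.literal_eval(text)
    except (ValueError, SyntaxError) as exc:
        # literal_eval raises these for input that is not a plain Python literal.
        raise ValueError(f"not a valid list literal: {text!r}") from exc
    if not isinstance(value, list):
        raise ValueError(f"expected a list, got {type(value).__name__}")
    # Strip surrounding whitespace from string items; leave other items untouched.
    return [item.strip() if isinstance(item, str) else item for item in value]


print(parse_string_list('[ "A","B","C" , " D"]'))  # ['A', 'B', 'C', 'D']
```

Raising a single exception type keeps the calling code simple; whether that suits your application is a design choice.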

edited Apr 26, 2023 at 3:14

answered Dec 12, 2009 at 18:30

Roger Pate

6 Comments

Paul Kenjora Over a year ago

Per comment below, this is dangerous as it simply runs whatever python is in the string. So if someone puts a call to delete everything in there, it happily will.

user2357112 Over a year ago

@PaulKenjora: You're thinking of eval, not ast.literal_eval.

abarnert Over a year ago

ast.literal_eval is safer than eval, but it's not actually safe. As recent versions of the docs explain: "Warning It is possible to crash the Python interpreter with a sufficiently large/complex string due to stack depth limitations in Python’s AST compiler." It may, in fact, be possible to run arbitrary code via a careful stack-smashing attack, although as far as I know nobody's built a public proof of concept for that.

sqp_125 Over a year ago

Well but what to do if the List does not have quotes? e.g. [4 of B, 1 of G]

ForceBru Over a year ago

@sqp_125, then it's a regular list, and you don't need to parse anything?

Ryan · Accepted Answer · 2020-10-30 09:02:00Z

275

The json module is a better solution whenever there is a stringified list of dictionaries. The json.loads(your_data) function can be used to convert it to a list.

>>> import json
>>> x = '[ "A","B","C" , " D"]'
>>> json.loads(x)
['A', 'B', 'C', ' D']

Similarly

>>> x = '[ "A","B","C" , {"D":"E"}]'
>>> json.loads(x)
['A', 'B', 'C', {'D': 'E'}]

edited Oct 30, 2020 at 9:02

user3064538

answered Feb 17, 2016 at 15:39

Ryan

5 Comments

Paul Kenjora Over a year ago

This works for ints but not for strings in my case because each string is single quoted not double quoted, sigh.

Skippy le Grand Gourou Over a year ago

As per @PaulKenjora's comment, it works for '["a","b"]' but not for "['a','b']".

Eugene Chabanov Over a year ago

In my case I had to replace single quotes with double quotes in the initial string to make it work: .replace('\'', '"'). But I was sure that the data inside that string didn't contain any significant single or double quotes that would affect the final result.

Muhammad Yasirroni Over a year ago

If the user should only enter a list of numbers, I think this is the safest way to stop a user with malicious intent.

Karl Knechtel Over a year ago

The ast.literal_eval approach is more general. For example, JSON cannot handle b prefixes for strings, as it does not recognize a separate bytes type. JSON also requires double quotes for the strings.
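To illustrate the quoting issue raised in these comments, here is a small sketch that tries strict JSON first and falls back to Python literal syntax. The helper name `loads_list` is hypothetical, and the fallback order is just one possible choice.

```python
import ast
import json


def loads_list(text):
    """Try strict JSON first, then fall back to Python literal syntax."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Single-quoted strings, tuples, b'' prefixes, etc. are not JSON.
        return ast.literal_eval(text)


print(loads_list('["a", "b"]'))  # ['a', 'b']
print(loads_list("['a', 'b']"))  # ['a', 'b']
```

The fallback can still raise ValueError or SyntaxError when the input is neither valid JSON nor a Python literal.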

Mark Byers · Accepted Answer · 2009-12-12 20:21:43Z

122

The eval is dangerous - you shouldn't execute user input.

If you have 2.6 or newer, use ast instead of eval:

>>> import ast
>>> ast.literal_eval('["A","B" ,"C" ," D"]')
['A', 'B', 'C', ' D']

Once you have that, strip the strings.

If you're on an older version of Python, you can get very close to what you want with a simple regular expression:

>>> import re
>>> x='[  "A",  " B", "C","D "]'
>>> re.findall(r'"\s*([^"]*?)\s*"', x)
['A', 'B', 'C', 'D']

This isn't as good as the ast solution, for example it doesn't correctly handle escaped quotes in strings. But it's simple, doesn't involve a dangerous eval, and might be good enough for your purpose if you're on an older Python without ast.
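The escaped-quote limitation can be seen with a short comparison. This is only a sketch: the sample input is made up for illustration, and the pattern is the one from this answer.

```python
import ast
import re

# A list whose first item contains an escaped double quote: ["a\"b", "c"]
x = r'["a\"b", "c"]'

simple = re.findall(r'"\s*([^"]*?)\s*"', x)
parsed = ast.literal_eval(x)

print(simple)  # not the two original items: the escaped quote ends the match early
print(parsed)  # ['a"b', 'c']
```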

edited Dec 12, 2009 at 20:21

answered Dec 12, 2009 at 18:29

Mark Byers

2 Comments

Aaryan Dewan Over a year ago

Could you please tell me why you said “The eval is dangerous - you shouldn’t execute user input.”? I am using 3.6

Abhishek Vijayan Over a year ago

@AaryanDewan if you use eval directly, it will evaluate any valid python expression, which is potentially dangerous. literal_eval solves this problem by only evaluating Python literal structures: strings, numbers, tuples, lists, dicts, booleans, and None.
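A harmless way to see the difference described here: the sketch below uses a call to `len` as a stand-in for hostile input. The variable names are illustrative only.

```python
import ast

expression = "len('abc')"  # harmless stand-in for hostile input

# eval runs the call and returns its result.
print(eval(expression))  # 3

# literal_eval only accepts literals, so a function call is rejected.
try:
    ast.literal_eval(expression)
except ValueError as exc:
    rejected = True
    print("rejected:", exc)
else:
    rejected = False
```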

Peter Mortensen · Accepted Answer · 2022-09-23 23:26:45Z

34

Inspired by some of the answers above that work with base Python packages, I compared the performance of a few (using Python 3.7.3):

Method 1: ast

import ast

list(map(str.strip, ast.literal_eval(u'[ "A","B","C" , " D"]')))
# ['A', 'B', 'C', 'D']

import timeit
timeit.timeit(stmt="list(map(str.strip, ast.literal_eval(u'[ \"A\",\"B\",\"C\" , \" D\"]')))", setup='import ast', number=100000)
# 1.292875313000195

Method 2: json

import json
list(map(str.strip, json.loads(u'[ "A","B","C" , " D"]')))
# ['A', 'B', 'C', 'D']

import timeit
timeit.timeit(stmt="list(map(str.strip, json.loads(u'[ \"A\",\"B\",\"C\" , \" D\"]')))", setup='import json', number=100000)
# 0.27833264000014424

Method 3: no import

list(map(str.strip, u'[ "A","B","C" , " D"]'.strip('][').replace('"', '').split(',')))
# ['A', 'B', 'C', 'D']

import timeit
timeit.timeit(stmt="list(map(str.strip, u'[ \"A\",\"B\",\"C\" , \" D\"]'.strip('][').replace('\"', '').split(',')))", number=100000)
# 0.12935059100027502

I was disappointed to see what I considered the method with the worst readability was the method with the best performance... there are trade-offs to consider when going with the most readable option... for the type of workloads I use Python for I usually value readability over a slightly more performant option, but as usual it depends.

edited Sep 23, 2022 at 23:26

Peter Mortensen

answered May 1, 2019 at 3:54

kinzleb

2 Comments

Is_this_my_username Over a year ago

is there any particular reason for there being a u in front of '[ "A","B","C" , " D"]'

Karl Knechtel Over a year ago

The manual method is simply not as powerful, and does less work, so it's not surprising that it's faster. It will not handle escape sequences in the strings, or a different quote type. (The JSON method demands double-quotes, but does process escape sequences.) It also will only process a flat list of strings; the other approaches can handle complex nested data structures.

Alexei Sholik · Accepted Answer · 2009-12-12 18:24:11Z

33

There is a quick solution:

x = eval('[ "A","B","C" , " D"]')

Unwanted whitespaces in the list elements may be removed in this way:

x = [x.strip() for x in eval('[ "A","B","C" , " D"]')]

answered Dec 12, 2009 at 18:24

Alexei Sholik

3 Comments

tosh Over a year ago

this would still preserve the spaces inside the quotes

Nicholas Knight Over a year ago

This is an open invitation to arbitrary code execution, NEVER do this or anything like it unless you know with absolute certainty that the input will always be 100% trusted.

Manish Ranjan Over a year ago

I could use this suggestion because I knew my data was always gonna be in that format and was a data processing work.

tosh · Accepted Answer · 2009-12-12 18:29:02Z

21

import ast
l = ast.literal_eval('[ "A","B","C" , " D"]')
l = [i.strip() for i in l]

answered Dec 12, 2009 at 18:29

tosh

ruohola · Accepted Answer · 2020-03-19 15:58:33Z

16

If it's only a one dimensional list, this can be done without importing anything:

>>> x = u'[ "A","B","C" , " D"]'
>>> ls = x.strip('[]').replace('"', '').replace(' ', '').split(',')
>>> ls
['A', 'B', 'C', 'D']

edited Mar 19, 2020 at 15:58

answered Aug 28, 2018 at 13:02

ruohola

2 Comments

Hassan Kamal Over a year ago

Cautionary note: this could potentially be dangerous if any of the strings inside list has a comma in between.

Ricardo Decal Over a year ago

This will not work if your string list is a list of lists
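The comma caveat from the comments can be reproduced in a few lines. This is a sketch with a made-up input; `json.loads` is used for contrast because this particular string happens to be valid JSON.

```python
import json

x = '["A, B", "C"]'

# The replace/split approach treats every comma as a separator.
naive = x.strip('[]').replace('"', '').replace(' ', '').split(',')
print(naive)  # ['A', 'B', 'C']

# A real parser keeps the quoted item intact.
parsed = json.loads(x)
print(parsed)  # ['A, B', 'C']
```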

Peter Mortensen · Accepted Answer · 2022-09-23 23:31:30Z

12

There isn't any need to import anything or to evaluate. You can do this in one line for most basic use cases, including the one given in the original question.

One liner

l_x = [i.strip() for i in x[1:-1].replace('"',"").split(',')]

Explanation

x = '[ "A","B","C" , " D"]'
# String indexing to eliminate the brackets.
# Replace, as split will otherwise retain the quotes in the returned list
# Split to convert to a list
l_x = x[1:-1].replace('"',"").split(',')

Outputs:

for i in range(0, len(l_x)):
    print(l_x[i])
# vvvv output vvvvv
'''
 A
B
C
  D
'''
print(type(l_x)) # out: class 'list'
print(len(l_x)) # out: 4

You can parse and clean up this list as needed using list comprehension.

l_x = [i.strip() for i in l_x] # list comprehension to clean up
for i in range(0, len(l_x)):
    print(l_x[i])
# vvvvv output vvvvv
'''
A
B
C
D
'''

Nested lists

If you have nested lists, it does get a bit more annoying. Without using regex (which would simplify the replace), and assuming you want to return a flattened list (and the zen of python says flat is better than nested):

x = '[ "A","B","C" , " D", ["E","F","G"]]'
l_x = x[1:-1].split(',')
l_x = [i
    .replace(']', '')
    .replace('[', '')
    .replace('"', '')
    .strip() for i in l_x
]
# returns ['A', 'B', 'C', 'D', 'E', 'F', 'G']

If you need to retain the nested list it gets a bit uglier, but it can still be done just with regular expressions and list comprehension:

import re

x = '[ "A","B","C" , " D", "["E","F","G"]","Z", "Y", "["H","I","J"]", "K", "L"]'
# Clean it up so the regular expression is simpler
x = x.replace('"', '').replace(' ', '')
# Look ahead for the bracketed text that signifies nested list
l_x = re.split(r',(?=\[[A-Za-z0-9\',]+\])|(?<=\]),', x[1:-1])
print(l_x)
# Flatten and split the non nested list items
l_x0 = [item for items in l_x for item in items.split(',') if '[' not in items]
# Convert the nested lists to lists
l_x1 = [
    i[1:-1].split(',') for i in l_x if '[' in i
]
# Add the two lists
l_x = l_x0 + l_x1

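This last solution works for flat lists and for lists nested one level deep, provided the items are alphanumeric and contain no commas. Note that the nested lists end up after the flat items, so the original order is not preserved.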
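For comparison, when the string is valid JSON (as the first nested example in this answer is), a parser keeps the nesting and the order without regular expressions. This is a sketch, not the answer's method; the helper name `strip_nested` is hypothetical.

```python
import json


def strip_nested(value):
    """Recursively strip whitespace from strings inside nested lists."""
    if isinstance(value, list):
        return [strip_nested(item) for item in value]
    if isinstance(value, str):
        return value.strip()
    return value


x = '[ "A","B","C" , " D", ["E","F","G"]]'
print(strip_nested(json.loads(x)))  # ['A', 'B', 'C', 'D', ['E', 'F', 'G']]
```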

edited Sep 23, 2022 at 23:31

Peter Mortensen

answered Jul 7, 2021 at 21:56

born_naked

2 Comments

Ari Anisfeld Over a year ago

Notice the method doesn't play well with empty lists. You take '[]' and get back ['']. This might be an issue if you're parsing a column in a data frame. Nice solution otherwise!
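One possible way to handle the empty-list case is to check the inner text before splitting. This is a sketch only: the function name `split_flat_list` is made up, and it assumes a flat list of double-quoted strings without commas inside the items.

```python
def split_flat_list(text):
    """Split a flat string list without imports, returning [] for '[]'."""
    inner = text.strip()[1:-1].strip()
    if not inner:
        # ''.split(',') would give [''], so handle the empty list explicitly.
        return []
    return [item.strip().strip('"').strip() for item in inner.split(',')]


print(split_flat_list('[]'))                     # []
print(split_flat_list('[ "A","B","C" , " D"]'))  # ['A', 'B', 'C', 'D']
```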

Banane Over a year ago

The list comprehension seems to be slower than the x.strip('[]').replace('"', '').split(',') solution. Probably because the strip operation is repeated len(x) times instead of once and two lists are created instead of one (the one returned by split() and the one returned by the comprehension).

David Beauchemin · Accepted Answer · 2023-03-01 18:28:13Z

10

You can do this:

x = '[ "A","B","C" , " D"]'
print(eval(x))

This is not a safe way, though; the best answer is the accepted one. I wasn't aware of the danger of eval when I posted this answer.

edited Mar 1, 2023 at 18:28

David Beauchemin

answered Jan 28, 2022 at 9:51

Tomato Master

1 Comment

born_naked Over a year ago

eval is not recommended in several places on this thread as it will simply run whatever is entered as code, presenting a security risk. It is also a duplicate answer.

Peter Mortensen · Accepted Answer · 2022-09-23 23:18:30Z

Assuming that all your inputs are lists and that the double quotes in the input actually don't matter, this can be done with a simple regexp replace. It is a bit perl-y, but it works like a charm. Note also that the output is now a list of Unicode strings, you didn't specify that you needed that, but it seems to make sense given Unicode input.

import re
x = u'[ "A","B","C" , " D"]'
junkers = re.compile(r'[\[\]" ]')
result = junkers.sub('', x).split(',')
print(result)
# Python 2 prints [u'A', u'B', u'C', u'D']; Python 3 prints ['A', 'B', 'C', 'D']

The junkers variable contains a compiled regexp (for speed) of all characters we don't want, using ] as a character required some backslash trickery. The re.sub replaces all these characters with nothing, and we split the resulting string at the commas.

Note that this also removes spaces from inside entries u'["oh no"]' ---> [u'ohno']. If this is not what you wanted, the regexp needs to be souped up a bit.
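One way the regexp might be souped up is to capture the text between each pair of double quotes instead of deleting characters. The sketch below borrows the pattern from another answer in this thread and assumes the entries contain no escaped quotes.

```python
import re

x = u'[ "oh no" , " D"]'

# Capture the text between each pair of double quotes, trimming only the
# whitespace next to the quotes; spaces inside an entry are kept.
result = re.findall(r'"\s*([^"]*?)\s*"', x)
print(result)  # ['oh no', 'D']
```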

PaulMcG · Accepted Answer · 2022-05-28 14:28:20Z

5

If you know that your lists only contain quoted strings, this pyparsing example will give you your list of stripped strings (even preserving the original Unicode-ness).

>>> from pyparsing import *
>>> x =u'[ "A","B","C" , " D"]'
>>> LBR,RBR = map(Suppress,"[]")
>>> qs = quotedString.setParseAction(removeQuotes, lambda t: t[0].strip())
>>> qsList = LBR + delimitedList(qs) + RBR
>>> print(qsList.parseString(x).asList())
['A', 'B', 'C', 'D']

If your lists can have more datatypes, or even contain lists within lists, then you will need a more complete grammar - like this one in the pyparsing examples directory, which will handle tuples, lists, ints, floats, and quoted strings.

edited May 28, 2022 at 14:28

answered Dec 12, 2009 at 21:38

PaulMcG

dobydx · Accepted Answer · 2020-05-27 18:44:33Z

3

You may run into this problem while dealing with scraped data stored in a Pandas DataFrame.

This solution works like a charm if the list of values is present as text.

def textToList(hashtags):
    return hashtags.strip('[]').replace('\'', '').replace(' ', '').split(',')

hashtags = "[ 'A','B','C' , ' D']"
hashtags = textToList(hashtags)

Output: ['A', 'B', 'C', 'D']

No external library required.
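One thing to be aware of is that removing every space also removes spaces inside the items. The sketch below, with made-up sample data, contrasts the approach above with `ast.literal_eval` from the standard library, which accepts single-quoted strings.

```python
import ast

hashtags = "[ 'New York','LA' , ' D']"

# Removing every space also removes the ones inside an item.
naive = hashtags.strip('[]').replace('\'', '').replace(' ', '').split(',')
print(naive)   # ['NewYork', 'LA', 'D']

# literal_eval accepts single quotes; strip() trims only the item edges.
parsed = [tag.strip() for tag in ast.literal_eval(hashtags)]
print(parsed)  # ['New York', 'LA', 'D']
```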

answered May 27, 2020 at 18:44

dobydx

Hrvoje · Accepted Answer · 2021-04-02 14:58:20Z

2

This usually happens when you load a list that was stored as a string in a CSV file.

If you have your list stored in a CSV file in the form the OP asked about:

x = '[ "A","B","C" , " D"]'

Here is how you can load it back to list:

import csv
with open('YourCSVFile.csv') as csv_file:
    reader = csv.reader(csv_file, delimiter=',')
    rows = list(reader)

listItems = rows[0]

listItems is now a list.

edited Apr 2, 2021 at 14:58

answered Apr 1, 2021 at 16:30

Hrvoje

7 Comments

Tomerikoo Over a year ago

Not sure how this is related to the question... list(reader) gives a list of lists. Each inner list is a list of strings of the csv columns. There is no string representation of a list there to begin with...

Hrvoje Over a year ago

@Tomerikoo string representation of list is exactly the same only it's in the file.

Tomerikoo Over a year ago

No. A string representation of a list is "['1', '2', '3']". When you read a csv file with csv.reader, each line is ['1', '2', '3']. That is a list of strings. Not a string representation of a list...

Hrvoje Over a year ago

@Tomerikoo how about you store the list in a file and then use any method here to restore it.

Tomerikoo Over a year ago

Ok, let's say the csv has literally [1, 2, 3] inside it. Let's say a csv row is [1,2,3] 4 5. Reading it with list(reader) will give [["[1,2,3]", "4", "5"], ...] then doing rows[0] will give ["[1,2,3]", "4", "5"]. Again, I don't see how that answers the question...
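To make the distinction in this exchange concrete, here is a sketch in which a CSV cell holds a stringified list. The in-memory file, the column names, and the use of `json.loads` for the second step are assumptions for illustration.

```python
import csv
import io
import json

# Stand-in for a CSV file whose second column holds a stringified list.
# In CSV, a quote inside a quoted field is written as two quotes.
csv_text = 'id,tags\n1,"[""A"", ""B"", "" C""]"\n'

rows = list(csv.reader(io.StringIO(csv_text)))
cell = rows[1][1]  # still a string: '["A", "B", " C"]'
tags = [t.strip() for t in json.loads(cell)]
print(tags)  # ['A', 'B', 'C']
```

csv.reader only splits the row into cells; the cell itself still needs one of the parsing methods from the other answers.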

Peter Mortensen · Accepted Answer · 2022-09-23 23:19:55Z

2

To further complete Ryan's answer using JSON, one very convenient function to convert Unicode is in this answer.

Example with double or single quotes:

>print(byteify(json.loads(u'[ "A","B","C" , " D"]')))
>print(byteify(json.loads(u"[ 'A','B','C' , ' D']".replace('\'','"'))))
['A', 'B', 'C', ' D']
['A', 'B', 'C', ' D']

edited Sep 23, 2022 at 23:19

Peter Mortensen

answered Apr 27, 2018 at 13:56

CptHwK

1 Comment

Karl Knechtel Over a year ago

The only new information here is a further processing step that is unrelated to the question that was asked, and also somewhere between irrelevant and harmful in most cases. The data generally should be understood as strings (unicode objects in 2.x), not byte sequences.

DINA TAKLIT · Accepted Answer · 2023-03-19 16:07:52Z

2

json.loads() and json.dumps() from the json package are the Python equivalents of JavaScript's JSON.parse() and JSON.stringify(), so use the json solution to keep life simple.

import json
a = '[ "A","B","C" , " D"]'
print(json.loads(a)) #['A', 'B', 'C', ' D']
b = ['A', 'B', 'C', ' D']
print(json.dumps(b)) # ["A", "B", "C", " D"]

answered Mar 19, 2023 at 16:07

DINA TAKLIT

Jordy Van Landeghem · Accepted Answer · 2018-06-01 09:32:00Z

0

I would like to provide a more intuitive patterning solution with regex. The below function takes as input a stringified list containing arbitrary strings.

Stepwise explanation: You remove all whitespace, brackets and value separators (provided they are not part of the values you want to extract; otherwise make the regex more complex). Then you split the cleaned string on single or double quotes and take the non-empty values (or odd-indexed values, whichever you prefer).

import re

def parse_strlist(sl):
    # Drop brackets, commas and whitespace, then split on either quote character
    clean = re.sub(r"[\[\],\s]", "", sl)
    splitted = re.split(r"['\"]", clean)
    values_only = [s for s in splitted if s != '']
    return values_only

testsample: "['21',"foo" '6', '0', " A"]"

answered Jun 1, 2018 at 9:32

Jordy Van Landeghem

Peter Mortensen · Accepted Answer · 2022-09-23 23:23:17Z

So, following all the answers I decided to time the most common methods:

from time import time
import ast
import re
import json

my_str = str(list(range(19)))
print(my_str)

reps = 100000

start = time()
for i in range(0, reps):
    re.findall(r"\w+", my_str)
print("Regex method:\t", (time() - start) / reps)

start = time()
for i in range(0, reps):
    json.loads(my_str)
print("JSON method:\t", (time() - start) / reps)

start = time()
for i in range(0, reps):
    ast.literal_eval(my_str)
print("AST method:\t\t", (time() - start) / reps)

start = time()
for i in range(0, reps):
    # Note: this iterates over the characters of my_str, not over list items
    [n.strip() for n in my_str]
print("strip method:\t", (time() - start) / reps)

    regex method:     6.391477584838867e-07
    json method:     2.535374164581299e-06
    ast method:         2.4425282478332518e-05
    strip method:     4.983267784118653e-06

So in the end regex wins!
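When comparing timings like these, it may help to confirm first that every method returns the same result, since `re.findall` yields strings rather than integers. The sketch below is one way to do that with `timeit`; the `candidates` dictionary and the repetition count are assumptions, and no timings are claimed here.

```python
import ast
import json
import re
import timeit

my_str = str(list(range(19)))
expected = list(range(19))

candidates = {
    # findall returns strings, so convert them to match the other methods.
    "regex": lambda: [int(tok) for tok in re.findall(r"\d+", my_str)],
    "json": lambda: json.loads(my_str),
    "ast": lambda: ast.literal_eval(my_str),
}

for name, func in candidates.items():
    # Only time methods that reproduce the original list.
    assert func() == expected, name
    seconds = timeit.timeit(func, number=10_000)
    print(f"{name}: {seconds / 10_000:.2e} s per call")
```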

Peter Mortensen · Accepted Answer · 2022-09-23 23:24:22Z

0

You can save yourself the .strip() function by just slicing off the first and last characters from the string representation of the list (see the third line below):

>>> mylist=[1,2,3,4,5,'baloney','alfalfa']
>>> strlist=str(mylist)
>>> mylistfromstring=(strlist[1:-1].split(', '))
>>> mylistfromstring
['1', '2', '3', '4', '5', "'baloney'", "'alfalfa'"]
>>> mylistfromstring[3]
'4'
>>> for entry in mylistfromstring:
...     print(entry)
...     type(entry)
...
1
<class 'str'>
2
<class 'str'>
3
<class 'str'>
4
<class 'str'>
5
<class 'str'>
'baloney'
<class 'str'>
'alfalfa'
<class 'str'>

edited Sep 23, 2022 at 23:24

Peter Mortensen

answered Jan 8, 2019 at 23:24

JCMontalbano

Peter Mortensen · Accepted Answer · 2022-09-23 23:27:50Z

0

And with pure Python - not importing any libraries:

[s for s in x.split('[')[1].split(']')[0].split('"')[1:-1] if s not in [',', ' , ', ', ']]

edited Sep 23, 2022 at 23:27

Peter Mortensen

answered Jul 24, 2019 at 15:23

Ioannis Nasios

Shahin Shirazi · Accepted Answer · 2023-04-04 19:31:40Z

0

This is another solution if you don't want to import any library:

x = '[ "A","B","C" , " D"]'
def toList(stringList):
  stringList = stringList.split('[')[1]# removes "["
  stringList = stringList.split(']')[0]# removes "]"
  stringList = stringList.split(',')#gets objects in the list
  return [text.strip()[1:-1] for text in stringList] #eliminate additional " or ' in the string.
toList(x)

Output:

['A', 'B', 'C', ' D']

The caveat to this method is that it doesn't work if you have comma inside your string for example if your input is

x = '[ "A","B,F","C" , " D"]'

your output will be

['A', '', '', 'C', ' D']

which is not what you want.

answered Apr 4, 2023 at 19:31

Shahin Shirazi

tklodd · Accepted Answer · 2025-01-24 18:28:50Z

Here is an implementation of a function to convert a string to a list that isn't very short, but it is very simple and straightforward and does exactly what you would expect it to and what you would do if you were doing this manually:

def string_to_list(value):
    assert(isinstance(value, (str, list)))
    if isinstance(value, list):
        return value
    assert(value.startswith("[") and value.endswith("]"))
    value = value.strip().removeprefix("[").removesuffix("]").split(",")
    for i, item in enumerate(value):
        item = item.strip()
        if item.startswith("'") and item.endswith("'"):
            item = item.removeprefix("'").removesuffix("'")
        elif item.startswith('"') and item.endswith('"'):
            item = item.removeprefix('"').removesuffix('"')
        value[i] = item
    return value

Peter Mortensen · Accepted Answer · 2022-09-23 23:33:10Z

-1

This solution is simpler than some I read in the previous answers, but it requires you to account for every delimiter that can appear in the list.

x = '[ "A","B","C" , " D"]'
[i.strip() for i in x.split('"') if len(i.strip().strip(',').strip(']').strip('['))>0]

Output:

['A', 'B', 'C', 'D']

edited Sep 23, 2022 at 23:33

Peter Mortensen

answered Oct 28, 2021 at 13:35

CassAndr
