5

I am trying to convert a pandas DataFrame column that holds a list in each row into a string in each row, but somehow it is not converting. Here is what I have tried, based on other answers.

Code:

import pandas as pd
import numpy as np
data = pd.DataFrame.from_dict(data_dict)  # assign the data given below to a variable named data_dict before running this line

first approach:

data['tags'] = data['Tags'].apply(lambda x: ' '.join(map(str, x)))

second approach:

data['tags']=[''.join(map(str,item)) for item in data['Tags']]

but both of these give me the same list-like strings in the tags column, as shown below:

0    ['python', 'windows', 'pip', 'pygame', 'pycharm']
1                         ['converters', 'dxf', 'dwg']
2                           ['python', 'regex', 'nlp']
3          ['sql', 'join', 'dynamic', 'logic', 'case']
4                   ['r-markdown', 'hugo', 'blogdown']
Name: tags, dtype: object

and I want it in this form:

0    'python', 'windows', 'pip', 'pygame', 'pycharm'
1                         'converters', 'dxf', 'dwg'
2                           'python', 'regex', 'nlp'
3          'sql', 'join', 'dynamic', 'logic', 'case'
4                   'r-markdown', 'hugo', 'blogdown'
Name: tags, dtype: object

Here is the data (first 5 rows), obtained with data.head(5).to_dict():

{'Tags': {0: "['python', 'windows', 'pip', 'pygame', 'pycharm']",
  1: "['converters', 'dxf', 'dwg']",
  2: "['python', 'regex', 'nlp']",
  3: "['sql', 'join', 'dynamic', 'logic', 'case']",
  4: "['r-markdown', 'hugo', 'blogdown']"},
 'cleaned_text': {0: 'matter pip version instal specific python version read round still stuck upgrade python 3 7 x python 3 8 1 windows 10 go cmd prompt check pip instal module',
  1: 'convert dwg dxf node php jave etc convert dwg file dxf node js python java already try ogr2ogr success thank advance',
  2: 'match text base string list extract subsection python try generate structure earning call text look like following sample operator lady gentleman thank stand welcome xyz fourth quarter',
  3: 'sql dynamically join table various column first time posting use case want join sale datum master agreement table determine applicable fee transactional level hard part agreement',
  4: 'ok update hugo run 2 year old version hugo academic theme blogdown late version hugo ubuntu 19 10 repos 0 58 new version 0 65 download hugo website'}}

3 Answers

5

I think what you're looking for is:

data['tags'] = data['Tags'].apply(lambda x: ' '.join(x))

Example

ser = pd.Series([['python', 'windows', 'pip', 'pygame', 'pycharm'],
                 ['converters', 'dxf', 'dwg'],
                 ['python', 'regex', 'nlp'],
                 ['sql', 'join', 'dynamic', 'logic', 'case'],
                 ['r-markdown', 'hugo', 'blogdown']])

ser.apply(lambda x: ' '.join(x))

will produce

0    python windows pip pygame pycharm
1                   converters dxf dwg
2                     python regex nlp
3          sql join dynamic logic case
4             r-markdown hugo blogdown
dtype: object

If you want it exactly like you show then you can do the following

ser.apply(lambda x: "'" + "', '".join(x) + "'")

which will produce

0    'python', 'windows', 'pip', 'pygame', 'pycharm'
1                         'converters', 'dxf', 'dwg'
2                           'python', 'regex', 'nlp'
3          'sql', 'join', 'dynamic', 'logic', 'case'
4                   'r-markdown', 'hugo', 'blogdown'
dtype: object
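
Note that the example above assumes each cell is a real list. A minimal sketch of what happens otherwise, assuming the cell is a string that only looks like a list (as the dict dump in the question suggests): str.join iterates over whatever it receives, so a string is processed character by character. The variable names here are illustrative only.

```python
real_list = ['python', 'regex', 'nlp']
list_like = "['python', 'regex', 'nlp']"  # a str, as in the question's dict dump

# Joining a real list combines its items.
joined_list = ' '.join(real_list)

# Joining a string iterates over its characters, so every character
# (brackets, quotes, commas) gets separated by a space.
joined_str = ' '.join(map(str, list_like))

# Joining with an empty separator simply rebuilds the original string,
# which is why the second approach in the question appears to do nothing.
unchanged = ''.join(map(str, list_like))

print(joined_list)
print(joined_str)
print(unchanged == list_like)
```

If the printed result still contains brackets or spaced-out characters, the column most likely holds strings rather than lists.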

1

You can try:

data['Tags'] = data['Tags'].str[1:-1]

Output of data['Tags']:

0    'python', 'windows', 'pip', 'pygame', 'pycharm'
1                         'converters', 'dxf', 'dwg'
2                           'python', 'regex', 'nlp'
3          'sql', 'join', 'dynamic', 'logic', 'case'
4                   'r-markdown', 'hugo', 'blogdown'
Name: Tags, dtype: object
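
One way to confirm that the column holds strings rather than lists is to inspect the element types before slicing. This is a rough sketch using two rows copied from the question's data; the small DataFrame built here is only a stand-in for the real one.

```python
import pandas as pd

# Stand-in for the question's DataFrame: each cell is a list-like string.
data = pd.DataFrame({'Tags': ["['python', 'regex', 'nlp']",
                              "['converters', 'dxf', 'dwg']"]})

# Which Python types occur in the column? Real lists would appear as list.
cell_types = data['Tags'].map(type).unique()
print(cell_types)

# .str[1:-1] slices each string, dropping the leading '[' and trailing ']'.
stripped = data['Tags'].str[1:-1]
print(stripped.tolist())
```

Slicing only removes the first and last characters, so it relies on every value starting with '[' and ending with ']'.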

8 Comments

it gives like this... 0 [ ' p y t h o n ' , ' w i n d o w s ' , ' p i p ' , ' p y g a m e ' , ' p y c h a r m ' ] 1 [ ' c o n v e r t e r s ' , ' d x f ' , ' d w g ' ] 2 [ ' p y t h o n ' , ' r e g e x ' , ' n l p ' ] 3 [ ' s q l ' , ' j o i n ' , ' d y n a m i c ' , ' l o g i c ' , ' c a s e ' ] 4 [ ' r - m a r k d o w n ' , ' h u g o ' , ' b l o g d o w n ' ] Name: tags, dtype: object
Looks like the values are not lists, but list-like strings. Try data['Tags'].str[1:-1].
It gives this...and this is surprising to me. 0 'python', 'windows', 'pip', 'pygame', 'pycharm' 1 'converters', 'dxf', 'dwg' 2 'python', 'regex', 'nlp' 3 'sql', 'join', 'dynamic', 'logic', 'case' 4 'r-markdown', 'hugo', 'blogdown' Name: Tags, dtype: object
Can you please explain what is going on?
As shown in your dict data, "['python', 'windows', 'pip', 'pygame', 'pycharm']" is a string, not a list. All I did was strip the first and last characters, which are the square brackets.
0

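Well, my issue turned out to be that each value is a string like "['a','b','d']" rather than a list, which is not apparent when the DataFrame is displayed. Thanks to @Quang Hoang, who pointed this out from the data I gave in the question. It can also be solved with ast.literal_eval():

import ast
# Parse each list-like string into a real list, then join the quoted items
data['tags'] = [', '.join(repr(tag) for tag in ast.literal_eval(item)) for item in data['Tags']]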

Output:

0    'python', 'windows', 'pip', 'pygame', 'pycharm'
1                         'converters', 'dxf', 'dwg'
2                           'python', 'regex', 'nlp'
3          'sql', 'join', 'dynamic', 'logic', 'case'
4                   'r-markdown', 'hugo', 'blogdown'
Name: tags, dtype: object

Here, ast.literal_eval() evaluates an expression node or a string containing a Python literal or container display. To learn more about it, and when and how to use it, refer to this question.
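
As a rough sketch of how parsing and joining could be combined, the helper below accepts either a real list or a list-like string. The name tags_to_string and its sep parameter are hypothetical and not part of pandas or ast.

```python
import ast

def tags_to_string(cell, sep=', '):
    """Hypothetical helper: return the tags as quoted items joined by sep."""
    # Parse list-like strings into real lists; leave real lists untouched.
    tags = ast.literal_eval(cell) if isinstance(cell, str) else cell
    # repr() puts the quotes back around each tag, e.g. 'python'.
    return sep.join(repr(str(tag)) for tag in tags)

from_string = tags_to_string("['python', 'regex', 'nlp']")
from_list = tags_to_string(['sql', 'join'])

print(from_string)
print(from_list)
```

It could then be applied per row with data['Tags'].apply(tags_to_string). Note that ast.literal_eval() raises ValueError or SyntaxError for strings that are not valid Python literals, so malformed rows would need separate handling.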
