
I am sorry if this question has already been answered, but I could not find an existing answer. I want to split long strings into multiple strings and convert them. I have a dataframe df:

       no         strings
1.  A_12_234   gef|re1234|gef|re0943
2.  O_257363   tef|fe4545|tef|fe3333|tef|9995

I want to build an individual string for each pair and store the result in a new column.

Output I am getting:

       no         strings                          new_col
1.  A_12_234   gef|re1234|gef|re0943                <thekeys db="gef" value="re1234"/>\n<thekeys db="gef" value="re0943"/>

2.  O_257363   tef|fe4545|tef|fe3333|tef|9995       <thekeys db="tef" value="fe4545"/>\n<thekeys db="tef" value="fe3333"/>

Desired output:

         no         strings                          new_col
1.  A_12_234   gef|re1234|gef|re0943                <thekeys db="gef" value="re1234"/>\n<thekeys db="gef" value="re0943"/>

2.  O_257363   tef|fe4545|tef|fe3333|tef|9995       <thekeys db="tef" value="fe4545"/>\n<thekeys db="tef" value="fe3333"/>\n<thekeys db="tef" value="9995"/>

I don't know where I am making a mistake; the code is skipping some pairs.

Here's my code:

def createxm(x):
    try:
        parsedlist = x['strings'].split('|')
        print(parsedlist)
        cnt = len(parsedlist)/2
        print(cnt)
        xm_list = []
        for i in range(0, int(cnt), 2):
            xm_list.append('<thekeys db="{}" value="{}"/>'.format(parsedlist[i], parsedlist[i+1]))
            xm_string = '\n'.join(xm_list)
        return xm_string
    except:
        return None

Thank you

  • The output from your code and the desired output are the same. Can you please paste the correct desired output? Commented Jun 21, 2021 at 0:46
  • @sharathnatraj Hi, they are not the same: in my output one pair is missing. Commented Jun 21, 2021 at 0:49
  • Got it. Sorry, I missed that! Commented Jun 21, 2021 at 1:16

2 Answers

1

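You were almost there. The problem is the line cnt = len(parsedlist)/2. The loop already steps through the list two items at a time, so halving the length as well means range() stops halfway through the list and the later pairs are never reached. Use the full length instead.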

Corrected code:

def createxm(x):
    try:
        parsedlist = x['strings'].split('|')
        print(parsedlist)
        # Use the full length: the step of 2 in range() already moves pair by pair
        cnt = len(parsedlist)
        print(cnt)
        xm_list = []
        for i in range(0, cnt, 2):
            xm_list.append('<thekeys db="{}" value="{}"/>'.format(parsedlist[i], parsedlist[i+1]))
        # Join once, after all pairs have been collected
        return '\n'.join(xm_list)
    except Exception:
        # Malformed input (e.g. a non-string value or an odd number of fields) yields None
        return None
df['new_col'] = df.apply(createxm, axis=1)

Prints:

df.new_col.iloc[1]
'<thekeys db="tef" value="fe4545"/>\n<thekeys db="tef" value="fe3333"/>\n<thekeys db="tef" value="9995"/>'
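
To see why the original version skipped the last pair, here is a minimal sketch that only lists the indices the loop visits. The helper name visited_indices and its halve flag are made up for illustration and are not part of the original code; halve=True mimics cnt = len(parsedlist)/2.

```python
def visited_indices(tokens, halve):
    # halve=True reproduces the original `cnt = len(parsedlist)/2`
    cnt = len(tokens) / 2 if halve else len(tokens)
    # Same loop bounds as in createxm: start at 0, step by 2
    return list(range(0, int(cnt), 2))


tokens = 'tef|fe4545|tef|fe3333|tef|9995'.split('|')

print(visited_indices(tokens, halve=True))   # [0, 2]    -> the pair starting at index 4 is never reached
print(visited_indices(tokens, halve=False))  # [0, 2, 4] -> all three pairs
```

Each index is the position of a db field, and the value is read from the next position.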

0

Split the values on | and then use the first four values to build the required string with str.format(). Note that this handles only the first two pairs of each row.

fString = '<thekeys db="{}" value="{}"/>\n<thekeys db="{}" value="{}"/>'
df['strings'].str.split('|').apply(lambda x: fString.format(x[0], x[1], x[2],  x[3]))

OUTPUT:

1.0    <thekeys db="gef" value="re1234"/>\n<thekeys d...
2.0    <thekeys db="tef" value="fe4545"/>\n<thekeys d...
Name: strings, dtype: object
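
If a row can hold more than two pairs, as the second row in the question does, the same idea can be generalized. The following is a sketch only, assuming every string alternates db and value fields separated by |; the function name pairs_to_xml is hypothetical.

```python
def pairs_to_xml(s):
    tokens = s.split('|')
    # Pair even-indexed tokens (db) with odd-indexed tokens (value).
    # zip stops at the shorter slice, so a trailing unpaired token is dropped.
    pairs = zip(tokens[0::2], tokens[1::2])
    return '\n'.join(
        '<thekeys db="{}" value="{}"/>'.format(db, value) for db, value in pairs
    )


print(pairs_to_xml('tef|fe4545|tef|fe3333|tef|9995'))
```

With the dataframe from the question it could be applied as df['new_col'] = df['strings'].apply(pairs_to_xml).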
