I have a dataframe with several columns of user information, including the columns "Contact 1" and "Contact 2":
import pandas as pd
d = {'Contact 1': ['1234567891 1234567891', '12345678 12345678', '12345678 1234567891', '1234567891 12345678',
                   '1234567 1234567891', '1234567891', '123456789 12345678911', None],
     'Contact 2': [None, None, None, None, None, '12345678', None, None]}
df = pd.DataFrame(data=d)
| Contact 1 | Contact 2 |
|---|---|
| 1234567891 1234567891 | None |
| 12345678 12345678 | None |
| 12345678 1234567891 | None |
| 1234567891 12345678 | None |
| 1234567 1234567891 | None |
| 1234567891 | 12345678 |
| 123456789 12345678911 | None |
| None | None |
I want to split the "Contact 1" column on the space between the numbers, but only when the value consists of 8 or 10 digits, followed by a space, followed by 8 or 10 digits. The split should also preserve the little information I already have in the "Contact 2" column.
I tried the following code:
df[['Contact 1', 'Contact 2']]=df['Contact 1'].str.split(r'(?<=^\d{8}|\d{10})\s(?=\d{8}|\d{10}$)', n=1, expand=True)
but I get the error "re.error: look-behind requires fixed-width pattern".
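
For context, Python's `re` module only accepts a look-behind whose alternatives all match the same number of characters, and `^\d{8}|\d{10}` mixes an 8-wide and a 10-wide alternative. The sketch below reproduces the error with plain `re` and then tries one possible rewrite that uses a separate look-behind per width. The names `variable_width`, `fixed_width`, and `splitter` are illustrative, and anchoring both sides to the whole string is an assumption about the intended rule.

```python
import re

# The pattern from the question: the look-behind has an 8-wide and a
# 10-wide alternative, which Python's re module rejects at compile time.
variable_width = r'(?<=^\d{8}|\d{10})\s(?=\d{8}|\d{10}$)'
try:
    re.compile(variable_width)
    error_message = None
except re.error as exc:
    error_message = str(exc)
print(error_message)

# Possible rewrite (an assumption about the intent): one fixed-width
# look-behind per length, and a look-ahead anchored to the end of the string.
fixed_width = r'(?:(?<=^\d{8})|(?<=^\d{10}))\s(?=(?:\d{8}|\d{10})$)'
splitter = re.compile(fixed_width)

split_ok = splitter.split('12345678 1234567891', maxsplit=1)
split_skipped = splitter.split('1234567 1234567891', maxsplit=1)
print(split_ok)       # two numbers
print(split_skipped)  # left as one string, since the first number has 7 digits
```

Passing a pattern like this to `str.split(..., expand=True)` and assigning the result to both columns would still overwrite any existing "Contact 2" values, so the rows that do not split would need separate handling.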
I would like to get the following result:
| Contact 1 | Contact 2 |
|---|---|
| 1234567891 | 1234567891 |
| 12345678 | 12345678 |
| 12345678 | 1234567891 |
| 1234567891 | 12345678 |
| 1234567 1234567891 | None |
| 1234567891 | 12345678 |
| 123456789 12345678911 | None |
| None | None |
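
One possible way to reach this result without any look-behind is sketched here, under the assumption that a value should be split only when the whole string is 8 or 10 digits, one whitespace character, and 8 or 10 digits. It captures both numbers with `Series.str.extract` and writes them back only for rows that matched, so existing "Contact 2" values are left alone. The names `parts` and `matched` are illustrative and not part of the original code.

```python
import pandas as pd

d = {'Contact 1': ['1234567891 1234567891', '12345678 12345678',
                   '12345678 1234567891', '1234567891 12345678',
                   '1234567 1234567891', '1234567891',
                   '123456789 12345678911', None],
     'Contact 2': [None, None, None, None, None, '12345678', None, None]}
df = pd.DataFrame(data=d)

# Two capture groups: rows that do not match the full pattern yield NaN in both.
parts = df['Contact 1'].str.extract(r'^(\d{8}|\d{10})\s(\d{8}|\d{10})$')

# Boolean mask of the rows where the pattern matched.
matched = parts[0].notna()

# Overwrite only the matched rows; every other row keeps its original values.
df.loc[matched, 'Contact 1'] = parts.loc[matched, 0]
df.loc[matched, 'Contact 2'] = parts.loc[matched, 1]

print(df)
```

Rows that do not match, including the one that already has a "Contact 2" value, are not touched by the two assignments.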
For the row "12345678 1234567891", should the "Contact 2" column get a value after processing?