Below is my data in SQL Server:

(screenshot not included)

After reading the data into Python, it looks like this:

(screenshot not included)

I use the code below to split the value into multiple columns:

# 1. To split single array column to multiple column based on '\t'
df[['v1','v2','v3','v4','v5','v6','v7','v8','v9','v10','v11','v12','v13','v14','v15',\
       'v16','v17','v18','v19','v20','v21','v22','v23','v24','v25','v26','v27','v28',\
'v29','v30','v31','v32','v33','v34','v35','v36','v37','v38','v39','v40','v41']] = df['_VALUE'].str.split(pat="\t", expand=True)

# 2. To remove the '\r\n' from the last column
df['v41'] = df['v41'].replace(r'\s+|\\n', ' ', regex=True)

In some data sets, however, the array holds more values, e.g. 100 columns, and the code above becomes unwieldy because I have to write out every name from v1 to v100. Is there a simpler way to do this?

1 Answer

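You can replace the hardcoded list of column names with a list comprehension that generates the names for you:

df[[f'v{x}' for x in range(1, 101)]] = df['_VALUE'].str.split(pat="\t", expand=True)

Here range(1, 101) yields v1 through v100, matching the naming in the question. The number of generated names must equal the number of columns the split produces, otherwise pandas raises an error.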
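
If the number of fields is not known in advance, one possible variation is to split first and derive the names from the shape of the result. This is a minimal sketch, not the answer's own method: the sample data is hypothetical, and it assumes every value ends with a '\r\n' line break that can be stripped before splitting.

```python
import pandas as pd

# Hypothetical sample data standing in for the real `_VALUE` column.
df = pd.DataFrame({"_VALUE": ["1\t2\t3\r\n", "4\t5\t6\r\n"]})

# Strip the trailing line break first, then split once on tabs.
parts = df["_VALUE"].str.rstrip("\r\n").str.split("\t", expand=True)

# Name the columns v1..vN, where N is however many columns the split produced.
parts.columns = [f"v{i}" for i in range(1, parts.shape[1] + 1)]

# Attach the new columns to the original frame (joined on the index).
df = df.join(parts)
```

Because the names are built from parts.shape[1], the same lines should work for 41 or 100 fields without listing any names by hand.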

6 Comments

1) Yes, we can. First, find the maximum number of columns: max(x.count('\t') + 1 for x in df['_VALUE']). This expression can be used in place of the hardcoded 100. 2) This raises a different problem. Since the last column will differ from row to row, the earlier cleanup step has to change slightly. Create a column holding the number of fields in each row: df['columns_num'] = [x.count('\t') + 1 for x in df['_VALUE']]. Then apply the same replace(r'\s+|\\n', ' ', regex=True) cleanup to the last populated column of each row.
max(x.count('\t') + 1 for x in df['_VALUE']) - this code runs for a very long time. I think it scans every row of the _VALUE column in df.
Are there that many rows? You could try it with apply instead, but I don't remember the exact syntax. Try it on a sample of a couple of rows to see what [x.count('\t') + 1 for x in df['_VALUE']] actually produces.
Hm, I just checked it on my data, and it works well for me.
Yes, the data frame has many rows, because the data is recorded at millisecond intervals.
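
The per-row field count discussed in these comments can also be written with pandas string methods instead of a list comprehension. The following is only a sketch on hypothetical ragged data. It still visits every row, so it is not guaranteed to be faster on a large frame, and it assumes the line break appears only at the end of each value.

```python
import pandas as pd

# Hypothetical ragged sample: rows have different numbers of tab-separated fields.
df = pd.DataFrame({"_VALUE": ["1\t2\t3\r\n", "4\t5\r\n"]})

# Fields per row = number of tabs + 1, computed with pandas string methods.
df["columns_num"] = df["_VALUE"].str.count("\t") + 1
max_cols = int(df["columns_num"].max())

# Stripping the line break before splitting cleans the last field of every row,
# wherever it falls, so no per-row lookup of the last column is needed.
parts = df["_VALUE"].str.rstrip("\r\n").str.split("\t", expand=True)

# Shorter rows are padded with missing values up to max_cols columns.
parts.columns = [f"v{i}" for i in range(1, max_cols + 1)]
```

With this ordering, columns_num is only needed if the per-row count is useful for something else, since the split result already has max_cols columns.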