How to combine csv columns with Python

Question

What I have got is a CSV file with following structure:

column1 column2 column3 column4(day) column5(month&year) column6(time)

...

The columns of the file do not have names. Now I want to merge column4 and column5 and generate a new version of the CSV file so that I have got the complete date in one cell.

What I have tried is following Python code:


def correctDatetime():
    with open("inputfile.csv", "r") as source, open("outputfile.csv", "w") as result:
        df = pd.read_csv('C:/ProgrammingProjects/LicMonTest/inputfile.csv', header=0)

        for row in source:
            df['Datetime'] = df[df.columns[3:]].apply(lambda x: ' '.join(x.dropna().astype(str)), axis=1)
        result.write(df)

Unfortunately this only generates an empty CSV file. How could I best approach this issue? Any advice would be helpful. Thanks in advance!

ParalysisByAnalysis · Accepted Answer · 2019-09-23 14:16:03Z

0

import pandas as pd
#Setup DF
data = [
(1234,    1,     'x', 'x', 34, 'May 2019'),
(1234,    2,     'x', 'x', 34, 'June 2019'),
(1235,    1,     'y', 'y', 34, 'July 2019')]
df = pd.DataFrame(data)

The following code will combine your data based on column position and then drop unnecessary columns and save to some_csv.csv:

df['Datetime']=df.apply(lambda x:'%s %s' % (str(x[df.columns[4]]),str(x[df.columns[5]])),axis=1)
df.drop(columns=[df.columns[4],df.columns[5]], inplace=True)
df.to_csv('some_csv.csv')

answered Sep 23, 2019 at 14:16

ParalysisByAnalysis

7331 gold badge5 silver badges16 bronze badges

Sign up to request clarification or add additional context in comments.

Comments

jezrael · Accepted Answer · 2019-09-23 14:39:33Z

0

Add header=None because no header in csv file to read_csv, then extract column 3 and 4 by DataFrame.pop and join together, last write to file by DataFrame.to_csv:

def correctDatetime():
    df = pd.read_csv('C:/ProgrammingProjects/LicMonTest/inputfile.csv', header=None)
    df['Datetime'] = df.pop(3) + ' ' + df.pop(4) + ' ' + df.pop(5)
    df.to_csv("outputfile.csv", index=False, header=False)

If need convert to datetimes and format of month with years is MMYYYY and time column is HH:MM:SS add to_datetime:

def correctDatetime():
    df = pd.read_csv('C:/ProgrammingProjects/LicMonTest/inputfile.csv', header=None)
    s = df.pop(3) + '-' + df.pop(4) + ' ' + df.pop(4)
    df['Datetime'] = pd.to_datetime(s, format='%d-%m%Y %H:%M:%S')
    df.to_csv("outputfile.csv", index=False, header=False)

edited Sep 23, 2019 at 14:39

answered Sep 23, 2019 at 14:04

jezrael

868k103 gold badges1.4k silver badges1.3k bronze badges

3 Comments

kev8484 Over a year ago

Shouldn't header=None ? OP said the files don't contain column names

sim_rum Over a year ago

This is unfortunately not working. The provided code does not merge/combine the two columns, it just inserts "0,Datetime" into cell A1.

jezrael Over a year ago

@sim_rum - What is print (df.head()) ?

Collectives™ on Stack Overflow

How to combine csv columns with Python

2 Answers 2

Comments

3 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Comments

3 Comments

Your Answer

Sign up or log in

Post as a guest

Related