0

What I have got is a CSV file with following structure:

column1 column2 column3 column4(day) column5(month&year) column6(time)

column1 column2 column3 column4(day) column5(month&year) column6(time)

column1 column2 column3 column4(day) column5(month&year) column6(time)

...

The columns of the file do not have names. Now I want to merge column4 and column5 and generate a new version of the CSV file so that I have got the complete date in one cell.

What I have tried is following Python code:


def correctDatetime():
    with open("inputfile.csv", "r") as source, open("outputfile.csv", "w") as result:
        df = pd.read_csv('C:/ProgrammingProjects/LicMonTest/inputfile.csv', header=0)

        for row in source:
            df['Datetime'] = df[df.columns[3:]].apply(lambda x: ' '.join(x.dropna().astype(str)), axis=1)
        result.write(df)

Unfortunately this only generates an empty CSV file. How could I best approach this issue? Any advice would be helpful. Thanks in advance!

2 Answers 2

0
import pandas as pd
#Setup DF
data = [
(1234,    1,     'x', 'x', 34, 'May 2019'),
(1234,    2,     'x', 'x', 34, 'June 2019'),
(1235,    1,     'y', 'y', 34, 'July 2019')]
df = pd.DataFrame(data)

The following code will combine your data based on column position and then drop unnecessary columns and save to some_csv.csv:

df['Datetime']=df.apply(lambda x:'%s %s' % (str(x[df.columns[4]]),str(x[df.columns[5]])),axis=1)
df.drop(columns=[df.columns[4],df.columns[5]], inplace=True)
df.to_csv('some_csv.csv')
Sign up to request clarification or add additional context in comments.

Comments

0

Add header=None because no header in csv file to read_csv, then extract column 3 and 4 by DataFrame.pop and join together, last write to file by DataFrame.to_csv:

def correctDatetime():
    df = pd.read_csv('C:/ProgrammingProjects/LicMonTest/inputfile.csv', header=None)
    df['Datetime'] = df.pop(3) + ' ' + df.pop(4) + ' ' + df.pop(5)
    df.to_csv("outputfile.csv", index=False, header=False)

If need convert to datetimes and format of month with years is MMYYYY and time column is HH:MM:SS add to_datetime:

def correctDatetime():
    df = pd.read_csv('C:/ProgrammingProjects/LicMonTest/inputfile.csv', header=None)
    s = df.pop(3) + '-' + df.pop(4) + ' ' + df.pop(4)
    df['Datetime'] = pd.to_datetime(s, format='%d-%m%Y %H:%M:%S')
    df.to_csv("outputfile.csv", index=False, header=False)

3 Comments

Shouldn't header=None ? OP said the files don't contain column names
This is unfortunately not working. The provided code does not merge/combine the two columns, it just inserts "0,Datetime" into cell A1.
@sim_rum - What is print (df.head()) ?

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.