
I'm new to ASP.NET and C#. I need to write a program that lets the user browse for an Excel file, reads it, parses it into a specified format, and finally inserts the result into a SQL Server database.

  1. I used OLE DB to read the Excel file and created a DataTable from it. Now I'm having trouble parsing it into the required format. Here is a link to a picture showing the Excel input and the expected format to insert into the database: Input and expected output format

  2. Right now I'm working with simple data, but in the future I need to handle large Excel files (around 3,000 columns) that will be parsed into roughly 250,000 records. Please also advise me on performance. I'm currently using OLE DB; is that fine, or should I use something else?

Here is my sample C# code-behind file:

    // Code-behind for an ASP.NET Web Forms page. Needs: using System; using System.Configuration;
    // using System.Data; using System.Data.OleDb; using System.Data.SqlClient; using System.IO; using System.Text;
    OleDbConnection Econ;
    SqlConnection con;

    protected void Page_Load(object sender, EventArgs e)
    {
    }

    // Excel connection (the ACE OLE DB provider must be installed on the server;
    // HDR=YES treats the first row of the sheet as column names)
    private void ExcelConn(string FilePath)
    {
        string constr = string.Format(@"Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};Extended Properties=""Excel 12.0 Xml;HDR=YES;""", FilePath);
        Econ = new OleDbConnection(constr);
    }

    // SQL Server connection
    private void connection()
    {
        string sqlconn = ConfigurationManager.ConnectionStrings["SqlCom"].ConnectionString;
        con = new SqlConnection(sqlconn);
    }

    // Read data from Excel and create a DataTable
    private void ExcelToDataTable(string FilePath)
    {
        ExcelConn(FilePath); // use the path that was passed in, not a hard-coded one

        DataTable dtExcel = new DataTable();
        using (Econ)
        using (OleDbCommand Ecom = new OleDbCommand("SELECT * FROM [Sheet1$]", Econ))
        using (OleDbDataAdapter oda = new OleDbDataAdapter(Ecom))
        {
            // Fill opens and closes the connection itself
            oda.Fill(dtExcel);
        }

        // DataTable parseTable = ParseDataTable(dtExcel);

        //connection();

        // Printing the data table
        foreach (DataRow dataRow in dtExcel.Rows)
        {
            foreach (var item in dataRow.ItemArray)
            {
                Response.Write(item);
            }
        }

        Response.Write("<br> Columns: " + dtExcel.Columns.Count + "<br>");
        Response.Write("Rows: " + dtExcel.Rows.Count + "<br>");

        // Print on screen: build the text once instead of re-assigning Label1.Text for every cell
        var sb = new StringBuilder();
        foreach (DataRow row in dtExcel.Rows)
        {
            foreach (DataColumn col in dtExcel.Columns)
            {
                sb.Append(row[col]).Append('\t');
            }
        }
        Label1.Text += sb.ToString();
    }

    // Method to build a DataTable in the specified format
    public DataTable ParseDataTable(DataTable dtExcel)
    {
        var dt = new DataTable("sourceData");
        dt.Columns.Add(new DataColumn("id", typeof(string)));
        dt.Columns.Add(new DataColumn("name", typeof(string)));
        dt.Columns.Add(new DataColumn("variable", typeof(string)));
        dt.Columns.Add(new DataColumn("year", typeof(string)));
        dt.Columns.Add(new DataColumn("value", typeof(string)));

        // NOT ABLE TO PARSE INTO THE SPECIFIED FORMAT
        /**** NEED HELP HERE *****/

        return dt;
    }

    protected void Button1_Click(object sender, EventArgs e)
    {
        if (!FileUpload1.HasFile)
        {
            return;
        }

        // PostedFile.FileName is the file name on the client machine, so save the upload
        // on the server first and read that copy
        string savedPath = Path.Combine(Path.GetTempPath(),
            Guid.NewGuid().ToString("N") + Path.GetExtension(FileUpload1.FileName));
        FileUpload1.SaveAs(savedPath);
        ExcelToDataTable(savedPath);
    }

Please help me achieve this: how can I parse the input Excel data into the format shown in the linked screenshot? Any suggestions to fix my problem are welcome.

3 Answers


I solved this problem using the C# OLE DB ACE engine. Currently it only supports around 250 columns, which satisfies my requirement so far.

The solution: I get the sheet name and sheet range for the input file in code. I copy the input file into an OLE DB DataTable, inputtable, and from it I build a second, formatted DataTable that holds the values from inputtable based on conditional logic. I used LINQ to query the DataTable to generate the formatted result.

On button click:

    // excelComm, GetRange, FillDataTableWithQuery and queryIn are my own helpers (not shown)
    string rangeStringwithSHeet = sheetName + excelComm.GetRange(sheetName, excelConn);

    dataQuery = string.Format("SELECT Institution" + queryIn + "  FROM [{0}] ", rangeStringwithSHeet);

    // connect to Excel with the query and get the initial DataTable from the Excel input
    dataExcelTable = excelComm.FillDataTableWithQuery(dataQuery, excelConn);
    formattedDataTableExcel(dataExcelTable);

I put the actual conversion logic in the formattedDataTableExcel() method, which I wrote for my web application according to my business logic. I'm not posting that logic here, but if anyone has a similar issue, let me know and I can help with the conversion process.
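Since the actual formattedDataTableExcel() logic isn't posted, here is a minimal sketch of what such a DataTable-to-DataTable unpivot with LINQ could look like. It is not the author's code. It assumes the sheet is read with HDR=NO (so both header rows arrive as ordinary rows), that row 0 holds the variable names, row 1 the year labels, columns 0 and 1 hold id and name, and that every remaining cell becomes one output row; this layout is inferred from the sample data in the F# answer, because the screenshot isn't available. The class name ExcelUnpivotSketch is hypothetical.

```csharp
using System;
using System.Data;
using System.Linq;

// Hypothetical sketch, assuming HDR=NO and this layout:
//   row 0 = variable names, row 1 = year labels, rows 2.. = data
//   column 0 = id, column 1 = name, columns 2.. = values
public static class ExcelUnpivotSketch
{
    public static DataTable ParseDataTable(DataTable dtExcel)
    {
        // Same output columns as ParseDataTable in the question
        var dt = new DataTable("sourceData");
        foreach (var columnName in new[] { "id", "name", "variable", "year", "value" })
        {
            dt.Columns.Add(columnName, typeof(string));
        }

        var rows = dtExcel.Rows.Cast<DataRow>().ToList();
        if (rows.Count < 2)
        {
            return dt; // no header rows, nothing to unpivot
        }

        DataRow variables = rows[0];
        DataRow years = rows[1];
        int valueColumns = Math.Max(0, dtExcel.Columns.Count - 2);

        // One output row per (data row, value column) pair.
        // Convert.ToString turns DBNull (empty cells) into "".
        var unpivoted =
            from source in rows.Skip(2)
            from c in Enumerable.Range(2, valueColumns)
            select new object[]
            {
                Convert.ToString(source[0]),
                Convert.ToString(source[1]),
                Convert.ToString(variables[c]),
                Convert.ToString(years[c]),
                Convert.ToString(source[c])
            };

        foreach (object[] values in unpivoted)
        {
            dt.Rows.Add(values);
        }
        return dt;
    }
}
```

With the sample sheet (2 data rows, 3 value columns) this produces 6 rows, ordered by data row and then by column. If the real layout differs, only the row/column indices in the query need to change.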


My recommendation would be to rethink your tool. This would be much easier with SQL Server Integration Services (SSIS) or another tool built specifically for this purpose.

From the SSIS Wiki article, "SSIS is a platform for data integration and workflow applications."

From the C# Wiki article "C# (pronounced as see sharp) is a multi-paradigm programming language".

1 Comment

I tried the Talend open-source ETL (data integration) tool for this. When my input Excel sheet has more than 1,000 columns, the tool doesn't detect the Excel input. I searched online for doing this kind of formatting with SSIS, but I didn't find anything close to my requirement. I'd appreciate it if you could suggest a URL or a sample. Thank you.

I have created a solution for unpivoting the data in F#, which can be found here. Since F# runs on the .NET CLR, you could call it from C#, or translate it to C# using equivalent LINQ operations.

    // Sample Input as a jagged array
    let sampleInput =
        [| [| "id"; "name"; "variable1"; "variable1"; "variable2" |]
           [| ""; ""; "Fall 2000"; "Fall 2001"; "Fall 2000" |]
           [| "1"; "abc"; "1400"; "1500"; "1200" |]
           [| "2"; "xyz"; "1200"; "1400"; "1100" |] |]

    let variables = sampleInput.[0].[2 ..]
    let falls = sampleInput.[1].[2 ..]
    let idNameValues = sampleInput.[2 ..] |> Array.map (fun value -> (value.[0], value.[1], value.[2 ..]))

    // Output as an array of tuples
    let output =
        idNameValues
        |> Array.collect (fun (id, name, values) ->
            Array.zip3 variables falls values // Zip up the variables, falls and values data arrays for each old id, name combination
            |> Array.mapi (fun i (variable, fall, value) -> (i, int id, name, variable, fall, value)) // Flatten out over the row id, old id index and name
        )
        |> Array.sortBy (fun (row, id, _, _, _, _) -> (row, id)) // Sort by row id and old id index
        |> Array.mapi (fun i (_, _, name, variable, fall, value) -> (i + 1, name, variable, fall, int value)) // Add new id index

    printfn "SampleInput =\n %A" sampleInput
    printfn "Output =\n %A" output

I have also had a go at translating the F# code to C#. You could probably write more idiomatic C# here, and performance probably suffers a bit from the heavy use of LINQ, but it seems to work!

You can see it working in .NET Fiddle here.

    using System;
    using System.Linq;

    public class Program
    {
        public static string[][] SampleInput()
        {
            return new string[][]{
                new string[] { "id", "name", "variable1", "variable1", "variable2" },
                new string[] { "", "", "Fall 2000", "Fall 2001", "Fall 2000" },
                new string[] { "1", "abc", "1400", "1500", "1200" },
                new string[] { "2", "xyz", "1200", "1400", "1100" }
            };
        }

        public static Tuple<int, string, string, string, int>[] Unpivot(string[][] flattenedInput)
        {
            // Row 0 holds the variable names, row 1 the terms; data starts at row 2, column 2
            var variables = flattenedInput[0].Skip(2).ToArray();
            var falls = flattenedInput[1].Skip(2).ToArray();
            var idNameValues = flattenedInput.Skip(2).Select(idNameValue => Tuple.Create(idNameValue[0], idNameValue[1], idNameValue.Skip(2))).ToArray();

            return
                idNameValues
                    .SelectMany(idNameValue => variables
                        .Zip(falls, (variable, fall) => Tuple.Create(variable, fall))
                        .Zip(idNameValue.Item3, (variableFall, val) => Tuple.Create(variableFall.Item1, variableFall.Item2, val))
                        // (column position, old id, name, variable, fall, value)
                        .Select((variableFallVal, i) => Tuple.Create(i + 1, Convert.ToInt32(idNameValue.Item1), idNameValue.Item2, variableFallVal.Item1, variableFallVal.Item2, variableFallVal.Item3))
                    )
                    // Sort by column position, then by old id
                    .OrderBy(rowId_ => Tuple.Create(rowId_.Item1, rowId_.Item2))
                    // Assign the new sequential id and convert the value to int
                    .Select((_NameVariableFallValue, i) => Tuple.Create(i + 1, _NameVariableFallValue.Item3, _NameVariableFallValue.Item4, _NameVariableFallValue.Item5, Convert.ToInt32(_NameVariableFallValue.Item6)))
                    .ToArray();
        }

        public static void Main()
        {
            var flattenedData = SampleInput();
            var normalisedData = Unpivot(flattenedData);

            Console.WriteLine("SampleInput =");
            foreach (var row in flattenedData)
            {
                Console.WriteLine(Tuple.Create(row[0], row[1], row[2], row[3], row[4]).ToString());
            }

            Console.WriteLine("\nOutput =");
            foreach (var row in normalisedData)
            {
                Console.WriteLine(row.ToString());
            }
        }
    }
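Because the answer notes that the LINQ version could be more idiomatic and faster, here is a loop-based sketch that should produce the same rows for rectangular input. It relies on the observation that sorting by (column position, old id) is the same as looping over the value columns on the outside and over the id-sorted data rows on the inside, so no sort over the full output is needed. The class name UnpivotLoopSketch and the value-tuple field names are hypothetical; like Convert.ToInt32 in the original, int.Parse will throw on blank or non-numeric cells.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical loop-based sketch; assumes a rectangular jagged array:
//   row 0 = variables, row 1 = terms, rows 2.. = id, name, values...
public static class UnpivotLoopSketch
{
    public static List<(int Id, string Name, string Variable, string Term, int Value)> Unpivot(string[][] input)
    {
        string[] variables = input[0];
        string[] terms = input[1];

        // OrderBy is a stable sort, so rows with equal ids keep their original order
        var dataRows = input.Skip(2).OrderBy(row => int.Parse(row[0])).ToList();

        var result = new List<(int Id, string Name, string Variable, string Term, int Value)>(
            Math.Max(0, variables.Length - 2) * dataRows.Count);

        // Outer loop over value columns, inner loop over rows: this reproduces the
        // (column position, old id) ordering, so new ids are assigned in order directly
        int newId = 1;
        for (int c = 2; c < variables.Length; c++)
        {
            foreach (string[] row in dataRows)
            {
                result.Add((newId++, row[1], variables[c], terms[c], int.Parse(row[c])));
            }
        }
        return result;
    }
}
```

Preallocating the list and avoiding the intermediate tuples and the final sort should matter more as the input grows toward the 250,000-record case, though this has not been benchmarked here.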

Edit: Below is an example of converting an Excel file, given by its file path, into a jagged string array. Here I used the NuGet package ExcelDataReader to read the data from Excel.

    using System;
    using System.IO;
    using System.Data;
    using System.Linq;
    using Excel;  // NuGet package ExcelDataReader (2.x); in 3.x the namespace is ExcelDataReader
                  // and AsDataSet() comes from the separate ExcelDataReader.DataSet package

    public class Program
    {
        public static string[][] ExcelSheetToJaggedArray(string fileName, string sheetName)
        {
            using (var stream = File.Open(fileName, FileMode.Open, FileAccess.Read))
            using (var excelReader = ExcelReaderFactory.CreateOpenXmlReader(stream)) // .xlsx files
            {
                // Each worksheet becomes a DataTable named after the sheet
                var data =
                    excelReader.AsDataSet().Tables
                        .Cast<DataTable>()
                        .FirstOrDefault(sheet => sheet.TableName == sheetName);

                if (data == null)
                {
                    throw new ArgumentException("Sheet not found: " + sheetName, nameof(sheetName));
                }

                // Convert every cell to a string (empty cells become "")
                return
                    data.Rows
                        .Cast<DataRow>()
                        .Select(row =>
                            row.ItemArray
                                .Select(cell => cell.ToString()).ToArray())
                        .ToArray();
            }
        }

        public static void Main()
        {
            // Sample use of the ExcelSheetToJaggedArray function
            var fileName = @"C:\SampleInput.xlsx";
            var jaggedArray = ExcelSheetToJaggedArray(fileName, "Sheet1");

            foreach (var row in jaggedArray)
            {
                Console.WriteLine(string.Join(",", row));
            }
        }
    }

2 Comments

Thank you, but I have to read the data from Excel, not from an array. Is that possible in F#? I'm new to C# and ASP.NET, and the F# code looks new to me. Still, I highly appreciate your quick reply.
@priya777 Yes, I was trying to focus on the step that converts the flattened data into the normalized data. I don't think it would be too hard for you to convert from Excel to an equivalent jagged array, for example. Rather than trying to do everything in one step/method, it can be useful to break the logic into two or three: Excel -> Excel DataTable -> flattened jagged array -> normalized array -> SQL Server.
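The two "glue" steps in that pipeline might look like the sketch below: turning the DataTable that the question's OleDbDataAdapter fills (read with HDR=NO, an assumption) into a jagged array for Unpivot, and turning the normalized Tuple<int, string, string, string, int> rows back into a DataTable whose columns mirror the question's ParseDataTable. The class and method names are hypothetical, and the int column types are an assumption about the destination table.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

// Hypothetical helpers for: Excel DataTable -> jagged array -> (Unpivot) -> DataTable for SQL Server
public static class PipelineGlueSketch
{
    // DataTable filled by OleDbDataAdapter -> string[][] (DBNull cells become "")
    public static string[][] DataTableToJaggedArray(DataTable table)
    {
        return table.Rows
            .Cast<DataRow>()
            .Select(row => row.ItemArray.Select(cell => Convert.ToString(cell)).ToArray())
            .ToArray();
    }

    // Normalized rows (new id, name, variable, year, value) -> DataTable with typed columns,
    // so the whole batch can be handed to SqlBulkCopy instead of inserting row by row
    public static DataTable ToBulkCopyTable(IEnumerable<Tuple<int, string, string, string, int>> rows)
    {
        var table = new DataTable("sourceData");
        table.Columns.Add("id", typeof(int));
        table.Columns.Add("name", typeof(string));
        table.Columns.Add("variable", typeof(string));
        table.Columns.Add("year", typeof(string));
        table.Columns.Add("value", typeof(int));

        foreach (var r in rows)
        {
            table.Rows.Add(r.Item1, r.Item2, r.Item3, r.Item4, r.Item5);
        }
        return table;
    }
}
```

For the 250,000-record case, the resulting table could then be written with `SqlBulkCopy` (set `DestinationTableName`, optionally `BatchSize`, then call `WriteToServer(table)` inside a `using` block on an open `SqlConnection`), which is generally far faster than one INSERT per row. The destination table name and batch size are up to you.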
