Currently I have a CSV file along these lines:

    "NAME","AGE","SEX"
    "FRED, JONES","45","MALE"
    "SALLY, SMITH","60","FEMALE"

I use the following code to serialize it into JSON:

    var linesCSV = System.IO.File.ReadAllLines(targetFile); // targetFile is the CSV

    var csv = linesCSV.Select(l => l.Split(',')).ToList();

    var headers = csv[0];
    var dicts = csv.Skip(1).Select(row => Enumerable.Zip(headers, row, System.Tuple.Create).ToDictionary(p => p.Item1, p => p.Item2)).ToArray();

    string json = new System.Web.Script.Serialization.JavaScriptSerializer().Serialize(dicts);

    jsWrtr.WriteLine(json);

This produces output like the following:

    [{
      "\"NAME\"" : "\"FRED\"",
      "\"AGE\"" : "\"JONES\"",
      "\"SEX\"" : "\"45\""
    },
    {
      "\"NAME\"" : "\"SALLY\"",
      "\"AGE\"" : "\"SMITH\"",
      "\"SEX\"" : "\"60\""
    }]

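As you can see, the NAME value is split at its comma, and the part after the comma ends up in the next field, shifting every later value one column to the right. The surrounding double quotes are also kept as part of each key and value.

This is obviously because of the comma inside the field, so my question is: how do I parse the CSV so that it outputs the following?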

    [{
      "NAME" : "FRED, JONES",
      "AGE" : "45",
      "SEX" : "MALE"
    },
    {
      "NAME" : "SALLY, SMITH",
      "AGE" : "60",
      "SEX" : "FEMALE"
    }]
  • You are splitting the fields on a comma, but shouldn't you be splitting on the double quotes? Commented May 25, 2016 at 15:03
  • You can use the CsvHelper library with a custom map. Not the simplest solution, but it has some advantages. Commented May 25, 2016 at 15:25

2 Answers

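You can split on the whole "," sequence (quote, comma, quote) instead of on a bare comma, and trim the leading and trailing " from each line. Note that the sample below has a space after each comma, so its separator is ", " with a space; for the data in the question, use "," without the space.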

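    // Requires: using System; using System.Collections.Generic; using System.Linq;
    List<string> lines = new List<string>
    {
        "\"NAME\", \"AGE\", \"SEX\"",
        "\"FRED, JONES\", \"45\", \"MALE\"",
        "\"SALLY, SMITH\", \"60\", \"FEMALE\""
    };

    // Skip the header row and print the fields of each data row.
    foreach (var line in lines.Skip(1))
    {
        // Strip the outer quotes, then split on the quote-comma-space-quote sequence between fields.
        var fields = line.Trim(new char[] { '"' }).Split(new string[] { "\", \"" }, StringSplitOptions.None);

        foreach (var field in fields)
            Console.WriteLine(field.Trim());

        Console.WriteLine();
    }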

This will extract the fields properly, and you can move on to the JSON serialization.


Update:

Here's an update for the JSON serialization, giving you output in the format you want:

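    // Requires: using System.Collections.Generic; using System.Linq; using Newtonsoft.Json;
    var results = new List<Entry>();

    foreach (var line in lines.Skip(1))
    {
        var fields = line.Trim(new char[] { '"' }).Split(new string[] { "\", \"" }, StringSplitOptions.None);

        // Map the three fields onto an Entry by position.
        Entry entry = new Entry { Name = fields.FirstOrDefault(), Age = fields.Skip(1).FirstOrDefault(), Sex = fields.LastOrDefault() };
        results.Add(entry);
    }

    var json = JsonConvert.SerializeObject(results);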

Note that for simplicity I created a class named Entry that contains three string properties (Name, Age, Sex), one for each field, but you may want to use different types (and will then need to parse the values properly).

Note also that I use Newtonsoft's Json.NET NuGet package for serialization, while you seem to be using something else. Unless you need to stick with your library, I recommend the widely used Newtonsoft.
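If you would rather keep the header names from the file as the JSON keys (as in the question's own code) instead of a fixed Entry class, something along these lines might work. This is only a sketch under some assumptions: it uses System.Text.Json, which ships with .NET Core 3.0 and later, rather than Newtonsoft; the class and method names (CsvToJson, FromLines) are made up for illustration; and it assumes every field is quoted and separated by exactly "," with no spaces, as in the question's data.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

public static class CsvToJson
{
    // Hypothetical helper: turns quoted CSV lines into a JSON array of objects
    // keyed by the header row.
    public static string FromLines(IEnumerable<string> lines)
    {
        // Assumption: fields are all quoted and separated by exactly "," (no spaces).
        string[] SplitLine(string line) =>
            line.Trim('"').Split(new[] { "\",\"" }, StringSplitOptions.None);

        var rows = lines.Select(SplitLine).ToList();
        var headers = rows[0]; // throws if the input has no lines

        // Pair each header with the value in the same column.
        var records = rows.Skip(1)
            .Select(row => headers.Zip(row, (h, v) => new { h, v })
                                  .ToDictionary(p => p.h, p => p.v))
            .ToList();

        return JsonSerializer.Serialize(records);
    }
}
```

Calling CsvToJson.FromLines(System.IO.File.ReadAllLines(targetFile)) should then give one JSON object per data row, with "FRED, JONES" kept as a single NAME value.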


1 Comment

Managed to finish it off :) Thanks a lot for the help.

As a workaround, you could split on ", " and trim the remaining double quotes where necessary. This should leave FRED, JONES as a single entity in the split. You would have to add the quotes back afterwards if they were required, however.
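If splitting on a fixed separator turns out to be too fragile, a small quote-aware splitter is another option. The following is a minimal sketch, not a full CSV parser: the class name CsvLine is made up for illustration, it walks the line one character at a time, treats commas inside double quotes as data, and reads a doubled quote ("") inside a quoted field as one literal quote. It assumes each record sits on a single line and does not strip whitespace between a comma and an opening quote.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class CsvLine
{
    // Hypothetical helper: splits one CSV line into fields,
    // keeping commas that appear inside double quotes.
    public static List<string> Split(string line)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c == '"' && i + 1 < line.Length && line[i + 1] == '"')
                {
                    current.Append('"'); // doubled quote: one literal quote
                    i++;
                }
                else if (c == '"')
                {
                    inQuotes = false;    // closing quote
                }
                else
                {
                    current.Append(c);   // includes commas inside quotes
                }
            }
            else if (c == '"')
            {
                inQuotes = true;         // opening quote
            }
            else if (c == ',')
            {
                fields.Add(current.ToString()); // field boundary
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }

        fields.Add(current.ToString()); // last field on the line
        return fields;
    }
}
```

With this, CsvLine.Split("\"FRED, JONES\",\"45\",\"MALE\"") yields the three fields FRED, JONES / 45 / MALE, and a field such as "Sally,Smith" with no space after the comma stays in one piece too.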

1 Comment

but this does work in cases like Sally,Smith (all part of the same field)
