
I'm converting OmniPage XML to ALTO XML, and I decided to use C# for it.

Here is my sample XML file:

<wd l="821" t="283" r="1363" b="394">
<ch l="821" t="312" r="878" b="394" conf="158">n</ch>
<ch l="888" t="312" r="950" b="394" conf="158">o</ch>
<ch l="955" t="283" r="979" b="394" conf="158">i</ch>
<ch l="989" t="312" r="1046" b="394" conf="158">e</ch>
<ch l="1051" t="312" r="1147" b="394" conf="158">m</ch>
<ch l="1157" t="283" r="1219" b="394" conf="158">b</ch>
<ch l="1224" t="312" r="1267" b="394" conf="198">r</ch>
<ch l="1267" t="283" r="1296" b="394" conf="198">i</ch>
<ch l="1306" t="312" r="1363" b="394" conf="158">e</ch>
</wd>

And here is my code:

XDocument document = XDocument.Load(fileName);
var coordinates = from r in document.Descendants("wd")
                  where !string.IsNullOrEmpty((string)r.Attribute("l"))
                  select new
                  {
                      left = (string)r.Attribute("l"),
                  };

foreach (var item in coordinates)
{
    Console.WriteLine(item.left);
}
Console.ReadLine();

My problem: it works with a simple XML file like the one above, but not with a longer file like this one:

http://pastebin.com/LmDHRzC5

That file also has wd tags with an l attribute, so why doesn't the code work?

Thank you. I put the long XML on Pastebin because it's too long to paste here.

What do you mean by "it doesn't work"? Is there an error, and if so, which one? Commented Dec 5, 2016 at 7:02

2 Answers


Your larger document declares a default namespace on its root element:

<document xmlns="http://www.scansoft.com/omnipage/xml/ssdoc-schema3.xsd"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

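Because of that default namespace, every element in the file, including wd, belongs to that namespace, so document.Descendants("wd") (which looks for wd in no namespace) finds nothing. Matching on the local name instead works: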

document.Descendants().Where(e => e.Name.LocalName == "wd")

Alternatively, you can use one of the other options from the question "Search XDocument using LINQ without knowing the namespace".
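One of those other options is to look the elements up with a namespace-qualified name. A minimal sketch, assuming only the namespace URI shown above; `NamespaceDemo` and its method names are hypothetical. It contrasts the unqualified lookup from the question with `XNamespace + "wd"`, and with reading the default namespace from the root so it does not have to be hard-coded:

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

static class NamespaceDemo
{
    // Default namespace declared on the root of the longer OmniPage document.
    static readonly XNamespace Ss = "http://www.scansoft.com/omnipage/xml/ssdoc-schema3.xsd";

    // Unqualified lookup: only matches <wd> elements that are in *no* namespace.
    public static int CountUnqualified(XDocument doc) =>
        doc.Descendants("wd").Count();

    // Qualified lookup: XNamespace + string produces the full XName {ns}wd.
    public static int CountQualified(XDocument doc) =>
        doc.Descendants(Ss + "wd").Count();

    // Takes whatever default namespace the root declares (XNamespace.None if
    // there is none), so the same code handles both the sample and the long file.
    public static int CountUsingRootNamespace(XDocument doc) =>
        doc.Descendants(doc.Root.GetDefaultNamespace() + "wd").Count();

    static void Main()
    {
        var doc = XDocument.Parse(
            "<document xmlns=\"http://www.scansoft.com/omnipage/xml/ssdoc-schema3.xsd\">" +
            "<wd l=\"821\" t=\"283\" r=\"1363\" b=\"394\"><ch>n</ch></wd>" +
            "</document>");

        Console.WriteLine(CountUnqualified(doc));        // 0
        Console.WriteLine(CountQualified(doc));          // 1
        Console.WriteLine(CountUsingRootNamespace(doc)); // 1
    }
}
```

The LocalName filter is shorter, while the qualified name is stricter: it will not accidentally match a wd element from some other namespace.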


Comments

I have another question: I need to get the content of each ch tag inside the coordinates loop. foreach (var item in coordinates) { // what do I need to do here? }
@pdftoimage To get their values, add chs = r.Descendants().Where(e => e.Name.LocalName == "ch").Select(n => n.Value).ToArray() to the anonymous type in your select, and then print them with Console.WriteLine(item.left + ": " + String.Join(", ", item.chs));
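Put together, the comment's suggestion looks roughly like the following. This is a sketch only: `WordChars`, `Word` and `ReadWords` are hypothetical names, and a named class replaces the anonymous type so the result can be returned from a method. Like the accepted answer, it matches on `LocalName`, so it works with or without the default namespace:

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

static class WordChars
{
    // One OmniPage <wd>: its left coordinate plus the text of each <ch>.
    public sealed class Word
    {
        public string Left { get; set; }
        public string[] Chars { get; set; }
        public string Text => string.Concat(Chars); // characters joined into the word
    }

    public static Word[] ReadWords(XDocument doc) =>
        doc.Descendants()
           .Where(e => e.Name.LocalName == "wd")
           // Null-safe check: skips wd elements with a missing or empty l attribute.
           .Where(e => !string.IsNullOrEmpty((string)e.Attribute("l")))
           .Select(e => new Word
           {
               Left = (string)e.Attribute("l"),
               Chars = e.Elements()
                        .Where(c => c.Name.LocalName == "ch")
                        .Select(c => c.Value)
                        .ToArray()
           })
           .ToArray();

    static void Main()
    {
        var doc = XDocument.Parse(
            "<wd l=\"821\" t=\"283\" r=\"1363\" b=\"394\">" +
            "<ch>n</ch><ch>o</ch><ch>i</ch><ch>e</ch></wd>");

        foreach (var w in ReadWords(doc))
        {
            // Same format as in the comment, plus the joined word.
            Console.WriteLine(w.Left + ": " + string.Join(", ", w.Chars) + " (" + w.Text + ")");
        }
    }
}
```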

I'm not going to write all of the code, but this should get you started. I used LINQ to XML:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml;
using System.Xml.Linq;
using System.IO;

namespace ConsoleApplication1
{
    class Program
    {
        const string FILENAME = @"c:\temp\test.xml";
        static void Main(string[] args)
        {
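            XDocument doc;
            using (StreamReader reader = new StreamReader(FILENAME))
            {
                // Skip the first line, the XML declaration (which declares UTF-16),
                // so the parser never sees that encoding declaration.
                reader.ReadLine();
                doc = XDocument.Load(reader);
            }

            // Find <body> by local name, then take its default namespace so the
            // element lookups below can use ns + "name".
            XElement body = doc.Descendants().Where(x => x.Name.LocalName == "body").FirstOrDefault();
            XNamespace ns = body.GetDefaultNamespace();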

            var results = new {
                sections = body.Elements(ns + "section").Select(x => new {
                    l = (int)x.Attribute("l"),
                    r = (int)x.Attribute("r"),
                    b = (int)x.Attribute("b"),
                    runs = x.Descendants(ns + "run").Select(y => new {
                        wds = y.Elements(ns + "wd").Select(z => new {
                            chs = z.Elements(ns + "ch").Select(a => new {
                                l = (int?)a.Attribute("l"),
                                t = (int?)a.Attribute("t"),
                                r = (int?)a.Attribute("r"),
                                b = (int?)a.Attribute("b"),
                                conf = (int?)a.Attribute("conf"),
                                value = (string)a
                            }).ToList()
                        }).ToList()
                    }).ToList()
                }).ToList(),
                dds = body.Elements(ns + "dd").Select(x => new {
                    l = (int)x.Attribute("l"),
                    r = (int)x.Attribute("r"),
                    b = (int)x.Attribute("b"),
                    paras = x.Elements(ns + "para").Select(y => new {
                        lns = y.Elements(ns + "ln").Select(z => new {
                            wds = z.Elements(ns + "wd").Select(a => new {
                                chs = a.Elements(ns + "ch").Select(b => new {
                                    l = (int?)b.Attribute("l"),
                                    t = (int?)b.Attribute("t"),
                                    r = (int?)b.Attribute("r"),
                                    b = (int?)b.Attribute("b"),
                                    conf = (int?)b.Attribute("conf"),
                                    value = (string)b
                                }).ToList()
                            }).ToList()
                        }).ToList()
                    }).ToList()
                }).ToList(),

            };
        }
    }
}
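Since the end goal is ALTO, a next step could map each wd to an ALTO String element. This is a minimal sketch under loudly stated assumptions: the target is ALTO v3 (its namespace is hard-coded), OmniPage's l/t/r/b edges are copied unchanged into HPOS/VPOS/WIDTH/HEIGHT (so the ALTO MeasurementUnit must match whatever unit OmniPage uses), and the conf attribute is not carried over because its scale is not stated here. `AltoSketch` and `ToAltoString` are hypothetical names:

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

static class AltoSketch
{
    // Assumed target: ALTO v3. Change this for other ALTO versions.
    static readonly XNamespace Alto = "http://www.loc.gov/standards/alto/ns-v3#";

    // Maps one OmniPage <wd> (left/top/right/bottom edges) to an ALTO <String>.
    // Assumes all four coordinate attributes are present; the (int) cast throws otherwise.
    public static XElement ToAltoString(XElement wd)
    {
        int l = (int)wd.Attribute("l");
        int t = (int)wd.Attribute("t");
        int r = (int)wd.Attribute("r");
        int b = (int)wd.Attribute("b");

        // The word text is the concatenation of its <ch> children (namespace-agnostic).
        string content = string.Concat(
            wd.Elements().Where(c => c.Name.LocalName == "ch").Select(c => c.Value));

        return new XElement(Alto + "String",
            new XAttribute("HPOS", l),
            new XAttribute("VPOS", t),
            new XAttribute("WIDTH", r - l),   // edges -> size
            new XAttribute("HEIGHT", b - t),
            new XAttribute("CONTENT", content));
    }

    static void Main()
    {
        var wd = XElement.Parse(
            "<wd l=\"821\" t=\"283\" r=\"1363\" b=\"394\">" +
            "<ch>n</ch><ch>o</ch><ch>i</ch><ch>e</ch><ch>m</ch>" +
            "<ch>b</ch><ch>r</ch><ch>i</ch><ch>e</ch></wd>");

        Console.WriteLine(ToAltoString(wd));
    }
}
```

For the sample word this gives WIDTH 542 (1363 - 821) and HEIGHT 111 (394 - 283). Building the surrounding ALTO Page/TextBlock/TextLine hierarchy from the section/para/ln structure above is left out.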
