
I need to convert formulas from HTML to AsciiDoc. Pandoc does not convert them correctly, presumably because they need to be preprocessed somehow.

The source of the formula is like this:

<div id="d126e133489" class="mediaobject">
<div class="code_responsive">
<p class="programlistingindent">
<span>
<span class="MathEquation" style="font-size: 15px;">
<span class="MathRoot HBox" role="math" aria-label="L indexOf m underscore s a t baseline equals L indexOf m baseline" style="display: inline-block;">
<span class="MathRow HBox" style="display: inline-block; font-size: 15px;">
<span class="MathScript HBox" style="display: inline-block; font-size: 15px;">
<span class="MathRow HBox" style="display: inline-block; font-size: 15px;">
<span class="MathText MathTextBox mwEqnIdentifier">L</span>
</span>
<span class="VBox" style="display: inline-block; text-align: left; vertical-align: -2px;">
<span class="MathRow HBox" style="display: block; font-size: 10.5px; margin-left: 0px; margin-top: 0px;">
<span class="MathText MathTextBox mwEqnIdentifier">m</span>
<span class="MathText MathTextBox mwEqnSymbol">_</span>
<span class="MathText MathTextBox mwEqnIdentifier">sat</span>
</span>
</span>
</span>
<span class="MathText MathTextBox mwEqnSymbol" style="margin-left: 0.277778em;">=</span>
<span class="MathScript HBox" style="display: inline-block; font-size: 15px; margin-left: 0.277778em;">
<span class="MathRow HBox" style="display: inline-block; font-size: 15px;">
<span class="MathText MathTextBox mwEqnIdentifier">L</span>
</span>
<span class="VBox" style="display: inline-block; text-align: left; vertical-align: -2px;">
<span class="MathRow HBox" style="display: block; font-size: 10.5px; margin-left: 0px; margin-top: 0px;">
<span class="MathText MathTextBox mwEqnIdentifier">m</span>
</span>
</span>
</span>
</span>
</span>
</span>
</span>
</p>
</div>
</div>

It is rendered on the page like this:

[Image: the rendered formula, L with subscript m_sat equal to L with subscript m]

and I want to get an asciimath or latexmath stem block for this formula as a result.

By default Pandoc is converting it to this:

[[d126e133489]]
[.MathEquation]#[.MathRoot .HBox]#[.MathRow .HBox]#[.MathScript .HBox]#[.MathRow .HBox]#[.MathText .MathTextBox .mwEqnIdentifier]#L##[.VBox]#[.MathRow .HBox]#[.MathText .MathTextBox .mwEqnIdentifier]#m#[.MathText .MathTextBox .mwEqnSymbol]#_#[.MathText .MathTextBox .mwEqnIdentifier]#sat####[.MathText .MathTextBox .mwEqnSymbol]#=#[.MathScript .HBox]#[.MathRow .HBox]#[.MathText .MathTextBox .mwEqnIdentifier]#L##[.VBox]#[.MathRow .HBox]#[.MathText .MathTextBox .mwEqnIdentifier]#m#######

Is there any way to preprocess such formulas so that I get valid stem blocks after conversion with Pandoc? Or could I convert them all myself without Pandoc, and if so, how?

  • Is there any chance to get the formulas in a different format? Which software was used to generate the HTML? I believe the only way to make pandoc handle this as math would be to hand-code a converter from that format to MathML. Commented Apr 3, 2024 at 12:16

1 Answer


Pandoc converts content well, but it has little understanding of presentation markup. The HTML that implements this formula is almost entirely presentation markup.

That particular formula is very straightforward, so it is easy to convert manually:

[latexmath]
++++
L_{m\_sat} = L_{m}
++++

You likely have other formulas that require more effort to express correctly. Depending on the number of formulas, manual conversion could be notably faster than finding or implementing a tool to perform the conversion.
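If implementing a tool turns out to be worthwhile, a small tree walker may be enough for formulas shaped like the one in the question. The following is only a minimal sketch using Python's standard `html.parser`. It assumes that a `MathScript` span holds a base followed by a `VBox` span, and that the `VBox` is always a subscript; superscripts, fractions, roots and LaTeX special characters other than `_` are not handled. The names `Node`, `SpanTreeBuilder`, `to_latex` and `html_math_to_latex` are made up for this illustration.

```python
from html.parser import HTMLParser


class Node:
    """One <span> element: its CSS classes plus child nodes and text."""

    def __init__(self, classes):
        self.classes = classes
        self.children = []  # mix of Node and str


class SpanTreeBuilder(HTMLParser):
    """Builds a tree from <span> elements only; other tags are ignored."""

    def __init__(self):
        super().__init__()
        self.root = Node(set())
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            classes = set((dict(attrs).get("class") or "").split())
            node = Node(classes)
            self.stack[-1].children.append(node)
            self.stack.append(node)

    def handle_endtag(self, tag):
        if tag == "span" and len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        # Skip the whitespace between tags, keep real text.
        if data.strip():
            self.stack[-1].children.append(data.strip())


def is_script(child):
    # Assumption: a VBox span always carries a subscript.
    return isinstance(child, Node) and "VBox" in child.classes


def to_latex(node):
    """Recursively turn a span tree into a LaTeX string."""
    if isinstance(node, str):
        return node.replace("_", r"\_")  # only "_" is escaped here
    if "MathScript" in node.classes:
        base = "".join(to_latex(c) for c in node.children if not is_script(c))
        sub = "".join(to_latex(c) for c in node.children if is_script(c))
        return f"{base}_{{{sub}}}"
    return "".join(to_latex(c) for c in node.children)


def html_math_to_latex(html):
    """Convert the HTML of one formula to LaTeX."""
    builder = SpanTreeBuilder()
    builder.feed(html)
    builder.close()
    return to_latex(builder.root)
```

Feeding the `MathEquation` span from the question to `html_math_to_latex` should give `L_{m\_sat}=L_{m}`. Any other class names that appear in your formulas would need their own branch in `to_latex`.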

Markdown is more prevalent than AsciiDoc, so I tried a few online HTML-to-Markdown converters. This one seems promising: https://codebeautify.org/html-to-markdown

It converted your HTML to L m \_ sat \= L m. You would still need to reintroduce STEM markup to recreate the rendered formula, but the output seems almost good enough for a first pass.

That approach might be useful if you have hundreds to thousands of formulas to convert.


2 Comments

Yes, in my scenario manual conversion is not an option because we have a lot of formulas; I posted only one very simple example here. Thank you for the tip, I will try this converter. I already have experience converting from Markdown to AsciiDoc, so if this tool can handle the formula syntax and give me Markdown output, it will be quite OK for my task.
That is a really good converter (from HTML to Markdown), but the output still lacks any wrappers around the formulas, which is a problem. If I convert further from Markdown to AsciiDoc, I cannot add stem blocks around the formulas because they are not distinguished from regular text :(( If anybody knows a converter that can do this and also wrap the formulas, I would appreciate a link to it.
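
One possible workaround for the missing wrappers, offered only as a sketch: before conversion, replace each formula in the HTML with a plain alphanumeric placeholder word, keep the formulas in a list, and swap the placeholders for stem blocks in the converted AsciiDoc. The token name `MATHPLACEHOLDER` and the helpers `stem_block` and `restore_formulas` are hypothetical, and the sketch assumes the placeholder words come through the converter unchanged.

```python
import re


def stem_block(latex):
    """Wrap a LaTeX string in an AsciiDoc latexmath block."""
    return "[latexmath]\n++++\n" + latex + "\n++++"


def restore_formulas(asciidoc_text, formulas):
    """Replace tokens such as MATHPLACEHOLDER0 with stem blocks.

    `formulas` is the list of LaTeX strings saved before conversion;
    the number in each token is an index into that list.
    """
    def replace(match):
        latex = formulas[int(match.group(1))]
        # Blank lines keep the block separate from surrounding text.
        return "\n\n" + stem_block(latex) + "\n\n"

    return re.sub(r"MATHPLACEHOLDER(\d+)", replace, asciidoc_text)
```

For inline formulas, the same substitution could emit `latexmath:[...]` instead of a block.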
