5

Here is a method which tries to get the html part of an email message:

from __future__ import absolute_import, division, unicode_literals, print_function

import email

html_mail_quoted_printable=b'''Subject: =?ISO-8859-1?Q?WG=3A_Wasenstra=DFe_84_in_32052_Hold_Stau?=
MIME-Version: 1.0
Content-type: multipart/mixed;
 Boundary="0__=4EBBF4C4DFD012538f9e8a93df938690918c4EBBF4C4DFD01253"

--0__=4EBBF4C4DFD012538f9e8a93df938690918c4EBBF4C4DFD01253
Content-type: multipart/alternative;
 Boundary="1__=4EBBF4C4DFD012538f9e8a93df938690918c4EBBF4C4DFD01253"

--1__=4EBBF4C4DFD012538f9e8a93df938690918c4EBBF4C4DFD01253
Content-type: text/plain; charset=ISO-8859-1
Content-transfer-encoding: quoted-printable

Freundliche Gr=FC=DFe

--1__=4EBBF4C4DFD012538f9e8a93df938690918c4EBBF4C4DFD01253
Content-type: text/html; charset=ISO-8859-1
Content-Disposition: inline
Content-transfer-encoding: quoted-printable

<html><body>
Freundliche Gr=FC=DFe
</body></html>
--1__=4EBBF4C4DFD012538f9e8a93df938690918c4EBBF4C4DFD01253--

--0__=4EBBF4C4DFD012538f9e8a93df938690918c4EBBF4C4DFD01253--

'''
def get_html_part(msg):
    for part in msg.walk():
        if part.get_content_type() == 'text/html':
            return part.get_payload(decode=True)

msg=email.message_from_string(html_mail_quoted_printable)
html=get_html_part(msg)
print(type(html))
print(html)

Output:

<type 'str'>
<html><body>
Freundliche Gr��e
</body></html>

Unfortunately I get a byte string. I would like to have unicode string.

According to this answer msg.get_payload(decode=True) should do the magic. But it does not in this case.

How to decode a mime part of a message and get a unicode string in Python 2.7?

1 Answer 1

14

Unfortunately I get a byte string. I would like to have unicode string.

The decode=True parameter to get_payload only decodes the Content-Transfer-Encoding wrapper, the =-encoding in this message. To get from there to characters is one of the many things the email package makes you do yourself:

bytes = part.get_payload(decode=True)
charset = part.get_content_charset('iso-8859-1')
chars = bytes.decode(charset, 'replace')

(iso-8859-1 being the fallback in case the message specifies no encoding.)

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.