
I am using imaplib to fetch email messages and need to extract the text from them.

My messages are multipart, so I use the following code:

```python
import email


def get_message(message):
    '''
    Return the decoded body of a message, depending on whether it is multipart/* or text/*.
    :param message: content of an email (an email.message.Message)
    :return: body of the email message
    '''
    body = None
    if message.is_multipart():
        print(str(message.get_content_type()) + ' is the message content type')
        for part in message.walk():
            cdispo = str(part.get('Content-Disposition'))
            if part.is_multipart():
                for subpart in part.walk():
                    cdispo = str(subpart.get('Content-Disposition'))
                    if subpart.get_content_type() == 'text/plain' and 'attachment' not in cdispo:
                        body = subpart.get_payload(decode=True)
                    elif subpart.get_content_type() == 'text/html' and 'attachment' not in cdispo:
                        body = subpart.get_payload(decode=True)
            elif part.get_content_type() == 'text/plain' and 'attachment' not in cdispo:
                body = part.get_payload(decode=True)
            elif part.get_content_type() == 'text/html' and 'attachment' not in cdispo:
                body = part.get_payload(decode=True)
    elif message.get_content_type() == 'text/plain':
        body = message.get_payload(decode=True)
    elif message.get_content_type() == 'text/html':
        body = message.get_payload(decode=True)
    return body


# get_message() is defined above so it exists before this call.
# account is a logged-in imaplib connection; msg_uid identifies the message.
typ, data = account.fetch(msg_uid, '(RFC822)')
raw_email = data[0][1]
msg = email.message_from_bytes(raw_email)
payload_msg = get_message(msg)
```

In the code above, msg is the parsed message that I fetch and pass to get_message(), which calls get_payload(decode=True) on the text parts. But when I check the type of the returned body, it is still bytes. Why?

Isn't it supposed to be converted to a string? Stranger still, when I pass decode=False, I get a str. What am I doing wrong? I expected the opposite!

P.S.: raw_email is bytes here, and msg is an email.message.Message object.

1 Answer

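According to the docs, the decode flag is not about text (character) encoding but about the Content-Transfer-Encoding, i.e. quoted-printable and base64. With decode=True the docs state that the returned value is always binary data, so you get bytes in every case; with decode=False you get the payload as it is stored in the message, which for a parsed message is a str that may still be base64- or quoted-printable-encoded. To get text, decode the bytes yourself using the part's charset (part.get_content_charset()).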
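A minimal sketch of that behaviour, assuming a hypothetical single-part message whose body is base64-encoded UTF-8 (the RAW bytes below are made up for illustration). It shows the type returned for each value of decode, and one way to turn the bytes into a str with the declared charset. The UTF-8 fallback when no charset parameter is present is an assumption, not something the email package does for you.

```python
import email

# Hypothetical single-part message: base64-encoded UTF-8 body.
RAW = (
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"Content-Transfer-Encoding: base64\r\n"
    b"\r\n"
    b"SGVsbG8sIHfDtnJsZCE=\r\n"
)

msg = email.message_from_bytes(RAW)

stored = msg.get_payload()              # decode=False: payload as stored (str, still base64)
decoded = msg.get_payload(decode=True)  # decode=True: base64 removed, but the result is bytes

# Character decoding is a separate step; assume UTF-8 if no charset is declared.
charset = msg.get_content_charset() or "utf-8"
text = decoded.decode(charset, errors="replace")

print(type(stored).__name__, repr(stored))
print(type(decoded).__name__, repr(decoded))
print(type(text).__name__, repr(text))
```

In the question's get_message(), the same idea would mean calling .decode(...) on the bytes returned by get_payload(decode=True), using each part's charset.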

Also, the docs say about the get_payload() method:

This is a legacy method. On the EmailMessage class its functionality is replaced by get_content() and iter_parts().

So you should consider using those methods instead.
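A sketch of the newer API, assuming a hypothetical multipart/alternative message (RAW below is illustrative only). get_content() and iter_parts() exist on EmailMessage, which message_from_bytes() returns when you pass policy=email.policy.default; with the default legacy policy you get a plain Message without them. get_body() is another EmailMessage method, used here to pick the preferred body part. For text/* parts, get_content() handles both the transfer encoding and the charset, so it returns a str.

```python
import email
from email import policy

# Hypothetical multipart/alternative message with a plain-text and an HTML body.
RAW = (
    b"MIME-Version: 1.0\r\n"
    b"Content-Type: multipart/alternative; boundary=\"XYZ\"\r\n"
    b"\r\n"
    b"--XYZ\r\n"
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"Content-Transfer-Encoding: base64\r\n"
    b"\r\n"
    b"SGVsbG8sIHfDtnJsZCE=\r\n"
    b"--XYZ\r\n"
    b"Content-Type: text/html; charset=utf-8\r\n"
    b"\r\n"
    b"<p>Hello</p>\r\n"
    b"--XYZ--\r\n"
)

# policy.default makes the parser build an EmailMessage instead of a legacy Message.
msg = email.message_from_bytes(RAW, policy=policy.default)

# get_body() chooses a body part; the preferencelist order decides plain vs. html.
body_part = msg.get_body(preferencelist=("plain", "html"))
body = body_part.get_content() if body_part is not None else None

# iter_parts() yields the direct subparts of a multipart message.
subtypes = [part.get_content_type() for part in msg.iter_parts()]

print(repr(body))
print(subtypes)
```

With an IMAP fetch, RAW would be replaced by the raw_email bytes from data[0][1].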
