I am trying to extract strings from PE files (.exe and .dll) using the pefile library, but I have been stuck for a while because this data format is new to me. I've read many questions similar to mine, but I have not been able to adapt the code to fit my needs.

I have the following code:

import pefile

# Path to a random PE file
p = 'dfghdsfhrtkl54165hs.exe'
pe = pefile.PE(p)

# Extract the file's metadata
print('Machine: ', pe.FILE_HEADER.Machine)
print('Number of sections: ', pe.FILE_HEADER.NumberOfSections)
print('Timestamp: ', pe.FILE_HEADER.TimeDateStamp)
print('Entry point: ', pe.OPTIONAL_HEADER.AddressOfEntryPoint)
# Machine:  332
# Number of sections:  3
# Timestamp:  1441263997
# Entry point:  5432

As I understand it, PE files have a .text section whose contents can be used to classify whether the file is benign or malicious, so I've tried the following:

for section in pe.sections:
    if section.Name.decode().strip('\x00') == '.text':
        text_section = section
        break

text_section

Which returns

<Structure: [IMAGE_SECTION_HEADER] 0x1B0 0x0 
Name: .text 0x1B8 0x8 
Misc: 0xF53C 0x1B8 0x8 
Misc_PhysicalAddress: 0xF53C 0x1B8 0x8 
Misc_VirtualSize: 0xF53C 0x1BC 0xC 
VirtualAddress: 0x1000 0x1C0 0x10 
SizeOfRawData: 0x10000 0x1C4 0x14 
PointerToRawData: 0x1000 0x1C8 0x18 
PointerToRelocations: 0x0 0x1CC 0x1C 
PointerToLinenumbers: 0x0 0x1D0 0x20 
NumberOfRelocations: 0x0 0x1D2 0x22 
NumberOfLinenumbers: 0x0 0x1D4 0x24 
Characteristics: 0x60000020>

But I am unsure how to proceed with extracting printable strings from this section, or whether this is even the right approach.

I've read the following answers: 1 2 3

My end goal is to extract text from PE files that I can use in my ML model as features.
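
To make the goal concrete, here is a minimal sketch of the kind of extraction I have in mind, similar in spirit to the Unix strings utility: scan the raw bytes of each section for runs of printable ASCII characters. The helper names extract_strings and strings_from_pe and the minimum length of 4 are my own assumptions, not part of pefile; the only pefile calls used are pefile.PE, pe.sections, section.Name and section.get_data(). It does not handle UTF-16 strings, which are also common in PE files.

```python
import re


def extract_strings(data: bytes, min_len: int = 4) -> list:
    """Return runs of printable ASCII characters that are at least min_len long.

    Hypothetical helper (not part of pefile); it works on any bytes object.
    """
    # Printable ASCII spans 0x20 (space) through 0x7e (~).
    pattern = re.compile(rb'[\x20-\x7e]{%d,}' % min_len)
    return [match.decode('ascii') for match in pattern.findall(data)]


def strings_from_pe(path: str, min_len: int = 4) -> dict:
    """Map each section name to the printable strings found in its raw data.

    Hypothetical helper; assumes the third-party pefile package is installed.
    """
    import pefile  # imported here so extract_strings works without pefile

    pe = pefile.PE(path)
    result = {}
    for section in pe.sections:
        # Section names are 8 bytes, padded with NUL characters.
        name = section.Name.rstrip(b'\x00').decode('ascii', errors='replace')
        result[name] = extract_strings(section.get_data(), min_len)
    return result
```

Calling strings_from_pe(p) on the file above would give a dictionary keyed by section name (for example '.text'), and the resulting lists of strings could then be tokenized or vectorized as model features. Whether .text alone is enough, or whether all sections should be scanned, is part of what I am unsure about.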
