I am trying to extract strings from PE files (.exe and .dll) using the pefile library, but I have been stuck for a while because this data format is new to me. I've read many questions similar to mine, but I haven't been able to adapt their code to fit my needs.
I have the following code:
import pefile

# path to random pe file
p = 'dfghdsfhrtkl54165hs.exe'
pe = pefile.PE(p)
# Extract the file's metadata
print('Machine: ', pe.FILE_HEADER.Machine)
print('Number of sections: ', pe.FILE_HEADER.NumberOfSections)
print('Timestamp: ', pe.FILE_HEADER.TimeDateStamp)
print('Entry point: ', pe.OPTIONAL_HEADER.AddressOfEntryPoint)
# Machine: 332
# Number of sections: 3
# Timestamp: 1441263997
# Entry point: 5432
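These header fields are raw integers. The sketch below decodes the values printed above in plain Python: 332 is 0x14C (`IMAGE_FILE_MACHINE_I386`, a 32-bit x86 binary), `TimeDateStamp` is seconds since the Unix epoch, and `AddressOfEntryPoint` is an RVA rather than a file offset. The `MACHINE_NAMES` table is a small hand-picked subset of the PE/COFF specification, and `describe_header` is a hypothetical helper, not part of pefile.

```python
from datetime import datetime, timezone

# A few IMAGE_FILE_HEADER.Machine values from the PE/COFF specification (not exhaustive).
MACHINE_NAMES = {
    0x014C: "IMAGE_FILE_MACHINE_I386",
    0x8664: "IMAGE_FILE_MACHINE_AMD64",
    0x01C0: "IMAGE_FILE_MACHINE_ARM",
    0xAA64: "IMAGE_FILE_MACHINE_ARM64",
}


def describe_header(machine: int, timestamp: int, entry_point: int) -> dict:
    """Hypothetical helper: turn raw header integers into readable values."""
    return {
        # Fall back to the hex code for machine types not in the table.
        "machine": MACHINE_NAMES.get(machine, hex(machine)),
        # TimeDateStamp is seconds since the Unix epoch (UTC); malware may forge it.
        "compiled": datetime.fromtimestamp(timestamp, tz=timezone.utc).isoformat(),
        # AddressOfEntryPoint is an RVA (relative to the image base), not a file offset.
        "entry_point_rva": hex(entry_point),
    }


print(describe_header(332, 1441263997, 5432))
```

For the sample file this reports an i386 binary, a compile time of 2015-09-03T07:06:37+00:00, and an entry point RVA of 0x1538.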
As I understand it, a PE file is divided into sections, and the .text section (which holds the executable code) can be used to classify whether a file is benign or malicious, so I tried the following:
text_section = None
for section in pe.sections:
    # Section names are 8 bytes, padded with null bytes
    if section.Name.decode().strip('\x00') == '.text':
        text_section = section
        break
text_section
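The object found above is only the section *header*; the section's contents still have to be read from the file. pefile's `SectionStructure.get_data()` (e.g. `text_section.get_data()`) returns those bytes. The sketch below shows roughly what that amounts to, slicing the file from `PointerToRawData` for `SizeOfRawData` bytes. It is simplified (it ignores the `FileAlignment` rounding and other edge cases pefile handles), and `section_bytes` is a hypothetical name used only for illustration.

```python
def section_bytes(file_data: bytes, pointer_to_raw_data: int, size_of_raw_data: int) -> bytes:
    """Hypothetical helper: return the on-disk bytes of one section.

    Simplified: no FileAlignment rounding or truncated-file handling,
    both of which pefile's section.get_data() takes care of.
    """
    start = pointer_to_raw_data
    end = start + size_of_raw_data
    return file_data[start:end]


# Fake "file": 0x1000 bytes of headers, then a 3-byte x86 prologue (push ebp; mov ebp, esp).
fake_file = bytes(0x1000) + b"\x55\x8b\xec" + bytes(0x10)
code = section_bytes(fake_file, 0x1000, 3)
print(code.hex())
```

With a real file you would pass `open(p, 'rb').read()` together with `text_section.PointerToRawData` and `text_section.SizeOfRawData`, or simply call `text_section.get_data()`.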
This returns:
<Structure: [IMAGE_SECTION_HEADER] 0x1B0 0x0
Name: .text 0x1B8 0x8
Misc: 0xF53C 0x1B8 0x8
Misc_PhysicalAddress: 0xF53C 0x1B8 0x8
Misc_VirtualSize: 0xF53C 0x1BC 0xC
VirtualAddress: 0x1000 0x1C0 0x10
SizeOfRawData: 0x10000 0x1C4 0x14
PointerToRawData: 0x1000 0x1C8 0x18
PointerToRelocations: 0x0 0x1CC 0x1C
PointerToLinenumbers: 0x0 0x1D0 0x20
NumberOfRelocations: 0x0 0x1D2 0x22
NumberOfLinenumbers: 0x0 0x1D4 0x24
Characteristics: 0x60000020>
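Two fields in this dump can be decoded by hand. `Characteristics` is a bit mask: 0x60000020 = 0x40000000 (`IMAGE_SCN_MEM_READ`) + 0x20000000 (`IMAGE_SCN_MEM_EXECUTE`) + 0x20 (`IMAGE_SCN_CNT_CODE`), i.e. readable, executable code. The entry point RVA 0x1538 also falls inside this section (0x1000 to 0x1000 + 0xF53C), so its file offset is 0x1538 - `VirtualAddress` + `PointerToRawData`. The sketch below uses a subset of flag values from the PE/COFF specification; the helper names are hypothetical, and pefile offers `pe.get_offset_from_rva(rva)` for the general case.

```python
# Section flag values from the PE/COFF specification (subset).
SECTION_FLAGS = {
    0x00000020: "IMAGE_SCN_CNT_CODE",
    0x00000040: "IMAGE_SCN_CNT_INITIALIZED_DATA",
    0x00000080: "IMAGE_SCN_CNT_UNINITIALIZED_DATA",
    0x20000000: "IMAGE_SCN_MEM_EXECUTE",
    0x40000000: "IMAGE_SCN_MEM_READ",
    0x80000000: "IMAGE_SCN_MEM_WRITE",
}


def decode_characteristics(value: int) -> list:
    """Hypothetical helper: list the known flags set in a Characteristics value."""
    return [name for bit, name in sorted(SECTION_FLAGS.items()) if value & bit]


def rva_to_offset(rva: int, virtual_address: int, virtual_size: int, pointer_to_raw_data: int):
    """Hypothetical helper: map an RVA to a file offset if it lies in this section, else None."""
    if virtual_address <= rva < virtual_address + virtual_size:
        return rva - virtual_address + pointer_to_raw_data
    return None


print(decode_characteristics(0x60000020))
print(hex(rva_to_offset(5432, 0x1000, 0xF53C, 0x1000)))
```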
But I am unsure how to proceed with extracting printable strings from this section, or whether this is even the right approach.
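One common approach, similar to the Unix `strings` tool, is to scan raw bytes for runs of printable ASCII characters of at least a minimum length (4 is the `strings` default), plus UTF-16LE runs, since Windows binaries often store "wide" strings. Note that string literals usually live in data sections such as .rdata or .data rather than .text, so scanning the whole file may yield more useful strings than scanning .text alone. The sketch below assumes plain `re` over bytes; `extract_strings` and `MIN_LEN` are hypothetical names, and it only recognizes ASCII-range wide strings.

```python
import re

MIN_LEN = 4  # same default minimum length as the Unix `strings` tool

# Printable ASCII (space through tilde), at least MIN_LEN bytes in a row.
ASCII_RE = re.compile(rb"[\x20-\x7e]{%d,}" % MIN_LEN)
# UTF-16LE restricted to ASCII range: each printable byte followed by a zero byte.
UTF16_RE = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % MIN_LEN)


def extract_strings(data: bytes) -> list:
    """Hypothetical helper: return printable ASCII and UTF-16LE strings found in raw bytes."""
    ascii_strings = [m.decode("ascii") for m in ASCII_RE.findall(data)]
    wide_strings = [m.decode("utf-16le") for m in UTF16_RE.findall(data)]
    return ascii_strings + wide_strings


# Small synthetic blob: "MZ" is too short, "kernel32.dll" is ASCII, "CreateFileW" is UTF-16LE.
blob = b"\x00\x01MZ\x90\x00kernel32.dll\x00\xff" + "CreateFileW".encode("utf-16le") + b"\x00\x00ab\x00"
print(extract_strings(blob))
```

On a real sample you could call it on the whole file (`extract_strings(open(p, 'rb').read())`) or on a single section (`extract_strings(text_section.get_data())`).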
I've read the following answers: 1 2 3
My end goal is to extract text from PE files that I can use as features in my ML model.
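A variable-length list of strings has to become a fixed-length vector before most models can use it. One option is the hashing trick: hash each string into one of N buckets and count hits per bucket. The stdlib sketch below assumes an arbitrary bucket count (`N_FEATURES`, hypothetical) and uses `hashlib.md5` because Python's built-in `hash()` of a `str` is randomized per process, which would make features differ between runs. scikit-learn's `FeatureHasher` and `HashingVectorizer` implement the same idea more completely.

```python
import hashlib

N_FEATURES = 1024  # hypothetical vector size; tune for your model


def hash_strings(strings, n_features: int = N_FEATURES) -> list:
    """Hypothetical helper: map a list of strings to a fixed-length count vector."""
    vec = [0] * n_features
    for s in strings:
        # md5 gives a hash that is stable across runs, unlike the built-in hash().
        digest = hashlib.md5(s.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "little") % n_features
        vec[index] += 1
    return vec


features = hash_strings(["kernel32.dll", "kernel32.dll", "CreateFileW"])
print(len(features), sum(features))
```

Each PE file then yields one row of the same length, regardless of how many strings it contains.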