I'm running Python 3.5.1 on Windows. I am attempting to find duplicate source code files in a directory by computing their hashes. The problem is that Python seems to think some files are empty. Here is the relevant code snippet:
import hashlib

with open(path, 'rb') as afile:
    hasher = hashlib.md5()
    data = afile.read()
    hasher.update(data)
    print("len(data): {}, Path: {}, Hash:{}".format(len(data), path, hasher.hexdigest()))
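One way to narrow this down is to compare the number of bytes actually read with the size the OS reports for the same path. The sketch below is only a diagnostic under that assumption; `hash_file` and `check_file` are hypothetical helper names, not part of any library. It also reads in chunks, which avoids loading large files into memory during duplicate detection.

```python
import hashlib
import os


def hash_file(path, chunk_size=65536):
    """Return (bytes_read, md5_hexdigest) for the file at `path` (hypothetical helper)."""
    hasher = hashlib.md5()
    bytes_read = 0
    with open(path, 'rb') as afile:
        # iter() with a b'' sentinel keeps calling read() until end of file
        for chunk in iter(lambda: afile.read(chunk_size), b''):
            hasher.update(chunk)
            bytes_read += len(chunk)
    return bytes_read, hasher.hexdigest()


def check_file(path):
    """Hash `path` and warn if fewer bytes were read than os.path.getsize reports."""
    expected = os.path.getsize(path)
    bytes_read, digest = hash_file(path)
    if bytes_read != expected:
        print("WARNING: read {} of {} bytes from {}".format(bytes_read, expected, path))
    print("len(data): {}, Path: {}, Hash:{}".format(bytes_read, path, digest))
    return bytes_read == expected
```

If `check_file` prints a warning, the bytes on disk and the bytes Python receives disagree at the time of the read. If `os.path.getsize` also reports 0, the file really is empty at that moment, whatever an editor shows later.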
Here is some example output:
len(data): 0, Path: h:\t\TCPServerSocket.h, Hash:d41d8cd98f00b204e9800998ecf8427e
len(data): 0, Path: h:\t\TCPSocket.cpp, Hash:d41d8cd98f00b204e9800998ecf8427e
len(data): 0, Path: h:\t\TCPSocket.h, Hash:d41d8cd98f00b204e9800998ecf8427e
len(data): 5073, Path: h:\t\ConfigFile.cpp, Hash:6188d6a0e0bc02edf27ce232689beff6
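The repeated hash in the output above, `d41d8cd98f00b204e9800998ecf8427e`, is the MD5 digest of zero bytes, so it agrees with `len(data): 0`: the hash is simply reflecting empty input. A small check like the following (the names `EMPTY_MD5` and `looks_empty` are illustrative only) can flag such files in the output:

```python
import hashlib

# MD5 of an empty byte string; any file hashing to this value was read as empty.
EMPTY_MD5 = hashlib.md5(b'').hexdigest()


def looks_empty(hexdigest):
    """True if an MD5 hex digest is the digest of zero bytes."""
    return hexdigest == EMPTY_MD5


print(EMPTY_MD5)  # d41d8cd98f00b204e9800998ecf8427e
```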
I assure you that these files are not empty, and Python is not throwing any errors during execution. Any ideas?
The paths are built with os.path functions. Python is accessing the files fine; it just thinks they are empty. I can also open the files in an editor without issue.
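For the overall goal of finding duplicate files, a common approach is to group files by size first and hash only files whose size matches another file's. The following is a minimal sketch under that assumption; `md5_of` and `find_duplicates` are hypothetical names. Note that every file that reads as empty will land in the same duplicate group, which makes the symptom in this question easy to spot.

```python
import hashlib
import os
from collections import defaultdict


def md5_of(path, chunk_size=65536):
    """MD5 hex digest of a file, read in chunks (hypothetical helper)."""
    hasher = hashlib.md5()
    with open(path, 'rb') as afile:
        for chunk in iter(lambda: afile.read(chunk_size), b''):
            hasher.update(chunk)
    return hasher.hexdigest()


def find_duplicates(root):
    """Return sorted lists of paths under `root` whose contents share an MD5 hash."""
    # First pass: group by size; files of different sizes cannot have identical contents.
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            by_size[os.path.getsize(path)].append(path)

    # Second pass: hash only files that share a size with at least one other file.
    by_hash = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        for path in paths:
            by_hash[md5_of(path)].append(path)

    return [sorted(paths) for paths in by_hash.values() if len(paths) > 1]
```

Grouping by size first is a design choice, not a requirement: it skips hashing for files with a unique size, which is usually most of them.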