1

I'm using Python 3.6. I am trying to read a lot of .txt files in multiple directories. Some files have a comma in the file name, e.g. 'Proposal for Anne, Barry and Carol.txt'.

The following code:

import glob
import re

for filepath in glob.iglob(params.input_dir + r'\**\**.*', recursive=True):
    # [not shown here: code that filters on .txt filetype]

    with open(filepath) as f:
        for line in f:
            for word in re.findall(r'\w+', line):
                pass  # do stuff

Gives me an error on reading that file:

Traceback (most recent call last):
  File "dir_scraper.py", line 50, in <module>
    results_new = scraper.scrape_file(filepath)
  File "C:\Projects\scraper.py", line 33, in scrape_file
    return func(filepath)
  File "C:\Projects\scraper.py", line 15, in txt
    with open(filepath) as f:
FileNotFoundError: [Errno 2] No such file or directory: 'Z:\\groups\\Proposal for Anne, Barry and Carol.txt'

I do not want to edit the names of the files.

How can I properly read the files with commas in the filenames?

Edit:

  • I'm sure the path exists.

  • Other files from the same directory are parsed without issues.

  • Trying to open the file directly from the command line also gives: The system cannot find the path specified.

  • Also, I seem to be unable to rename the file. If I try to change the name through Windows File Explorer to remove the comma (or change something else), it is reset to the original filename.

  • Could it have something to do with file permissions?

  • Or maybe the filename is too long? The full path from Z:[..] to [..].txt is 270 characters long.

Comments

  • I cannot reproduce this behavior with Python 3.6.3. Can you show where the variable filepath is set? Commented Nov 20, 2018 at 10:14
  • Maybe if you use listdir on the directory you can see what the file is actually called. Commented Nov 20, 2018 at 10:15
  • Check that the file name is correct; we don't usually need to escape or handle commas in a file name or any parameter string. Commented Nov 20, 2018 at 10:18
  • Are you sure your path Z:\\groups exists? Commented Nov 20, 2018 at 10:21
  • I'm sure the path exists. Other files from the same directory are parsed without issues. Trying to open the file directly from the command line also gives: The system cannot find the path specified. Also, I seem to be unable to rename the file. If I try to change the name through Windows File Explorer to remove the comma (or change something else), it is reset to the original filename. Commented Nov 20, 2018 at 10:27

2 Answers

1

This works fine on Python 3, Windows 10:

import glob, re
for filepath in glob.iglob('C:/Users/test-ABC/Desktop/test/' + r'\**\**.*', recursive=True):
    with open(filepath) as f:
        print(f)
        for line in f:
            print(line)
            for word in re.findall(r'\w+', line):
                pass

<_io.TextIOWrapper
name='C:/Users/test-ABC/Desktop/test\\loooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooong
name\\another
looooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooong
name\\test, file, name.txt' mode='r' encoding='cp1251'>

line1 
line2
line3

The problem may be the long path. Check related questions such as: Long paths in Python on Windows
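
One quick way to test the long-path hypothesis is to compare the length of each path against the legacy Windows limit of 260 characters (MAX_PATH). This is only a rough sketch: the helper name `exceeds_max_path` is made up for illustration, and it assumes the path string is already the full path that would be passed to open().

```python
MAX_PATH = 260  # legacy Windows limit; the count includes the terminating NUL


def exceeds_max_path(path: str) -> bool:
    """Return True if `path` is too long for the legacy Windows limit.

    Hypothetical helper, for illustration only.
    """
    # MAX_PATH includes the trailing NUL, so at most 259 characters fit.
    return len(path) >= MAX_PATH


if __name__ == "__main__":
    # A made-up 270-character path, the length reported in the question.
    sample = "Z:\\" + "a" * 267
    print(len(sample), exceeds_max_path(sample))
```

Printing the paths that fail this check inside the glob loop should show whether the files that cannot be opened are simply the longest ones.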

3 Comments

The manifest of Python 3.6+ supports long paths, so if you have "LongPathsEnabled" set in "HKLM\System\CurrentControlSet\Control\FileSystem" in Windows 10, then normalized DOS paths support the native limit of up to about 32760 characters. Otherwise normalized DOS paths use the legacy limit of MAX_PATH (260) characters, and longer paths require an extended local-device path, which is prefixed with "\\?\" (or "\\?\UNC\" for UNC) and must be fully qualified (i.e. not relative) and Unicode.
Thank you @eryksun. Will note that.
Thank you! It turned out that the path was indeed too long. The comma threw me off. I'll have to look into how best to support the long path. Thanks @eryksun for the suggestion, I'll see if that works.
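
For anyone landing here later, the extended-path approach from the comment above could look roughly like the sketch below. The helper name `to_extended_path` is hypothetical, and the code assumes the input is a fully qualified Windows path (drive letter or UNC); relative paths are rejected because the "\\?\" prefix requires a fully qualified path. It uses `ntpath` so the string handling is the same on any platform.

```python
import ntpath

EXTENDED_PREFIX = "\\\\?\\"    # the literal characters \\?\
UNC_PREFIX = "\\\\?\\UNC\\"    # the literal characters \\?\UNC\


def to_extended_path(path: str) -> str:
    """Return `path` with the extended-length prefix added.

    Hypothetical helper, for illustration only.
    """
    if path.startswith(EXTENDED_PREFIX):
        return path  # already an extended path, leave it alone
    drive, _ = ntpath.splitdrive(path)
    if not drive or not ntpath.isabs(path):
        raise ValueError("extended paths must be fully qualified: %r" % path)
    # Extended paths are not normalized by Windows, so do it here:
    # this converts '/' to '\\' and collapses '.' and '..' components.
    path = ntpath.normpath(path)
    if path.startswith("\\\\"):
        # UNC path: \\server\share\... becomes \\?\UNC\server\share\...
        return UNC_PREFIX + path[2:]
    return EXTENDED_PREFIX + path
```

In the loop from the question this would be used as `open(to_extended_path(filepath))`. If "LongPathsEnabled" is set as described in the comment, the prefix should not be needed at all.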
0

First, make sure you only open files, not directories. Second, you can use os.path.join to build paths with the correct separator on Windows:

>>> os.path.join(r"d:\ss")
'd:\\ss'

Try this:

    from pathlib import Path
    import re

    path_name = './'  # e.g. r'd:/xx' on Windows
    # Collect every .txt file below path_name, skipping directories.
    fn_lst = [p for p in Path(path_name).glob('**/*.txt') if not p.is_dir()]
    print(fn_lst)
    for fn in fn_lst:
        with open(fn) as f:
            print()
            print(fn)
            for line in f:
                # Print every word in the line, separated by '|'.
                for word in re.findall(r'\w+', line):
                    print(word, end="|")

Output:

[PosixPath('2.txt'), PosixPath('1.txt')]


2.txt
This|tutorial|introduces|the|reader|informally|to|the|basic|concepts|and|features|of|the|Python|language|and|system|It|helps|to|have|a|Python|interpreter|handy|for|hands|on|experience|but|all|examples|are|self|contained|so|the|tutorial|can|be|read|off|line|as|well|
1.txt
Python|is|an|easy|to|learn|powerful|programming|language|It|has|efficient|high|level|data|structures|and|a|simple|but|effective|approach|to|object|oriented|programming|Python|s|elegant|syntax|and|dynamic|typing|together|with|its|interpreted|nature|make|it|an|ideal|language|for|scripting|and|rapid|application|development|in|many|areas|on|most|platforms|
