I'm hitting a mental roadblock on this problem, and none of the other questions I've looked at quite capture my use case. One was close, but I couldn't figure out how to adapt it. Basically, I have a script that uses os.walk() to rename files within a target directory (and any sub-directories) according to user-defined rules. The specific problem is that I'm trying to log the results of the operation in JSON format, with output like this:

{
    "timestamp": "2022-12-26 09:40:55.874718",
    "files_inspected": 512,
    "files_renamed": 256,
    "replacement_rules": {
        "%20": "_",
        " ": "_"
    },
    "target_path": "/home/example-user/example-folder",
    "data": [
        {
            "directory": "/home/example-user/example-folder",
            "files": [
                {
                    "original_name": "file 1.txt",
                    "new_name": "file_1.txt"
                },
                {
                    "original_name": "file 2.txt",
                    "new_name": "file_2.txt"
                },
                {
                    "original_name": "file 3.txt",
                    "new_name": "file_3.txt"
                }
            ],
            "children": [
                {
                    "directory": "/home/example-user/example-folder/sub-folder",
                    "files": [
                        {
                            "original_name": "file 1.txt",
                            "new_name": "file_1.txt"
                        },
                        {
                            "original_name": "file 2.txt",
                            "new_name": "file_2.txt"
                        },
                        {
                            "original_name": "file 3.txt",
                            "new_name": "file_3.txt"
                        }
                    ]
                }
            ]
        }
    ]
}

On the first iteration, the first item in the 3-tuple (dirpath) is the target directory, and the second item (dirnames) is a list of the directories directly inside that dirpath (if any). What I think is tripping me up is that on the next iteration, dirpath becomes the path of the first entry from the previous iteration's dirnames (if there were any). I am having trouble working out the logic of transforming this flat sequence of 3-tuples into the nested hierarchy above. Ideally, a directory object with no sub-directories (children) would not have the children key at all, but having it set to an empty list would be fine.

I would really appreciate any advice or insight you might have on how to achieve that desired log structure from what os.walk() provides. Also open to any suggestions on improving the JSON object structure. Thank you!

https://github.com/dblinkhorn/file_renamer

  • It would be a lot easier if instead of providing a list of children you had a dict of children keyed by file or directory names; that way you can directly index your way down the structure without needing to do potentially-expensive O(n) searches. Which is to say -- this isn't a well-designed document format if one of the goals is efficient search for (or updates within) a given subtree. Is the layout an external constraint? Commented Dec 28, 2022 at 22:53
  • Really appreciate that tip and explanation. I'll implement that change. Commented Dec 28, 2022 at 23:01
  • The structure seems fine to me, depending on what you want to do with it. But I don't see any code here. Commented Dec 28, 2022 at 23:40
  • Sorry, I linked to my GitHub repo which has the current script code, but the logic to construct this JSON object programmatically is just what I have been unable to solve. Commented Dec 29, 2022 at 1:32

2 Answers

One issue in your approach is that you want a hierarchical result that is most naturally obtained by recursion, whereas os.walk flattens that hierarchy.
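To make the flattening concrete, here is a minimal sketch that records the order in which os.walk visits directories. The helper name walk_order and the tiny tree built in a temporary directory are my own illustrations, not part of your script:

```python
import os
import tempfile

def walk_order(top):
    """Return the directories in the order os.walk visits them, relative to top."""
    visited = []
    for dirpath, dirnames, filenames in os.walk(top):
        dirnames.sort()  # sorting in place makes the traversal order deterministic
        visited.append(os.path.relpath(dirpath, top))
    return visited

# Build a throwaway tree: top/a/b and top/c
with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, 'a', 'b'))
    os.makedirs(os.path.join(top, 'c'))
    print(walk_order(top))
```

On a POSIX system this should print ['.', 'a', 'a/b', 'c']: a flat, depth-first sequence in which each directory arrives as its own (dirpath, dirnames, filenames) tuple with no link back to its parent, so the nesting has to be reconstructed by hand.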

For this reason, I would recommend using os.scandir instead. It also happens to be one of the most performant tools to interact with a directory tree.

import json
import os
from datetime import datetime

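def rename(topdir, rules, result=None, verbose=False, dryrun=False):
    """Recursively rename files under topdir and return a nested log dict.

    `result` carries the shared counters down the recursion; callers leave it as None.
    """
    is_toplevel = result is None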
    if is_toplevel:
        result = dict(
            timestamp=datetime.now().isoformat(sep=' ', timespec='microseconds'),
            dryrun=dryrun,
            directories_inspected=0,
            files_inspected=0,
            files_renamed=0,
            replacement_rules=rules,
            target_path=topdir,
        )
    files = []
    children = []
    with os.scandir(topdir) as it:
        for entry in it:
            if entry.is_dir():
                children.append(rename(entry.path, rules, result, verbose, dryrun))
            else:
                result['files_inspected'] += 1
                for old, new in rules.items():
                    if old in entry.name:
                        newname = entry.name.replace(old, new)
                        dst = os.path.join(topdir, newname)
                        if not dryrun:
                            os.rename(entry.path, dst)
                            result['files_renamed'] += 1
                        if verbose:
                            print(f'{"[DRY-RUN] " if dryrun else ""}rename {entry.path!r} to {dst!r}')
                        files.append(dict(original_name=entry.name, new_name=newname))
                        break
    result['directories_inspected'] += 1
    res = dict(directory=topdir)
    if files:
        res.update(dict(files=files))
    if children:
        res.update(dict(children=children))
    if is_toplevel:
        # merge the summary counters with the top-level directory entry (dict union needs Python 3.9+)
        res = result | res
    return res
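
One caveat, as far as I can tell: the inner loop above stops at the first rule that matches, so a name that matches several rules only gets the first replacement. If you would rather apply every rule, a small helper along these lines could replace that loop. The name apply_rules is hypothetical, and it assumes rules should be applied in dict insertion order:

```python
def apply_rules(name, rules):
    """Apply every replacement rule in insertion order and return the new name."""
    for old, new in rules.items():
        name = name.replace(old, new)
    return name

rules = {'%20': '_', ' ': '_'}
print(apply_rules('my%20file 1.txt', rules))  # my_file_1.txt
```

With the first-match loop, the same name would come out as 'my_file 1.txt', since only the '%20' rule is applied before the break. A file would then be logged as renamed whenever apply_rules returns something different from the original name.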

Example

Let's build a reproducible example:

d = {
    'example/example-folder': [
        'file 1.txt',
        'file 2.txt',
        'foo bar 1.txt',
        {
            'sub/folder': [
                'file 1.txt',
                'file 2.txt',
                'foo bar 1.txt',
            ],
        },
    ],
}

def make_example(d, topdir='.'):
    if isinstance(d, str):
        print(f'make file: {topdir}/{d}')
        with open(os.path.join(topdir, d), 'w') as f:
            pass
    elif isinstance(d, dict):
        for dirname, specs in d.items():
            topdir = os.path.join(topdir, dirname)
            print(f'makedirs {topdir}')
            os.makedirs(topdir, exist_ok=True)
            make_example(specs, topdir)
    else:
        assert isinstance(d, list), f'got a weird spec: {d!r}'
        for specs in d:
            make_example(specs, topdir)

>>> make_example(d)
makedirs ./example/example-folder
make file: ./example/example-folder/file 1.txt
make file: ./example/example-folder/file 2.txt
make file: ./example/example-folder/foo bar 1.txt
makedirs ./example/example-folder/sub/folder
make file: ./example/example-folder/sub/folder/file 1.txt
make file: ./example/example-folder/sub/folder/file 2.txt
make file: ./example/example-folder/sub/folder/foo bar 1.txt
! tree example
example
└── example-folder
    ├── file\ 1.txt
    ├── file\ 2.txt
    ├── foo\ bar\ 1.txt
    └── sub
        └── folder
            ├── file\ 1.txt
            ├── file\ 2.txt
            └── foo\ bar\ 1.txt

3 directories, 6 files

Now, using the rename() function above:

rules = {'%20': '_', ' ': '_'}
res = rename('example', rules, verbose=True, dryrun=True)
# [DRY-RUN] rename 'example/example-folder/file 2.txt' to 'example/example-folder/file_2.txt'
# [DRY-RUN] rename 'example/example-folder/file 1.txt' to 'example/example-folder/file_1.txt'
# [DRY-RUN] rename 'example/example-folder/sub/folder/file 2.txt' to 'example/example-folder/sub/folder/file_2.txt'
# [DRY-RUN] rename 'example/example-folder/sub/folder/file 1.txt' to 'example/example-folder/sub/folder/file_1.txt'
# [DRY-RUN] rename 'example/example-folder/sub/folder/foo bar 1.txt' to 'example/example-folder/sub/folder/foo_bar_1.txt'
# [DRY-RUN] rename 'example/example-folder/foo bar 1.txt' to 'example/example-folder/foo_bar_1.txt'

>>> print(json.dumps(res, indent=4))
{
    "timestamp": "2022-12-29 15:24:06.930252",
    "dryrun": true,
    "directories_inspected": 4,
    "files_inspected": 6,
    "files_renamed": 0,
    "replacement_rules": {
        "%20": "_",
        " ": "_"
    },
    "target_path": "example",
    "directory": "example",
    "children": [
        {
            "directory": "example/example-folder",
            "files": [
                {
                    "original_name": "file 2.txt",
                    "new_name": "file_2.txt"
                },
                {
                    "original_name": "file 1.txt",
                    "new_name": "file_1.txt"
                },
                {
                    "original_name": "foo bar 1.txt",
                    "new_name": "foo_bar_1.txt"
                }
            ],
            "children": [
                {
                    "directory": "example/example-folder/sub",
                    "children": [
                        {
                            "directory": "example/example-folder/sub/folder",
                            "files": [
                                {
                                    "original_name": "file 2.txt",
                                    "new_name": "file_2.txt"
                                },
                                {
                                    "original_name": "file 1.txt",
                                    "new_name": "file_1.txt"
                                },
                                {
                                    "original_name": "foo bar 1.txt",
                                    "new_name": "foo_bar_1.txt"
                                }
                            ]
                        }
                    ]
                }
            ]
        }
    ]
}
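
Since the goal is a log, the returned dict can be written to disk with json.dump rather than printed. A minimal sketch, where write_log and the file name rename_log.json are placeholders of mine and the sample dict stands in for the value returned by rename():

```python
import json
import os
import tempfile

def write_log(result, path):
    """Write the result dict to path as indented JSON (hypothetical helper)."""
    with open(path, 'w', encoding='utf-8') as f:
        json.dump(result, f, indent=4)

# Stand-in result; in practice this would be the dict returned by rename().
sample = {'files_inspected': 6, 'files_renamed': 0, 'children': []}
with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, 'rename_log.json')
    write_log(sample, log_path)
    # Read it back to confirm the log round-trips through JSON unchanged.
    with open(log_path, encoding='utf-8') as f:
        restored = json.load(f)
```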

1 Comment

This is working beautifully. I suspected it would require a recursive solution given that a directory can have n children, but still couldn't quite wrap my head around the logic to implement it. Thank you for taking the time to help me out!

I am not totally sure if I got your request right. But to get a simple nested structure from os.walk you could try the following:

import json
import os
from typing import Union

structure = {}

def get_element(dirpath: str) -> Union[dict, None]:
    """Return the nested dict for dirpath, or None if it has not been added yet."""
    _element = structure[root]
    if dirpath.startswith(root):
        dirpath = dirpath[len(root)+1:]
    for key in dirpath.split(os.sep):
        try:
            _element = _element[key]
        except KeyError:
            return None
    return _element

target = os.path.abspath('sample')
root, _ = target.rsplit(os.sep, 1)
structure[root] = {}
for path, children, files in os.walk(target):
    element = get_element(path)
    if element is None:
        element = structure[root][path.split(os.sep)[-1]] = {}
    element['files'] = files
    for child in children:
        element[child] = {}

print(json.dumps(structure, sort_keys=True, indent=4))

yields an output like

{
    "/path/to/folder": {
        "sample": {
            "dir": {
                "files": [
                    "more_samples.txt"
                ],
                "subdir": {
                    "files": [
                        "important.txt"
                    ]
                },
                "with": {
                    "children": {
                        "files": [
                            "other.txt",
                            "some.txt"
                        ]
                    },
                    "files": []
                }
            },
            "files": [
                "test.txt"
            ]
        }
    }
}

Does this help?


Note: this is a minimal example, trying to solve the main part of the request. You need to build the rest of your structure around it.


Note 2: The key files might cause a conflict if you have a subfolder named files somewhere. Choose wisely. ;)
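
One way to sidestep that conflict, as a sketch only: keep file names and sub-directories under separate keys, and remember each directory's node in a lookup dict keyed by its path so that os.walk's flat tuples can be attached to the right parent. The names build_tree, 'files', 'dirs' and nodes are my own choices here, and the code relies on os.walk visiting a parent before its children (the default top-down order):

```python
import os
import tempfile

def build_tree(top):
    """Build a nested dict from os.walk with files and sub-directories under separate keys."""
    top = os.path.abspath(top)
    root = {'files': [], 'dirs': {}}
    nodes = {top: root}  # maps each directory path to its node in the tree
    for dirpath, dirnames, filenames in os.walk(top):
        node = nodes[dirpath]  # already created when the parent was visited
        node['files'] = sorted(filenames)
        for name in sorted(dirnames):
            child = {'files': [], 'dirs': {}}
            node['dirs'][name] = child
            nodes[os.path.join(dirpath, name)] = child
    return root

# Small demonstration tree that includes a sub-directory literally named 'files'.
with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, 'files'))
    open(os.path.join(top, 'test.txt'), 'w').close()
    open(os.path.join(top, 'files', 'inner.txt'), 'w').close()
    tree = build_tree(top)
```

Here tree['files'] holds the file names of the top directory, while the sub-directory named files lives under tree['dirs']['files'], so the two can no longer collide.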

2 Comments

Thanks for the comment! When I tested this logic, it appears to create each subdir object but not place the subdir data into it; instead it creates a new root-level object for each subdir.
Shoot. I did not realize that you are working with absolute paths. Updated the answer accordingly.
