I have a folder holding a long data history, with one subfolder per day:

data/
  2010.01.01/
    f1/
    f2/
    ...
  2010.01.02/
    ...
  ...

I would like a second folder that contains only the subfolders whose names are dates within the last 90 days:

data_recent/
  2020.02.28/
    f1/
    f2/
    ...
  ...  
  2020.05.28/
    ...

What is the easiest way to sync the new folders and delete the old ones with a bash script? The box is running CentOS 7.

  • What I would do is create soft links (ln -s) in data_recent/ to the folders whose name dates are less than 90 days old. Then it is just a matter of looping over the directories in data/ and adding any missing links to data_recent/, and looping over the links in data_recent/ and removing any that are older than 90 days. In either case you can parse the directory name and turn it into a date in seconds since the epoch with date -d "folder_date" +%s. You get the number of seconds for 90 days ago with date -d "90 days ago" +%s. Commented May 28, 2020 at 5:21
  • The age of the data does not necessarily correspond to the date in the folder name: some data could be regenerated later, in which case the main folder would be deleted and recreated. Commented May 28, 2020 at 5:49
  • That's why you parse the folder name into a date instead of using find data/ -type f -newermt "$(date -d "90 days ago" "+%F %R")" :-) Commented May 28, 2020 at 6:12
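
The name-based comparison described in these comments could be sketched as a small helper. This is only an illustration: it assumes GNU date (the version shipped with CentOS 7) and folder names of the form YYYY.MM.DD, and the function name is_recent is made up for this example.

```shell
#!/bin/bash

# Hypothetical helper: succeeds if a YYYY.MM.DD folder name is dated within
# the last N days (default 90). Returns 2 if the name is not a valid date.
is_recent() {
  local name=$1 days=${2:-90}
  local folderEpoch cutoffEpoch
  # "2020.02.28" -> "20200228", a form GNU date parses as a calendar date
  folderEpoch=$(date -d "${name//./}" +%s 2>/dev/null) || return 2
  cutoffEpoch=$(date -d "$days days ago" +%s)
  (( folderEpoch >= cutoffEpoch ))
}

if is_recent "2010.01.01"; then echo "keep"; else echo "drop"; fi   # prints "drop"
```

Because only the folder name is inspected, regenerating the contents of an old folder does not change the result.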

1 Answer

The key is to convert the dated folder names into Unix epoch time so you can compare them as plain integers.

#!/bin/bash

dataDir="/abs/path/to/data"
recentDir="/abs/path/to/data_recent"
daysToKeep=90
minKeepEpoch=$(date --date "$daysToKeep days ago" +%s)

# Create new links for folders that are within $daysToKeep
while IFS= read -r -d $'\0' dir; do
  dirName=${dir##*/}
  # 2020.02.28 -> 20200228; skip names that date cannot parse
  dirEpoch=$(date --date "${dirName//./}" +%s 2>/dev/null) || continue
  target="$recentDir/$dirName"
  # Skip names that already exist: ln -s fails on them when the script is re-run
  if (( dirEpoch >= minKeepEpoch )) && [[ ! -e $target && ! -L $target ]]; then
    ln -s -t "$recentDir" "$dir"
  fi
done < <(find "$dataDir" -mindepth 1 -maxdepth 1 -type d -print0)

# Remove links that are older than $daysToKeep
while IFS= read -r -d $'\0' link; do
  linkName=${link##*/}
  linkEpoch=$(date --date "${linkName//./}" +%s 2>/dev/null) || continue
  if (( linkEpoch < minKeepEpoch )); then
    rm "$link"
  fi
done < <(find "$recentDir" -mindepth 1 -maxdepth 1 -type l -print0)
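
find returns every subdirectory, whatever its name, so it may be worth checking that a name really looks like a date before comparing it. A minimal sketch of such a guard follows, assuming GNU date and the YYYY.MM.DD naming from the question; is_date_folder is a hypothetical name, not part of the script above.

```shell
#!/bin/bash

# Hypothetical guard: accept only names shaped like YYYY.MM.DD that GNU date
# also recognises as a real calendar date.
is_date_folder() {
  local name=$1
  [[ $name =~ ^[0-9]{4}\.[0-9]{2}\.[0-9]{2}$ ]] || return 1
  date --date "${name//./}" +%s >/dev/null 2>&1
}

for name in 2020.05.28 2020.13.45 lost+found; do
  if is_date_folder "$name"; then
    echo "$name: date folder"
  else
    echo "$name: skipped"
  fi
done
```

Calling a check like this at the top of each loop body would let stray directories pass through untouched instead of producing date errors.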

Proof of Concept

Note that ./data_recent was pre-populated with an outdated link, which the script removes.

$ tree ./data
./data
├── 2010.01.01
│   ├── f1
│   └── f2
├── 2020.02.27
│   ├── f1
│   └── f2
├── 2020.02.28
│   ├── f1
│   └── f2
├── 2020.05.27
└── 2020.05.28
    └── f1

12 directories, 0 files

$ tree ./data_recent/
./data_recent/
└── 2010.01.01 -> /abs/path/to/data/2010.01.01

1 directory, 0 files

$ ./syncFolders.sh
$ tree ./data_recent/
./data_recent/
├── 2020.05.27 -> /abs/path/to/data/2020.05.27
└── 2020.05.28 -> /abs/path/to/data/2020.05.28

2 directories, 0 files
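
One detail worth knowing: date --date "90 days ago" keeps the current time of day, while a folder name converts to midnight, so a folder dated exactly 90 days back falls just before the cutoff. That may be why 2020.02.28 is not linked above, assuming the script was run on 2020-05-28 (the run date is not stated). If that folder should count as recent, a possible variant truncates the cutoff to midnight. This is a sketch assuming GNU date; cutoffDay, boundaryName and boundaryEpoch are names introduced here for illustration.

```shell
#!/bin/bash

# Hypothetical variant of the cutoff: truncate "N days ago" to midnight so a
# folder dated exactly N days back still counts as recent. Assumes GNU date.
daysToKeep=90
cutoffDay=$(date --date "$daysToKeep days ago" +%F)   # calendar day, YYYY-MM-DD
minKeepEpoch=$(date --date "$cutoffDay" +%s)          # midnight of that day

# A folder named for the cutoff day itself now passes the >= comparison.
boundaryName=${cutoffDay//-/.}
boundaryEpoch=$(date --date "${boundaryName//./}" +%s)
(( boundaryEpoch >= minKeepEpoch )) && echo "$boundaryName would be kept"
```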
1 Comment

Thanks a lot for that great answer
