
I'm a novice bash scripter and need help with the following. I have a script that uses inotifywait to watch a directory. Another process will drop a csv file in nightly. The file will be of the format filename_YYYYMMDD.csv. My script takes the oldest timestamped file and renames it filename.csv, so it's ready to be consumed by another external process. In a perfect world, one file comes in, gets renamed without the timestamp, and is processed. The external process moves the file to an archive folder. If the external process is down, the timestamped files will accumulate in the folder. My script renames the first file when it hits the directory, but the rest will go untouched until that first file is archived. Then my script finds the oldest file, renames it, and exits.

When the timestamped file first hits the directory, my script checks for a timestamp in the name:

regex='_[0-9]{4}[0-9]{2}[0-9]{2}(.csv)$'
if [[ $file =~ $regex ]]; then

This works fine. I check for a file already existing named filename.csv. If none, this file gets renamed.
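The regex above accepts slightly more than intended: the `.` in `(.csv)` is unescaped, so it matches any character, and nothing anchors the start of the name. A stricter variant, sketched under the assumption that the basename is held in `mybasename` (as later in the question) and contains no regex metacharacters:

```bash
#!/usr/bin/env bash
mybasename=filename
loose='_[0-9]{4}[0-9]{2}[0-9]{2}(.csv)$'     # the regex from the question
strict="^${mybasename}"'_[0-9]{8}\.csv$'     # anchored at both ends, literal dot

for name in filename_20210810.csv filename_20210810xcsv filename_2108.csv other_filename_20210810.csv; do
  loose_hit=no strict_hit=no
  [[ $name =~ $loose ]] && loose_hit=yes
  [[ $name =~ $strict ]] && strict_hit=yes
  printf '%-28s loose=%s strict=%s\n' "$name" "$loose_hit" "$strict_hit"
done
```

With the loose regex, filename_20210810xcsv and other_filename_20210810.csv also pass; the strict one accepts only filename_20210810.csv.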

When a file is processed and archived, the script detects that file is gone, gets the basename and looks for the next oldest file matching that basename, renames it, and exits.
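Getting the basename needs no external tools: parameter expansion can strip either the timestamp from an incoming name or the extension from the consumed filename.csv. A minimal sketch; both file names are illustrative:

```bash
#!/usr/bin/env bash
incoming=filename_20210810.csv        # hypothetical timestamped file that just arrived
gone=filename.csv                     # hypothetical file reported as moved away
base_from_incoming=${incoming%_*}     # remove the shortest trailing "_..." part
base_from_gone=${gone%.csv}           # remove the ".csv" suffix
printf '%s %s\n' "$base_from_incoming" "$base_from_gone"
```

Because `%` removes the shortest matching suffix, a basename that itself contains underscores (e.g. my_file_20210810.csv) still reduces to my_file.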

Finally, here's where I need help. How can I modify the below code to look for the oldest file matching a pattern? If I just look for basename, I pick up a malformed filename, like filename_YYMM, which I don't want.

unset -v oldest
for f in $mybasename*; do
   [[ -z $oldest || $f -ot $oldest ]] && oldest=$f
done

Right now $mybasename is just filename. How can I get the for loop to use a regex like:

regex='$mybasename_[0-9]{4}[0-9]{2}[0-9]{2}(.csv)$'
for f in $regex; do
   [[ -z $oldest || $f -ot $oldest ]] && oldest=$f
done

Thanks!

  • zsh has built-in functionality to find oldest/newest files matching a pattern; is that an option for you? Commented Aug 10, 2021 at 17:12
  • You're doing this the hard way! Use a soft link (man ln) to point to the file to process (ln -s filename_YYYYMMDD.csv filename.csv). Use stat (man stat) to output the filename and other data, sort it (man sort), cut off (man cut) the other data, and move the link along (delete and re-create the link). Commented Aug 10, 2021 at 17:49
  • Why do you need to rename it? Why not just process it and move it to another directory (the archive dir)? Commented Aug 11, 2021 at 2:52
  • An external process is predefined to read a static filename, which doesn't include the timestamp. Commented Aug 11, 2021 at 21:29
  • I'd use a symlink instead of renaming the file. That way you get to keep the date in the filename in the archive dir. Commented Aug 12, 2021 at 6:19

3 Answers


It would be a lot easier in zsh, whose globs are functionally equivalent to regexps (though with different syntax):

set -o extendedglob
base=filename
oldest=( ${base}_[0-9](#c8).csv(N-Om[1]) )
  • x(#c8) is the equivalent of ERE x{8} (or ksh93's {8}(x)).
  • N: nullglob, so the glob expands to nothing if there's no match. So $oldest is an array that has either 0 or 1 element.
  • Om: order by modification time from oldest to newest (capital O for reverse sort).
  • -: with it, that check is done after symlink resolution for those files that are symlinks.
  • [1] selects the first. You could also omit it and refer to the oldest as $oldest[1].
  • In zsh (contrary to bash), [0-9] is equivalent to [0123456789]; bash glob's [0-9] usually matches hundreds of other characters besides 0123456789. For regexp [0-9], the behaviour varies between regexp implementations.

Since version 5.3, you can also affect globbing expansion order in bash via its GLOBSORT special variable, so you could do something similar there with

shopt -s nullglob
GLOBSORT=mtime
base=filename
set -- "$base"_[0123456789][0123456789][0123456789][0123456789][0123456789][0123456789][0123456789][0123456789].csv
oldest=( ${1+"$1"} )

Note:

  • GLOBSORT=mtime sorts the opposite way from zsh's globs or ls -t: mtime sorts from oldest to newest (-mtime from newest to oldest), while ls -t or zsh's om sort from newest to oldest (by increasing age).
  • the comparison is done after symlink resolution (like when the - qualifier is used in zsh or -L is used in ls), and there's no way to change it. Likely not a problem in your case.
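Where the script may also run on a bash older than 5.3, one option is to use GLOBSORT when it is available and fall back to the -ot loop from the question otherwise. A sketch under that assumption; the version test relies on BASH_VERSINFO, and the file names are illustrative:

```bash
#!/usr/bin/env bash
shopt -s nullglob
cd "$(mktemp -d)" || exit 1                        # scratch directory for the demo
touch -t 202101010000 filename_2108.csv            # malformed and oldest: must be ignored
touch -t 202108020000 filename_20210801.csv
touch -t 202108110000 filename_20210810.csv

base=filename
unset -v oldest
if (( BASH_VERSINFO[0] > 5 || (BASH_VERSINFO[0] == 5 && BASH_VERSINFO[1] >= 3) )); then
  # bash 5.3+: the glob itself returns matches sorted by mtime, oldest first
  GLOBSORT=mtime
  files=( "$base"_[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9].csv )
  unset -v GLOBSORT                                # back to the default sort by name
  oldest=${files[0]-}
else
  # older bash: pairwise -ot comparison, as in the question
  for f in "$base"_[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9].csv; do
    [[ -z $oldest || $f -ot $oldest ]] && oldest=$f
  done
fi
printf 'oldest: %s\n' "${oldest:-<none>}"
```

Either branch should select filename_20210801.csv here; the caveat above about bash's [0-9] applies to both.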

Now, as to the more general question of expanding globs using regexps: that's not supported by bash, but ksh93 (the shell bash tries to emulate) and zsh can do it.

In ksh93:

files=( ~(NE)^filename_[0123456789]{8}\.csv$ )

That does extended regexp matching (alternatives are G for basic regexps, X for AT&T augmented regexps, P for perl-like regexps, V for SysV regexps, whatever that means).

In zsh:

files=( *(Ne['[[ $REPLY =~ "^filename_[0123456789]{8}\.csv$" ]]']) )

That's an extended regexp match by default. It can be changed to PCRE with set -o rematchpcre, in which case you'd want to replace $ with \z, and could replace [0123456789] with [0-9] or \d.

You could also define a re helper function which takes the pattern from a $RE variable like:

RE='^filename_[0123456789]{8}\.csv$'
re() [[ ${1-$REPLY} =~ $RE ]]
files=( *(N+re) )
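bash cannot expand a regexp directly, but a similar effect can be approximated by expanding a broad glob and filtering each name with [[ =~ ]], which uses POSIX extended regexps. A minimal sketch; regex_glob is a hypothetical helper name, mapfile needs bash 4 or later, and since the helper prints one name per line, file names containing newlines would be split:

```bash
#!/usr/bin/env bash
# regex_glob ERE: print names in the current directory that match ERE.
# Hypothetical helper; like a plain *, it skips dotfiles.
regex_glob() {
  local f
  for f in *; do
    [[ -e $f || -L $f ]] || continue   # with no files, * stays literal: skip it
    [[ $f =~ $1 ]] && printf '%s\n' "$f"
  done
  return 0
}

cd "$(mktemp -d)" || exit 1            # scratch directory for the demo
touch filename_20210810.csv filename_2108.csv filename_20210810.csv.bak notes.txt

mapfile -t files < <(regex_glob '^filename_[0123456789]{8}\.csv$')
printf '%s\n' "${files[@]}"
```

As with bash's =~ in general, the regexp is not anchored implicitly, hence the explicit ^ and $.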

The only reason to use Bash is if Perl is not installed!

Here is a solution in Perl if you are interested. The steps I took were as follows.

  1. Exit if filename.csv exists
  2. Find all files matching filename_*.csv in the directory, oldest first, using the command ls -rt
  3. Save this list in @fileList
  4. Check all files in @fileList for the pattern _(\d{4})(\d{2})(\d{2})${extension}\n (I used \d instead of [0-9]; $extension is .csv)
  5. List all files matching pattern
  6. List oldest file and date
  7. Move oldest file to filename.csv

Here is the code...

#!/usr/bin/perl
use strict;
use warnings;

#see if filename.csv exists
my $baseName = "filename";
my $extension = ".csv";
my $fileName = $baseName . $extension;
my (@fileList, @fileListMatchingPattern, @datesOfFiles);

#file already exists, don't do anything
exit(0) if(-e $fileName);

#find files matching filename_*.csv, oldest first; each name ends with a newline
#ls -t is sort by time newest first, ls -rt is oldest first
#2>/dev/null hides ls's complaint when no file matches the glob
@fileList = `ls -rt ${baseName}_*$extension 2>/dev/null`;

#find files matching pattern; \Q...\E makes the "." in ".csv" match literally
for(@fileList){
  if( /^\Q$baseName\E_(\d{4})(\d{2})(\d{2})\Q$extension\E\n/ ){
    #you can also just save the first match found, which will be the oldest
    #this will save a list of all files matching pattern.
    push(@fileListMatchingPattern,$_);

    #backreference dates and save in a second array, linked by index
    push(@datesOfFiles,"Month: $2 Day: $3 Year: $1");
  }
}

#oldest file matching pattern will be the first in array
if(@fileListMatchingPattern){
  print "Listing files matching pattern, oldest first:\n";
  print for(@fileListMatchingPattern);
  print "\nOldest file matching pattern is: $fileListMatchingPattern[0]";
  print "date of file is $datesOfFiles[0]\n";

  #move oldest file to $fileName
  chomp($fileListMatchingPattern[0]);     #remove newline
  print "Running command: \"mv $fileListMatchingPattern[0] $fileName\"\n";
  #list form of system() runs mv without a shell, so unusual file names are passed safely
  system("mv", $fileListMatchingPattern[0], $fileName) == 0
    or die "mv failed with status $?\n";
}else{
  print "No files found matching pattern\n";
}

I created a directory with some sample files to test. Output looks like this...

$ perl archive.files.pl

Listing files matching pattern, oldest first:
filename_20000101.csv
filename_20010101.csv
filename_20020101.csv
filename_20030101.csv
filename_20040101.csv
filename_20050101.csv
filename_20060101.csv
filename_20070101.csv
filename_20080101.csv
filename_20090101.csv
filename_20100101.csv
filename_20110101.csv
filename_20120101.csv
filename_20130101.csv
filename_20140101.csv
filename_20150101.csv
filename_20160101.csv
filename_20170101.csv
filename_20180101.csv
filename_20190101.csv
filename_20200101.csv
filename_20210101.csv
filename_20220101.csv
filename_20230101.csv
filename_20240101.csv
filename_20250101.csv
filename_20260101.csv
filename_20270101.csv
filename_20280101.csv
filename_20290101.csv

Oldest file matching pattern is: filename_20000101.csv
date of file is Month: 01 Day: 01 Year: 2000
Running command: "mv filename_20000101.csv filename.csv"

I backreferenced all the dates and saved them in array @datesOfFiles in case you need to know the dates of the archived files. That should be exactly what you are looking for. You could make this script a cron job and have it run every so often automatically.
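In bash the same back-references are available through the BASH_REMATCH array after a successful [[ =~ ]]. A minimal sketch with an illustrative list of names and a hypothetical dates array mirroring @datesOfFiles:

```bash
#!/usr/bin/env bash
regex='^filename_([0-9]{4})([0-9]{2})([0-9]{2})\.csv$'
dates=()
for f in filename_20000101.csv filename_2108.csv filename_20210810.csv; do
  if [[ $f =~ $regex ]]; then
    # BASH_REMATCH[1], [2] and [3] hold the year, month and day capture groups
    dates+=("Month: ${BASH_REMATCH[2]} Day: ${BASH_REMATCH[3]} Year: ${BASH_REMATCH[1]}")
  fi
done
printf '%s\n' "${dates[@]}"
```

The malformed filename_2108.csv is skipped, so dates ends up with two entries.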

Good Luck!


To answer your question:

How can I modify the below code to look for the oldest file matching a pattern? If I just look for basename, I pick up a malformed filename, like filename_YYMM, which I don't want.

I don't think bash's for statement can use a regex natively, just glob patterns, however...

At a minimum, you can use glob patterns to ensure that the name has the correct number of characters:

shopt -s nullglob   # if nothing matches, loop zero times instead of over the literal pattern
unset -v oldest
for f in "${mybasename}"_????????.csv; do
   [[ -z $oldest || $f -ot $oldest ]] && oldest=$f
done

And maybe that is sufficient?

As long as the glob pattern is a superset of the regex, you will find all matching files by combining it with your previous approach of testing whether the name matches a regex before the rest of your test:

unset -v oldest
regex="^${mybasename}_[0-9]{4}[0-9]{2}[0-9]{2}\.csv$"
for f in "${mybasename}"_????????.csv; do
   [[ $f =~ $regex ]] && [[ -z $oldest || $f -ot $oldest ]] && oldest=$f
done
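Putting the glob pre-filter, the regex check and the rename together, here is a self-contained sketch that runs in a scratch directory; the file names and the target variable are illustrative, and the rename only happens when filename.csv is not already waiting to be consumed:

```bash
#!/usr/bin/env bash
cd "$(mktemp -d)" || exit 1                           # scratch directory for the demo
mybasename=filename
target=$mybasename.csv
touch -t 202101010000 "${mybasename}_2108abcd.csv"    # 8 chars, so the glob matches, but not a date
touch -t 202108090000 "${mybasename}_20210809.csv"
touch -t 202108100000 "${mybasename}_20210810.csv"

unset -v oldest
regex='^'"$mybasename"'_[0-9]{8}\.csv$'
for f in "${mybasename}"_????????.csv; do
  [[ $f =~ $regex ]] && [[ -z $oldest || $f -ot $oldest ]] && oldest=$f
done

# Rename only if the consumer has not yet picked up the previous file
if [[ -n $oldest && ! -e $target ]]; then
  mv -- "$oldest" "$target"
fi
ls
```

The oldest file by mtime, filename_2108abcd.csv, passes the glob but is rejected by the regex, so filename_20210809.csv is the one renamed.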

Or have I missed something?

NB: $mybasename_* would probably not be what you want, because the shell expands the variable named mybasename_, rather than expanding mybasename, appending _ and then applying the * glob. To avoid that, use braces to delimit the variable name, as in ${mybasename}_*, which may have actually been your stumbling block?
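A quick way to see the difference; the values are quoted here so that the * is not glob-expanded:

```bash
#!/usr/bin/env bash
mybasename=filename
unset -v mybasename_              # make sure no variable with the longer name exists
without_braces="$mybasename_*"    # expands the unset variable "mybasename_", leaving "*"
with_braces="${mybasename}_*"     # expands "mybasename", then appends "_*"
printf '[%s] [%s]\n' "$without_braces" "$with_braces"
```

Under set -u, the first assignment would instead abort with an "unbound variable" error, which makes this mistake easier to spot.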

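NB: using $f unquoted is a little dangerous too, as you may at some point have a filename with a space in it, so you may want to use "$f". (Inside [[ ]] no word splitting happens, so it is safe there, but elsewhere, e.g. in an mv command, it matters.) The same is NOT true of the $regex expansion: quoting it on the right-hand side of =~ makes bash match it as a literal string, which breaks the regex, so use it carefully.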
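Both quoting rules can be checked directly. This sketch assumes bash 3.2 or later without the compat31 option, where a quoted right-hand side of =~ is matched literally:

```bash
#!/usr/bin/env bash
regex='^filename_[0-9]{8}\.csv$'
f=filename_20210810.csv
unquoted=no quoted=no
[[ $f =~ $regex ]] && unquoted=yes     # used as an ERE: matches
[[ $f =~ "$regex" ]] && quoted=yes     # compared as a literal string: no match

name='filename 20210810.csv'           # a name containing a space
set -- $name;   unquoted_words=$#      # unquoted: word splitting gives 2 words
set -- "$name"; quoted_words=$#        # quoted: stays 1 word
printf 'regex unquoted=%s quoted=%s; words %s vs %s\n' \
  "$unquoted" "$quoted" "$unquoted_words" "$quoted_words"
```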

Good luck!

  • Doh! I don't know how this got bumped to the top of the unanswered list, but I just spotted the original posting date. Wouldn't have wasted time on it if I'd noticed - sorry to bump it again. Commented Nov 1 at 20:13
