1

I have some code that writes out files with names like this:

body00123.txt
body00124.txt
body00125.txt

body-1-2126.txt
body-1-2127.txt
body-1-2128.txt

body-3-3129.txt
body-3-3130.txt
body-3-3131.txt

That is, the first two numbers in the filename can be negative, but the last three cannot.

I have a list such as this:

123
127
129

I want to remove all the files that don't end with one of these numbers. For example, the files left over should be:

body00123.txt

body-1-2127.txt

body-3-3129.txt

My code is written in Python, so I tried:

for i not in myList:
     os.system('rm body*' + str(i) + '.txt')

And this resulted in every file being deleted.

  • Should a file called body1123.txt or body-2-123 or bodyandsomethingelse00123.txt exist? And if yes, should it be deleted? Commented Nov 25, 2022 at 20:21
  • Are there any other files in the directory that should be kept? Or does it only consist of body... files? Commented Nov 25, 2022 at 20:26
  • Your line for i not in myList: returns a syntax error in both python2 and python3. Is this the actual code you're running? Commented Nov 29, 2022 at 19:17

4 Answers

2

Sometimes it's easier to move the "good" files out of the way, delete the bad files, and then move the good files back.

If that approach is suitable, this might work:

#!/bin/sh

# Temporary directory to hold the files we want to keep
mkdir .keep || exit

for a in $(cat keeplist)
do
  # These are the files we want to keep
  mv body*$a.txt .keep

  # Except this might match negative versions, so remove them
  rm -f .keep/*-$a.txt
done

# Remove the files we don't want
rm body*

# Move the good files back
mv .keep/* .

# Tidy up
rmdir .keep

So, for example, if we start with:

% ls
body-1-2126.txt  body-2-3-123.txt  body-3-3131.txt  body00125.txt  s
body-1-2127.txt  body-3-3129.txt   body00123.txt    fix
body-1-2128.txt  body-3-3130.txt   body00124.txt    keeplist

Then, after running that script, we end up with:

% ls
body-1-2127.txt  body-3-3129.txt  body00123.txt  fix  keeplist  s
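For comparison, here is a minimal Python sketch of the same move-aside idea. The helper name keep_only is hypothetical, and it assumes, as the script does, that only files whose names start with body are candidates for deletion, and that a number preceded by - (as in body-2-3-123.txt) does not protect a file.

```python
import os
import shutil
import tempfile


def keep_only(directory, keep_numbers):
    """Hypothetical helper mirroring the move-aside script above.

    A "body*" file survives if it ends in "<n>.txt" but not "-<n>.txt"
    for some n in keep_numbers; every other "body*" file is deleted.
    """
    wanted = tuple(f"{n}.txt" for n in keep_numbers)
    negative = tuple(f"-{n}.txt" for n in keep_numbers)

    # Temporary holding directory, like the script's .keep
    keep_dir = tempfile.mkdtemp(prefix=".keep", dir=directory)

    # 1. Move the good files out of the way
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if (name.startswith("body") and os.path.isfile(path)
                and name.endswith(wanted) and not name.endswith(negative)):
            shutil.move(path, os.path.join(keep_dir, name))

    # 2. Remove the body files that are left (the unwanted ones)
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if name.startswith("body") and os.path.isfile(path):
            os.remove(path)

    # 3. Move the good files back and tidy up
    for name in os.listdir(keep_dir):
        shutil.move(os.path.join(keep_dir, name), os.path.join(directory, name))
    os.rmdir(keep_dir)
```

In Python the detour through a holding directory isn't strictly necessary, since the same test could decide directly which files to remove; it is kept here only to mirror the shell script.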
1

In zsh:

$ set -o extendedglob
$ list=(123 127 129)
$ echo rm body(^*(${(~j[|])list})).txt
rm body00124.txt body00125.txt body-1-2126.txt body-1-2128.txt body-3-3130.txt body-3-3131.txt

(Remove the echo to actually do it.)

The j[|] parameter expansion flag joins the elements of $list with |. With the ~ flag, the resulting | characters are interpreted as a glob operator (the alternation operator, as opposed to a literal |).

So the glob ends up as body(^*(123|127|129)).txt, where ^ is the extendedglob negation operator. It matches filenames that start with body, followed by any string that does not end in 123, 127 or 129, followed by .txt.

If you also need the condition that, for a file to be preserved, the part before those numbers must not end with -, replace * with (^*-). A file called body-1-1-123.txt, for instance, would then be removed as well.
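If you want to check the matching logic outside zsh, the glob can be approximated with a Python regular expression. This is a sketch, not zsh's own matcher: to_delete is a hypothetical helper, the "|".join plays the role of ${(~j[|])list}, and the reject_dash option mimics the (^*-) variant with a negative lookbehind.

```python
import re


def to_delete(names, keep_numbers, reject_dash=False):
    """Return the names that body(^*(123|127|129)).txt would match.

    Sketch using Python's re instead of zsh globbing. With reject_dash=True
    it mimics the (^*-) variant: a number preceded by "-" does not protect
    the file.
    """
    # Like ${(~j[|])list}: join the numbers into one alternation
    alternation = "|".join(re.escape(str(n)) for n in keep_numbers)
    guard = "(?<!-)" if reject_dash else ""
    # Names that are protected from deletion
    keep = re.compile(rf"body.*{guard}(?:{alternation})\.txt")
    return [name for name in names
            if name.startswith("body") and name.endswith(".txt")
            and not keep.fullmatch(name)]
```

For example, to_delete(["body-1-1-123.txt"], [123]) keeps the file, while passing reject_dash=True reports it for deletion.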

For even stricter matching, you could do:

n='((-|)[0-9])' # digit with an optional - sign
echo rm body$~n(#c2)($~n(#c3)~(${(~j[|])list})).txt

where (#c2) is the repetition operator and ~ is the except (and-not) operator. $~n is like $n except that the contents of $n are interpreted as a pattern rather than a literal string (like the ~ parameter expansion flag above).

So we're matching on body, followed by two digits each optionally preceded by a -, followed by three more such digits (except when those three form one of the members of $list), followed by .txt.
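The stricter pattern can be sketched the same way in Python, assuming the re module is an acceptable stand-in for zsh globbing: -?[0-9] corresponds to $n, {2} and {3} to (#c2) and (#c3), and a negative lookahead takes the place of the ~ except operator. strict_to_delete is a hypothetical name.

```python
import re

# One digit with an optional leading minus, like n='((-|)[0-9])' in zsh
SIGNED_DIGIT = r"-?[0-9]"


def strict_to_delete(names, keep_numbers):
    """Sketch of the stricter zsh pattern using Python's re module.

    Matches body + two signed digits + three signed digits + .txt,
    except when those last three are exactly one of keep_numbers.
    """
    alternation = "|".join(re.escape(str(n)) for n in keep_numbers)
    pattern = re.compile(
        rf"body(?:{SIGNED_DIGIT}){{2}}"     # like $~n(#c2)
        rf"(?!(?:{alternation})\.txt$)"     # like ~(${(~j[|])list})
        rf"(?:{SIGNED_DIGIT}){{3}}\.txt"    # like $~n(#c3) followed by .txt
    )
    return [name for name in names if pattern.fullmatch(name)]
```

Names that don't fit the body-plus-five-signed-digits shape at all (such as keeplist) are never matched, which is the point of the stricter form.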

  • I can't quite work out if you've caught -123 as being different from the 123 pattern? Commented Nov 25, 2022 at 20:14
  • @roaima, sorry I don't understand what you mean. Commented Nov 25, 2022 at 20:17
  • I think the OP has five digits each of which can be positive or negative. They want to keep (for example) x.y.1.2.3 represented as xy123. It's not the same as x.y.-1.2.3 represented as the string xy-123 even though it ends with the same three characters 123. Is that any better? Commented Nov 25, 2022 at 20:22
  • @roaima, ah, I'm with you now. They say the last 3 numbers cannot be negative though. Commented Nov 25, 2022 at 20:24
  • Ah yes ok. Missed that Commented Nov 25, 2022 at 20:27
1

find has a name-matching test, -name, which can be negated to act on files that do not match a name, or that match none of a list of names.

Since find by default ANDs together multiple tests given on one command line, we can write a bash script as follows:

#!/usr/bin/env bash

list=( 123 127 129 )

findcmd="find . -type f $(printf -- ' -not -name \*%s.txt' "${list[@]}")"

bash -v <<< "$findcmd"

Note: the bash -v line could also be written as follows (quoting "$findcmd" keeps it from being word-split and globbed before eval sees it):

printf '%s\n' "$findcmd"
eval "$findcmd"

The output from that script is:

find . -type f  -not -name \*123.txt -not -name \*127.txt -not -name \*129.txt
./body-3-3130.txt
./body00125.txt
./body-1-2126.txt
./body00124.txt
./body-1-2128.txt
./body-3-3131.txt

Here we see two pieces of information: the find command built from the array of numbers to keep, and the resulting list of files that match none of those numbers.

Inspect the list of filenames closely. Once you have confirmed that you want to delete all of those files, copy and paste the find command and append the action -exec rm -v {} \; as follows (shown with a backslash-escaped line break for readability):

$ find . -type f  -not -name \*123.txt -not -name \*127.txt -not -name \*129.txt \
    -exec rm -v {} \;
./body-3-3130.txt
./body00125.txt
./body-1-2126.txt
./body00124.txt
./body-1-2128.txt
./body-3-3131.txt
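If the command is being driven from a program anyway (the question mentions Python), one way to sidestep string building and eval entirely is to pass find its arguments as a list and run it without a shell. This is a minimal sketch, assuming a find that supports the POSIX ! operator (used here in place of -not); find_unmatched is a hypothetical helper that only lists files and deletes nothing.

```python
import subprocess


def find_unmatched(directory, keep_numbers):
    """List regular files under directory whose names match none of *<n>.txt.

    The arguments go straight to find (no shell), so the * patterns need
    no escaping and nothing is eval'ed.
    """
    args = ["find", ".", "-type", "f"]
    for n in keep_numbers:
        # find ANDs consecutive tests, so a file must fail every -name to be listed
        args += ["!", "-name", f"*{n}.txt"]
    result = subprocess.run(args, cwd=directory, capture_output=True,
                            text=True, check=True)
    return sorted(result.stdout.splitlines())
```

Every regular file in the tree is a candidate, so unrelated files such as keeplist show up too, which is one more reason to inspect the list before deleting anything.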
  • yes, exactly. (I had to take a double-take there.) Commented Nov 29, 2022 at 19:34
  • (Though with eval there's the mandatory comment about taking care not to use it with user-supplied or untrusted inputs etc. Should work in this case, though.) Commented Nov 29, 2022 at 19:35
  • @ilkkachu Thanks! Updated :) Commented Nov 29, 2022 at 19:37
  • I think you could also generate a find expression like ! \( -name this -o -name that \) to match the ones to be deleted, and then -exec rm {} +, or -delete Commented Nov 29, 2022 at 20:02
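The single negated group suggested in the last comment can be sketched the same way. This assumes a find with -delete (a GNU/BSD extension, not POSIX) and a non-empty list of numbers; find_delete is hypothetical, and it only prints the matches unless dry_run is False.

```python
import subprocess


def find_delete(directory, keep_numbers, dry_run=True):
    """Sketch of: find . -type f ! ( -name '*123.txt' -o -name '*127.txt' ... ) -print

    With dry_run=False, -delete replaces -print and the files are removed.
    Assumes keep_numbers is non-empty (an empty ( ) group is a find error).
    """
    names = []
    for n in keep_numbers:
        if names:
            names.append("-o")  # OR the -name tests together inside the group
        names += ["-name", f"*{n}.txt"]
    # As separate list items, the parentheses need no backslash escaping
    args = ["find", ".", "-type", "f", "!", "(", *names, ")"]
    args.append("-print" if dry_run else "-delete")
    result = subprocess.run(args, cwd=directory, capture_output=True,
                            text=True, check=True)
    return sorted(result.stdout.splitlines())
```

Running it once with the default dry_run=True and checking the returned list before calling it again with dry_run=False follows the same inspect-then-delete workflow as the answer.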
0

Python, the straightforward way:

import os
import glob

num_lst = [123, 127, 129]
num_as_str_set = set(map(str, num_lst))

# If the directory holds no files other than these .txt files,
# os.listdir() would be enough:
# for filename in os.listdir():
for filename in glob.glob("*.txt"):
    # Negative positions of the last seven characters:
    #   7654321
    #   123.txt
    # so filename[-7:-4] -> "123"
    if filename[-7:-4] not in num_as_str_set:
        print("remove", filename)
        # Uncomment to actually remove the file:
        # os.remove(filename)

The same logic in Bash:

declare -A hash_map
hash_map=( [123]= [127]= [129]= )

for fn in *.txt; do
    key="${fn: -7:-4}"
    if ! [[ -v hash_map["$key"] ]]; then
        echo "$fn"
        # Uncomment to actually remove the file:
        # rm -v "$fn"
    fi
done

Python, a trickier and possibly less efficient way (it would need benchmarking):

import os
from glob import glob
from itertools import chain

num_lst = [123, 127, 129]
s = set(glob("*.txt")) - set(chain(*(glob(f"*{num}.txt") for num in num_lst)))
# Uncomment to remove the files:
# for filename in s:
#     os.remove(filename)
