How can I find all of the distinct file extensions in a folder hierarchy?

Question

On a Linux machine I would like to traverse a folder hierarchy and get a list of all of the distinct file extensions within it.

What would be the best way to achieve this from a shell?

Benjamin Loison · Accepted Answer · 2023-05-22 13:51:24Z

507

Try this (not sure if it's the best way, but it works):

find . -type f | perl -ne 'print $1 if m/\.([^.\/]+)$/' | sort -u

It works as follows:

Find all files under the current folder
Print the extension of each file, if it has one
Make a unique sorted list
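Here is a rough Python 3 equivalent of the Perl step, so the regex can be seen on its own. It assumes the file paths have already been collected into a list (the function name `distinct_extensions` is hypothetical). It uses the same pattern as the one-liner: a dot followed by characters that are neither `.` nor `/`, anchored at the end of the path.

```python
import re

# Same pattern as the Perl one-liner: a dot, then one or more characters
# that are neither "." nor "/", anchored at the end of the path.
EXT_RE = re.compile(r"\.([^./]+)$")


def distinct_extensions(paths):
    """Return the sorted, de-duplicated extensions found in `paths`."""
    exts = set()
    for path in paths:
        match = EXT_RE.search(path)
        if match:  # like `print $1 if ...`: paths without an extension are skipped
            exts.add(match.group(1))
    return sorted(exts)  # plays the role of `sort -u`
```

To walk the current folder the way `find . -type f` does, feed it something like `(os.path.join(d, f) for d, _, fs in os.walk(".") for f in fs)`. Note that `./Makefile` and `./dir.d/README` produce nothing, because the last dot in each path is followed by a `/`.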

edited May 22, 2023 at 13:51

Benjamin Loison

answered Dec 3, 2009 at 19:21

Ivan Nevostruev

11 Comments

Dennis Golomazov Over a year ago

Just for reference: if you want to exclude some directories from the search (e.g. .svn), use find . -path '*/.svn*' -prune -o -type f -print | perl -ne 'print $1 if m/\.([^.\/]+)$/' | sort -u (source)
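The same pruning idea carries over to Python's `os.walk`. The sketch below assumes that skipping directories by exact name is enough (`walk_files` and `skip_dirs` are hypothetical names). Unlike `-path '*/.svn*'`, which also matches names such as `.svnignore`, it only skips directories named exactly `.svn`.

```python
import os
import tempfile


def walk_files(root, skip_dirs=frozenset({".svn"})):
    """Yield file paths under `root` without descending into `skip_dirs`."""
    # os.walk is top-down by default, which is what makes pruning possible.
    for dirpath, dirnames, filenames in os.walk(root):
        # Reassigning dirnames[:] in place tells os.walk not to visit the
        # removed subdirectories, which is the job -prune does for find.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in filenames:
            yield os.path.join(dirpath, name)


if __name__ == "__main__":
    # Build a throwaway tree with one normal file and one file inside .svn
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, ".svn"))
        for rel in ("main.py", os.path.join(".svn", "entries")):
            open(os.path.join(root, rel), "w").close()
        print([os.path.relpath(p, root) for p in walk_files(root)])  # ['main.py']
```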

Ivan Nevostruev Over a year ago

Spaces will not make any difference. Each file name will be on a separate line, so the delimiter is "\n", not a space.

Ryan Shillington Over a year ago

On Windows, this works better and is much faster than find: dir /s /b | perl -ne 'print $1 if m/\.([^^.\\\\]+)$/' | sort -u

jakub.g Over a year ago

git variation of the answer: use git ls-tree -r HEAD --name-only instead of find

marcovtwout Over a year ago

A variation, this shows the list with counts per extension: find . -type f | perl -ne 'print $1 if m/\.([^.\/]+)$/' | sort | uniq -c | sort -n

Benjamin Loison · Accepted Answer · 2023-05-22 13:52:00Z

98

No need for the pipe to sort, awk can do it all:

find . -type f | awk -F. '!a[$NF]++{print $NF}'
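To show what `!a[$NF]++` does, here is a rough Python 3 emulation. It assumes the paths arrive as a list of strings (`awk_last_field_unique` is a hypothetical name). The output follows first-seen order rather than sorted order. The sketch also reproduces what happens with dotted directory names: a file name without a dot yields the tail of the path instead of an extension.

```python
def awk_last_field_unique(paths):
    """Emulate awk -F. '!a[$NF]++{print $NF}' over a list of paths."""
    seen = {}  # stands in for awk's associative array `a`
    printed = []
    for path in paths:
        last = path.split(".")[-1]  # $NF when the field separator is "."
        if not seen.get(last, 0):  # `!a[$NF]` is true only on first sight
            printed.append(last)
        seen[last] = seen.get(last, 0) + 1  # the post-increment `a[$NF]++`
    return printed
```

For example, `./Makefile` comes out as `/Makefile` and `./test.dir/myfile` as `dir/myfile`.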

edited May 22, 2023 at 13:52

Benjamin Loison

answered Aug 24, 2011 at 5:21

SiegeX

5 Comments

user2602152 Over a year ago

I am not getting this to work as an alias. I am getting awk: syntax error at source line 1 context is >>> !a[] <<< awk: bailing out at source line 1. What am I doing wrong? My alias is defined like this: alias file_ext="find . -type f -name '*.*' | awk -F. '!a[$NF]++{print $NF}'"

SiegeX Over a year ago

@user2602152 the problem is that you are trying to surround the entire one-liner with quotes for the alias command but the command itself already uses quotes in the find command. To fix this I would use bash's literal string syntax as so: alias file_ext=$'find . -type f -name "*.*" | awk -F. \'!a[$NF]++{print $NF}\''

Nelson Teixeira Over a year ago

This doesn't work if a subdirectory has a . in its name and the file has no extension. Example: when run from maindir, it fails for maindir/test.dir/myfile.

SiegeX Over a year ago

@NelsonTeixeira Add -printf "%f\n" to the end of the 'find' command and re-run your test.

Big Joe Over a year ago

I found what I was looking for. Your command helped me list the file types, but I wanted a count next to each type. After some googling I found this (thanks for the help): find . -type f | sed -n 's/..*\.//p' | sort | uniq -c

Ondra Žižka · Accepted Answer · 2023-08-30 13:09:22Z

89

My awk-less, sed-less, Perl-less, Python-less alternative (strictly speaking, rev and the long option uniq --count are not part of POSIX):

find . -name '*.?*' -type f | rev | cut -d. -f1 | rev  | tr '[:upper:]' '[:lower:]' | sort | uniq --count | sort -rn

The trick is that it reverses the line and cuts the extension at the beginning.
It also converts the extensions to lower case.
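The reverse-and-cut trick can be sketched in Python 3 as shown below. The sketch assumes the paths are already in a list (`count_extensions_rev_cut` is a hypothetical name). It skips find's `-name '*.?*'` filter, so it also shows what happens to an extensionless path such as `./LICENSE`.

```python
from collections import Counter


def count_extensions_rev_cut(paths):
    """Emulate: rev | cut -d. -f1 | rev | tr upper lower | sort | uniq -c | sort -rn."""
    counts = Counter()
    for path in paths:
        first_field = path[::-1].split(".")[0]  # rev, then cut -d. -f1
        ext = first_field[::-1].lower()  # rev back, then tr '[:upper:]' '[:lower:]'
        counts[ext] += 1  # sort | uniq -c
    return counts.most_common()  # sort -rn: highest count first
```

Without the `-name '*.?*'` filter, `./LICENSE` is counted as `/license`, because the only dot in that path is the leading one from `./`. Ties are kept in first-seen order here, which may differ from `sort -rn`.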

Example output:

   3689 jpg
   1036 png
    610 mp4
     90 webm
     90 mkv
     57 mov
     12 avi
     10 txt
      3 zip
      2 ogv
      1 xcf
      1 trashinfo
      1 sh
      1 m4v
      1 jpeg
      1 ini
      1 gqv
      1 gcs
      1 dv

edited Aug 30, 2023 at 13:09

answered Mar 23, 2019 at 18:37

Ondra Žižka

8 Comments

worc Over a year ago

On macOS, uniq doesn't have the long flag --count, but -c works just fine.

Chris Hayes Over a year ago

Very cool, would be nice if this didn't include files that don't have extensions though. Running this at the base of a repo produces a crap load of git files that are extensionless.

Ondra Žižka Over a year ago

@ChrisHayes, easy fix: find . -type f -name '*.?*' ... (not fully tested, but it should work).

emersion Over a year ago

To only take into account files that are checked in Git, replace the find command with: git ls-files '*.?*'

Mike 'Pomax' Kamermans Jan 7 at 20:26

Note that this will also turn files like ./LICENSE or ./CHANGELOG into /license and /changelog. The only dot in those paths is the leading one from ./, so all it does is remove that dot, which isn't great.

David Mohundro · Accepted Answer · 2016-11-01 02:08:18Z

63

Recursive version:

find . -type f | sed -e 's/.*\.//' | sed -e 's/.*\///' | sort -u

If you want totals (how many times each extension was seen):

find . -type f | sed -e 's/.*\.//' | sed -e 's/.*\///' | sort | uniq -c | sort -rn
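Spelled out in Python 3, the two `sed` substitutions look roughly like the sketch below. It assumes the paths are in a list (`sed_style_extensions` and `tally` are hypothetical names). The second substitution is what turns an extensionless path like `./Makefile` into its bare file name.

```python
import re
from collections import Counter


def sed_style_extensions(paths):
    """Apply the two sed substitutions from the answer to each path."""
    results = []
    for path in paths:
        # sed -e 's/.*\.//': greedy match up to the last dot, removed once
        step1 = re.sub(r".*\.", "", path, count=1)
        # sed -e 's/.*\///': then drop everything up to the last slash
        step2 = re.sub(r".*/", "", step1, count=1)
        results.append(step2)
    return results


def tally(paths):
    """Rough equivalent of ... | sort | uniq -c | sort -rn."""
    return Counter(sed_style_extensions(paths)).most_common()
```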

Non-recursive (single folder):

for f in *.*; do printf "%s\n" "${f##*.}"; done | sort -u
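Here is a rough Python 3 counterpart of the non-recursive loop, assuming `glob` semantics are close enough to the shell's (`extensions_in_dir` is a hypothetical name). One difference: when nothing matches, an unmodified bash loop receives the literal pattern `*.*` and prints `*`, whereas `glob.glob` simply returns an empty list.

```python
import glob
import os
import tempfile


def extensions_in_dir(directory):
    """Non-recursive: distinct text after the last dot for names matching *.*"""
    exts = set()
    # Like the shell, glob's *.* skips names that start with a dot.
    for path in glob.glob(os.path.join(glob.escape(directory), "*.*")):
        name = os.path.basename(path)
        exts.add(name.rpartition(".")[2])  # like "${f##*.}"
    return sorted(exts)  # like sort -u


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for name in ("a.txt", "b.tar.gz", "README", ".hidden.cfg"):
            open(os.path.join(d, name), "w").close()
        print(extensions_in_dir(d))  # ['gz', 'txt']
```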

I've based this upon this forum post, credit should go there.

edited Nov 1, 2016 at 2:08

David Mohundro

answered Dec 3, 2009 at 19:38

ChristopheD

1 Comment

vulcan raven Over a year ago

Great! also works for my git scenario, was trying to figure out which type of files I have touched in the last commit: git show --name-only --pretty="" | sed -e 's/.*\.//' | sed -e 's/.*\///' | sort -u

David Mohundro · Accepted Answer · 2016-11-01 02:06:57Z

42

PowerShell:

dir -recurse | select-object extension -unique

Thanks to http://kevin-berridge.blogspot.com/2007/11/windows-powershell.html

edited Nov 1, 2016 at 2:06

David Mohundro

answered Apr 23, 2010 at 14:18

Simon R

5 Comments

Forbesmyester Over a year ago

The OP said "On a Linux machine"

KIC Over a year ago

Actually, there is PowerShell for Linux out now: github.com/Microsoft/PowerShell-DSC-for-Linux

mcw Over a year ago

As written, this will also pick up directories that have a . in them (e.g. jquery-1.3.4 will show up as .4 in the output). Change to dir -file -recurse | select-object extension -unique to get only file extensions.

Roel Over a year ago

@Forbesmyester: People with Windows (like me) will find this question too, so this is useful.

Mahesh Over a year ago

Thanks for the PowerShell answer. You shouldn't assume how users search; lots of people upvoted for a reason.

gkb0986 · Accepted Answer · 2021-03-18 21:10:27Z

20

Adding my own variation to the mix. I think it's the simplest of the lot and can be useful when efficiency is not a big concern.

find . -type f | grep -oE '\.(\w+)$' | sort -u
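Here is a Python 3 sketch of what the final regex keeps and drops, assuming the paths arrive as a list (`grep_word_extensions` is a hypothetical name). The match, and therefore the output, includes the leading dot, and an extension containing a character such as `-` is skipped entirely.

```python
import re

# grep -oE '\.(\w+)$' prints the whole match, so the leading dot is kept.
# \w means letters, digits and underscore; Python's str patterns also accept
# non-ASCII letters, and grep's behaviour there may depend on the locale.
WORD_EXT_RE = re.compile(r"\.(\w+)$")


def grep_word_extensions(paths):
    """Distinct '.ext' strings for paths whose last dot is followed only by word characters."""
    found = set()
    for path in paths:
        match = WORD_EXT_RE.search(path)
        if match:
            found.add(match.group(0))  # group(0) includes the dot, like grep -o
    return sorted(found)  # sort -u
```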

edited Mar 18, 2021 at 21:10

answered Jul 15, 2013 at 5:59

gkb0986

5 Comments

mMontu Over a year ago

+1 for portability, although the regex is quite limited, as it only matches extensions consisting of a single letter. Using the regex from the accepted answer seems better: $ find . -type f | grep -o -E '\.[^.\/]+$' | sort -u

gkb0986 Over a year ago

Agreed. I slacked off a bit there. Editing my answer to fix the mistake you spotted.

msangel Over a year ago

Cool. I changed the quotes to double quotes and updated the grep binaries and dependencies (because the ones provided with Git are outdated), and now this works under Windows. I feel like a Linux user.

Fernando Crespo Over a year ago

I like this approach. I would just change the regex a bit: $ find . -type f | grep -Eo '\.(\w+)$' | sort -u. The original one shows files without an extension, which in my case was not what I needed.

wuseman Over a year ago

Nr1, thanks a lot for this minimal and elegant example.

user224243 · Accepted Answer · 2009-12-03 21:47:59Z

13

Find everything with a dot and show only the suffix.

find . -type f -name "*.*" | awk -F. '{print $NF}' | sort -u

If you know all suffixes have 3 characters, then:

find . -type f -name "*.???" | awk -F. '{print $NF}' | sort -u

Or, with sed, show all suffixes with one to four characters. Change {1,4} to the range of characters you expect in the suffix.

find . -type f | sed -n 's/.*\.\(.\{1,4\}\)$/\1/p' | sort -u
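Here is a Python 3 version of the `sed` variant with the length bounds as parameters. It assumes the paths are in a list (`short_suffixes` is a hypothetical name). Because `.` inside the group also matches `/`, a dotted directory name followed by a short file name can slip through. For example, `./dir.d/cd` yields `d/cd`.

```python
import re


def short_suffixes(paths, min_len=1, max_len=4):
    """Distinct suffixes of min_len..max_len characters after the last dot."""
    # .*\.(.{m,n})$ : anything, a dot, then m..n characters up to the end.
    # sed -n with /p only prints lines where the substitution happened.
    pattern = re.compile(r".*\.(.{%d,%d})$" % (min_len, max_len))
    found = set()
    for path in paths:
        match = pattern.match(path)
        if match:
            found.add(match.group(1))
    return sorted(found)  # sort -u
```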

answered Dec 3, 2009 at 21:47

user224243

5 Comments

SiegeX Over a year ago

No need for the pipe to 'sort', awk can do it all: find . -type f -name "." | awk -F. '!a[$NF]++{print $NF}'

Ralf Over a year ago

@SiegeX Yours should be a separate answer. I found that command to work best for large folders, as it prints the extensions as it finds them. But note that it should be: -name "."

SiegeX Over a year ago

@Ralf done, posted answer here. Not quite sure about what you mean by the -name "." thing because that's what it already is

Ralf Over a year ago

I meant it should be -name "*.*", but StackOverflow removes the * characters, which probably happened in your comment as well.

jrz Over a year ago

It seems like this should be the accepted answer, awk is preferable to perl as a command-line tool and it embraces the unix philosophy of piping small interoperable programs into cohesive and readable procedures.

HoldOffHunger · Accepted Answer · 2020-10-23 16:52:21Z

10

I tried a bunch of the answers here, even the "best" answer. They all came up short of what I specifically was after. So after the past 12 hours of sitting in regex code for multiple programs and reading and testing these answers, this is what I came up with, and it works EXACTLY like I want.

 find . -type f -name "*.*" | grep -o -E "\.[^\.]+$" | grep -o -E "[[:alpha:]]{2,16}" | awk '{print tolower($0)}' | sort -u

Finds all files which may have an extension.
Greps only the extension
Greps for file extensions between 2 and 16 characters (just adjust the numbers if they don't fit your need). This helps avoid cache files and system files (system file bit is to search jail).
Awk to print the extensions in lower case.
Sort and keep only unique values. Originally I had tried the awk answer, but it would print items twice when they differed only in case.
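Reproducing the pipeline's filtering in Python 3 shows what it keeps. This sketch assumes the paths arrive as a list and uses `[A-Za-z]` as an ASCII stand-in for `[[:alpha:]]` (`alpha_extensions` is a hypothetical name). One side effect is worth knowing: digits are not letters, so `.mp4` comes out as `mp`, and one-letter extensions such as `.c` are dropped.

```python
import re


def alpha_extensions(paths, min_len=2, max_len=16):
    """Emulate the grep | grep | awk tolower | sort -u pipeline from the answer."""
    # [A-Za-z] is an ASCII stand-in for [[:alpha:]]; the real class is locale-dependent.
    alpha_run = re.compile(r"[A-Za-z]{%d,%d}" % (min_len, max_len))
    found = set()
    for path in paths:
        tail = re.search(r"\.[^.]+$", path)  # grep -o -E "\.[^\.]+$"
        if not tail:
            continue
        # grep -o prints every non-overlapping run of letters within the bounds
        for run in alpha_run.findall(tail.group(0)):
            found.add(run.lower())  # awk '{print tolower($0)}'
    return sorted(found)  # sort -u
```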

If you need a count per file extension, use the code below:

find . -type f -name "*.*" | grep -o -E "\.[^\.]+$" | grep -o -E "[[:alpha:]]{2,16}" | awk '{print tolower($0)}' | sort | uniq -c | sort -rn

While these methods will take some time to complete and probably aren't the best ways to go about the problem, they work.

Update: As @alpha_989 pointed out, long file extensions will cause an issue. That's due to the original regex "[[:alpha:]]{3,6}". I have updated the answer to use the regex "[[:alpha:]]{2,16}". However, anyone using this code should be aware that those numbers are the minimum and maximum extension length allowed in the final output. Longer extensions will be truncated or split across multiple lines of output, and shorter ones will be dropped.

Note: Original post did read "- Greps for file extensions between 3 and 6 characters (just adjust the numbers if they don't fit your need). This helps avoid cache files and system files (system file bit is to search jail)."

Idea: Could be used to find file extensions over a specific length via:

 find . -type f -name "*.*" | grep -o -E "\.[^\.]+$" | grep -o -E "[[:alpha:]]{4,}" | awk '{print tolower($0)}' | sort -u

Here 4 is the minimum extension length to include; any longer extensions are included as well.

edited Oct 23, 2020 at 16:52

HoldOffHunger

answered May 26, 2014 at 18:45

Shinrai

5 Comments

Fernando Montoya Over a year ago

Is the count version recursive?

alpha_989 Over a year ago

@Shinrai, in general this works well, but if you have some unusually long file extensions such as .download, it will break ".download" into 2 parts and report two extensions, "downlo" and "ad".

Shinrai Over a year ago

@alpha_989, that's due to the regex "[[:alpha:]]{3,6}", which will also cause an issue with extensions shorter than 3 characters. Adjust it to what you need. Personally I'd say 2,16 should work in most cases.

alpha_989 Over a year ago

Thanks for replying. Yeah, that's what I realized later on. It worked well after I modified it similarly to what you mentioned.

anjanesh Over a year ago

find . -type f -name "*.*" | grep -o -E "\.[^\.]+$" | grep -o -E "[[:alpha:]]{2,16}" | awk '{print tolower($0)}' | sort | uniq -c | sort -rn

This works well, but is there a way to get the total file size for each extension (e.g. php)?

Alvin · Accepted Answer · 2015-12-16 05:34:54Z

5

In Python using generators for very large directories, including blank extensions, and getting the number of times each extension shows up:

import collections
import itertools
import json
import os

root = '/home/andres'
# Lazily chain the file names from every directory that os.walk visits
files = itertools.chain.from_iterable(
    filenames for _, _, filenames in os.walk(root)
)
# Count the extensions ('' for files without one)
counter = collections.Counter(os.path.splitext(name)[1] for name in files)
print(json.dumps(counter, indent=2))

edited Dec 16, 2015 at 5:34

Alvin

answered Aug 24, 2012 at 19:17

Andres Restrepo

ChristopheD · Accepted Answer · 2009-12-04 08:27:53Z

4

Since there's already another solution which uses Perl:

If you have Python installed you could also do (from the shell):

python -c "import os;e=set();[[e.add(os.path.splitext(f)[-1]) for f in fn]for _,_,fn in os.walk('/home')];print('\n'.join(e))"

answered Dec 4, 2009 at 8:27

ChristopheD

Rajib · Accepted Answer · 2021-05-31 15:18:57Z

4

Another way:

find . -type f -name "*.*" -printf "%f\n" | while IFS= read -r; do echo "${REPLY##*.}"; done | sort -u

You can drop the -name "*.*" but this ensures we are dealing only with files that do have an extension of some sort.

The -printf here is find's own action, not the bash builtin. -printf "%f\n" prints only the filename, stripping the path (and adds a newline).

Then we use parameter expansion, ${REPLY##*.}, to remove everything up to and including the last dot.

Note that $REPLY is simply read's built-in variable. We could just as well use our own in the form while IFS= read -r file, and then $file would be the variable.
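The `${REPLY##*.}` expansion can be mimicked in Python 3 with `str.rpartition`, as sketched below. This assumes the paths are already in a list (both function names are hypothetical). As in the answer, taking the base name first keeps dotted directory names out of the result. The `-name "*.*"` filter is not reproduced, so an extensionless name comes back whole.

```python
import os


def strip_through_last_dot(name):
    """Emulate ${name##*.}: remove the longest prefix that matches '*.'."""
    # The longest prefix ending in "." ends at the last dot. If there is no
    # dot, rpartition returns ('', '', name), so the value comes back unchanged,
    # just as the expansion leaves a non-matching value alone.
    return name.rpartition(".")[2]


def distinct_basename_extensions(paths):
    """Like find -printf '%f' piped into the read loop and sort -u."""
    # %f is the base name, so dots in directory names cannot leak in.
    return sorted({strip_through_last_dot(os.path.basename(p)) for p in paths})
```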

answered May 31, 2021 at 15:18

Rajib

Robert · Accepted Answer · 2018-02-13 08:21:45Z

3

I think the simplest and most straightforward way is

for f in *.*; do echo "${f##*.}"; done | sort -u

It's a modification of ChristopheD's third approach.

answered Feb 13, 2018 at 8:21

Robert

user25148 · Accepted Answer · 2009-12-04 08:35:28Z

2

None of the replies so far deal with filenames with newlines properly (except for ChristopheD's, which just came in as I was typing this). The following is not a shell one-liner, but works, and is reasonably fast.

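import os
import sys

def names(roots):
    """Yield the base name of every file under each of the given roots."""
    for root in roots:
        for _dirpath, _dirnames, basenames in os.walk(root):
            for basename in basenames:
                yield basename

# Distinct suffixes for the directories given on the command line
sufs = set(os.path.splitext(x)[1] for x in names(sys.argv[1:]))
for suf in sufs:
    if suf:  # skip files that have no extension
        print(suf)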
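The hypothetical helpers below illustrate the newline problem without touching the file system. They compare the script's `os.path.splitext` approach with what a line-based pipeline would see for a made-up file name containing a newline. The line-based view reports a `.bak` extension that no file actually has.

```python
import os


def line_split(paths):
    """What a line-oriented pipeline sees: paths joined with newlines, then split."""
    return "\n".join(paths).split("\n")


def suffixes(names):
    """Distinct non-empty os.path.splitext suffixes, as in the script above."""
    return sorted({os.path.splitext(n)[1] for n in names} - {""})
```

For the names `["notes.md", "draft.bak\nfinal"]`, `suffixes` sees one `.md` file and one file whose suffix contains a newline. `suffixes(line_split(...))` instead sees three "files", `notes.md`, `draft.bak` and `final`.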

answered Dec 4, 2009 at 8:35

user25148

dMb · Accepted Answer · 2018-05-21 23:01:17Z

2

I don't think this one was mentioned yet:

find . -type f -exec sh -c 'echo "${0##*.}"' {} \; | sort | uniq -c

answered May 21, 2018 at 23:01

dMb

1 Comment

Ondra Žižka Over a year ago

This would probably be quite slow due to spawning a new process for each file.

Chris Medina · Accepted Answer · 2020-04-04 13:02:38Z

2

The accepted answer's regex is awkward to quote inside an alias, so I put it into a shell script instead. On Amazon Linux 2 I did the following.

Put the accepted answer's code into a file:

sudo vim find.sh

Add this code:

find ./ -type f | perl -ne 'print $1 if m/\.([^.\/]+)$/' | sort -u

Save the file by typing :wq!

Then open your profile with sudo vim ~/.bash_profile, add this line, and save with :wq!:

alias getext=". /path/to/your/find.sh"

Finally, reload the profile:

. ~/.bash_profile

answered Apr 4, 2020 at 13:02

Chris Medina

Nisharg Shah · Accepted Answer · 2023-07-16 21:20:46Z

1

If you are looking for an answer that respects .gitignore, try this:

git ls-tree -r HEAD --name-only | perl -ne 'print $1 if m/\.([^.\/]+)$/' | sort -u

answered Jul 16, 2023 at 21:20

Nisharg Shah

jansalleine · Accepted Answer · 2024-03-27 09:54:13Z

1

Another version of Ondra Žižka's one:

find . -name '*.?*' -type f | rev | cut -d. -f1 | rev | sort | uniq

On case-sensitive file systems, different cases should IMHO not be treated as the same extension. Also, I don't think counting files is necessary to answer the OP's question.

answered Mar 27, 2024 at 9:54

jansalleine

jrock2004 · Accepted Answer · 2013-03-25 17:27:04Z

0

You could also do this:

find . -type f -name "*.php" -exec PATHTOAPP {} +

edited Mar 25, 2013 at 17:27

answered Mar 25, 2013 at 16:12

jrock2004

Diego Callejo · Accepted Answer · 2020-02-20 14:28:44Z

0

I've found it simple and fast:

find . -type f -exec basename {} \; | awk -F"." '{print $NF}' > /tmp/outfile.txt
cat /tmp/outfile.txt | sort | uniq -c | sort -n > /tmp/outfile_sorted.txt

answered Feb 20, 2020 at 14:28

Diego Callejo
