51

The following regex

var patt1=/[0-9a-z]+$/i;

extracts the file extension of strings such as

filename-jpg
filename#gif
filename.png

How can I modify this regular expression so that it returns an extension only when the string really is a filename with a single dot as the separator? (Obviously, filename#gif is not a regular filename.)

UPDATE: Based on tvanofsson's comments, I would like to clarify that by the time the JS function receives the string, it will already be a filename without spaces and without the dots and other special characters (it will actually be handled as a slug). The problem was not in parsing filenames but in incorrectly parsing slugs: the function returned an extension of "jpg" when given "filename-jpg", when it should really return null or an empty string, and it is this behaviour that needed to be corrected.

3
  • 4
    Does the regex have to determine if the filename is a legal filename? What defines a legal filename? What defines a legal filename extension? For example, is foo bar.zi_ a legal filename? How about foo.bar.zi_? Commented Jul 5, 2011 at 12:06
  • The typical OS filename. Your example with a space in it cannot happen in our system, and the answer provided by @stema seems to work with double extensions, so it's good enough for me. Commented Jul 5, 2011 at 12:11
  • 2
    Both examples are legal file names in Unix and Windows. Your question could be improved by detailing exactly what you consider to be a legal filename. It will make the answers, esp. the accepted answer more meaningful to future readers who may be looking to solve the same or a similar problem. Commented Jul 5, 2011 at 12:39

6 Answers

99

Just add a . to the regex

var patt1=/\.[0-9a-z]+$/i;

Because the dot is a special character in regex, you need to escape it as \. to match it literally.

Your pattern will now match any string that ends with a dot followed by at least one character from [0-9a-z].

Example:

[
  "foobar.a",
  "foobar.txt",
  "foobar.foobar1234"
].forEach( t => 
  console.log(
    t.match(/\.[0-9a-z]+$/i)[0]
  ) 
)


If you also want to limit the extension to a certain number of characters, then you need to replace the +:

var patt1=/\.[0-9a-z]{1,5}$/i;

would allow at least 1 and at most 5 characters after the dot.
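
As a minimal sketch of how these pieces might be combined: the helper name `extensionOf` and its `maxLen` parameter below are made up for illustration and are not part of the question. The capture group leaves the dot out of the result, and the function returns null instead of throwing when nothing matches, which is the behaviour the question asks for with slugs such as filename-jpg.

```javascript
// Hypothetical helper (name and maxLen parameter are assumptions):
// returns the extension without the dot, or null when there is none.
function extensionOf(name, maxLen = 5) {
  // Same idea as /\.[0-9a-z]{1,5}$/i, with the upper bound made configurable
  // and a capture group around the part after the dot.
  const pattern = new RegExp("\\.([0-9a-z]{1," + maxLen + "})$", "i");
  const m = name.match(pattern);
  return m ? m[1] : null;
}

console.log(extensionOf("filename.png"));      // "png"
console.log(extensionOf("filename-jpg"));      // null
console.log(extensionOf("filename#gif"));      // null
console.log(extensionOf("archive.tar.gz"));    // "gz"
console.log(extensionOf("foobar.foobar1234")); // null (more than 5 characters)
```

Only the last extension is returned for names like archive.tar.gz; handling compound extensions would need a different pattern.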


5 Comments

what if I don't need the dot in my match and just the extension only?
@user2727195 Without the dot, you're not matching an extension. If you mean... how do you use only the resulting text, then you could use substring, like so: ( ("file.ext").match(patt1) || '').substring(1);
also fails for .tar.gz extension
The code in the comment above didn't work for me. If you want just the extension, not the dot, do... const ext = ("file.ext".match(/\.[0-9a-z]{1,5}$/i) || [""])[0].substring(1)
In the bad case (no match), this solution will result in null.
52

Try

var patt1 = /\.([0-9a-z]+)(?:[\?#]|$)/i;

This RegExp is useful for extracting file extensions from URLs - even ones that have ?foo=1 query strings and #hash endings.

It will also provide you with the extension as $1.

var m1 = ("filename-jpg").match(patt1);
alert(m1);  // null

var m2 = ("filename#gif").match(patt1);
alert(m2);  // null

var m3 = ("filename.png").match(patt1);
alert(m3);  // [".png", "png"]

var m4 = ("filename.txt?foo=1").match(patt1);
alert(m4);  // [".txt?", "txt"]

var m5 = ("filename.html#hash").match(patt1);
alert(m5);  // [".html#", "html"]
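
If only the extension itself is wanted, capture group 1 can be read from the match result. The wrapper below is a sketch, and the name `extFromUrl` is an assumption introduced here for illustration.

```javascript
var patt1 = /\.([0-9a-z]+)(?:[\?#]|$)/i;

// Hypothetical wrapper: group 1 holds the extension without the dot
// and without the trailing "?" or "#"; null means no match.
function extFromUrl(url) {
  var m = url.match(patt1);
  return m ? m[1] : null;
}

console.log(extFromUrl("filename.txt?foo=1")); // "txt"
console.log(extFromUrl("filename.html#hash")); // "html"
console.log(extFromUrl("filename-jpg"));       // null
console.log(extFromUrl("example.com"));        // "com"
```

The last call shows a limitation: a bare host name such as example.com yields com, so the pattern is better suited to paths than to arbitrary URLs.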

P.S. +1 for @stema who offers pretty good advice on some of the RegExp syntax basics involved.


17

Example list:

var fileExtensionPattern = /\.([0-9a-z]+)(?=[?#])|(\.)(?:[\w]+)$/gmi
//regex flags -- Global, Multiline, Insensitive

var ma1 = 'css/global.css?v=1.2'.match(fileExtensionPattern)[0];
console.log(ma1);
// returns .css

var ma2 = 'index.html?a=param'.match(fileExtensionPattern)[0];
console.log(ma2);
// returns .html

var ma3 = 'default.aspx?'.match(fileExtensionPattern)[0];
console.log(ma3);
// returns .aspx

var ma4 = 'pages.jsp#firstTab'.match(fileExtensionPattern)[0];
console.log(ma4);
// returns .jsp

var ma5 = 'jquery.min.js'.match(fileExtensionPattern)[0];
console.log(ma5);
// returns .js

var ma6 = 'file.123'.match(fileExtensionPattern)[0];
console.log(ma6);
// returns .123
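
One thing to be aware of with the g flag: match() then returns every match in the string rather than capture groups, and it returns null when nothing matches, so indexing with [0] directly can throw. The sketch below illustrates both points; the helper name `firstExtension` is an assumption introduced here.

```javascript
var fileExtensionPattern = /\.([0-9a-z]+)(?=[?#])|(\.)(?:[\w]+)$/gmi;

// With the g flag, match() lists all matches; here the version number
// at the end of the query string is picked up by the second alternative.
var all = 'css/global.css?v=1.2'.match(fileExtensionPattern);
console.log(all); // [".css", ".2"]

// Hypothetical null-safe helper: first match, or null when there is none.
function firstExtension(s) {
  var m = s.match(fileExtensionPattern);
  return m ? m[0] : null;
}

console.log(firstExtension('jquery.min.js')); // ".js"
console.log(firstExtension('README'));        // null
console.log(firstExtension('page?v=1.2'));    // ".2"
```

The last call shows that a dot inside a query string can be mistaken for an extension when the path itself has none.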



7

ONELINER:

let ext = (filename.match(/\.([^.]*?)(?=\?|#|$)/) || [])[1] 

The solution above also handles links. It takes everything between the last dot and the first "?" or "#" character, or the end of the string. To ignore "?" and "#" characters, use /\.([^.]*)$/. To ignore only "#", use /\.([^.]*?)(?=\?|$)/. Examples:

function getExtension(filename) {
  return (filename.match(/\.([^.]*?)(?=\?|#|$)/) || [])[1];
}


// TEST
[
  "abcd.Ef1",
  "abcd.efg",
  "abcd.efg?aaa&a?a=b#cb",
  "abcd.efg#aaa__aa?bb",
  "abcd",
  "abcdefg?aaa&aa=bb",
  "abcdefg#aaa__bb",
].forEach(t=> console.log(`${t.padEnd(21,' ')} -> ${getExtension(t)}`))


3

I found this solution in the O'Reilly Regular Expressions Cookbook (chapter 8, section 24). It is case-insensitive and works with .NET, Java, JavaScript, PCRE, Perl, Python & Ruby.

\.[^.\\/:*?"<>|\r\n]+$

A file extension must begin with a dot. Thus, we add ‹\.› to match a literal dot at the start of the regex.

Filenames such as Version 2.0.txt may contain multiple dots. The last dot is the one that delimits the extension from the filename. The extension itself should not contain any dots. We specify this in the regex by putting a dot inside the character class. The dot is simply a literal character inside character classes, so we don’t need to escape it. The ‹$› anchor at the end of the regex makes sure we match .txt instead of .0.

If the string ends with a backslash, or with a filename that doesn’t include any dots, the regex won’t match at all. When it does match, it will match the extension, including the dot that delimits the extension and ...
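
As a rough illustration of the behaviour described above, here is the same pattern written as a JavaScript regex literal. The forward slash inside the character class is escaped for safety, and the names `cookbookPattern` and `matchExtension` are assumptions introduced for this sketch.

```javascript
// The Cookbook pattern as a JavaScript literal (the "/" is escaped as "\/").
var cookbookPattern = /\.[^.\\\/:*?"<>|\r\n]+$/;

// Hypothetical helper: returns the extension including the dot, or null.
function matchExtension(path) {
  var m = path.match(cookbookPattern);
  return m ? m[0] : null;
}

console.log(matchExtension("Version 2.0.txt"));       // ".txt"
console.log(matchExtension("README"));                // null (no dots)
console.log(matchExtension("C:\\temp\\folder.d\\"));  // null (ends with a backslash)
```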


0

I would advise using this function, since it avoids returning null:

const getExtension = (filename?: string): string | undefined => {
  if (!filename) return undefined
  const match = /\.([^.]+)$/.exec(filename)
  return match ? match[1] : undefined
}

This function takes an optional filename parameter, which can be undefined. If the filename is undefined, the function returns undefined. Otherwise, the function uses a regular expression to extract the file extension from the filename. If the regular expression matches, the function returns the extracted file extension; otherwise, it returns undefined.

const getExtension = (filename) => {
  if (!filename) return undefined
  const match = /\.([^.]+)$/.exec(filename)
  return match ? match[1] : undefined
}

[
  "a.abc.x.ico",
  "foobar.a",
  "foobar.txt",
  "foobar.foobar1234",
  "undegined",
  undefined, null
].forEach(t =>
  console.log(
    getExtension(t)
  )
)

