I stumbled upon different behavior of std::filesystem::path::parent_path() when a path contains leading slashes. Clang says that the parent of ///a is /// (which is what I would expect), but GCC says that the parent is / (that is, it strips the extra slashes). At the same time, GCC does not strip the slashes from the path /// (i.e. the parent of /// is ///).

Is this an acceptable behavior, or a GCC bug?

The code:

#include <filesystem>
#include <iostream>

namespace fs = std::filesystem;
 
int main()
{
    for (fs::path p : {"///", "///a"})
        std::cout
            << p
            << "\nparent: " << p.parent_path()
            << "\nbegin: " << *p.begin()
            << "\nroot_dir: " << p.root_directory()
            << "\n\n";
}

GCC 16 output:

"///"
parent: "///"
begin: "///"
root_dir: "/"

"///a"
parent: "/"  <- ???
begin: "/"
root_dir: "/"

Clang 22 output:

"///"
parent: "///"
begin: "/"
root_dir: "/"

"///a"
parent: "///" <- OK
begin: "/"
root_dir: "/"

I tried reading the draft of the standard, where parent_path() is documented to return "a path whose generic format pathname is the longest prefix of the generic format pathname of *this that produces one fewer element in its iteration", but the iteration confuses me even more. The thing is, the path iterators are documented to return the root name first (which does not exist in the examples above), and then the root directory. As you can see above, root_directory() is / everywhere and is equal to *begin() in all cases except for the /// path in GCC, where *begin() is ///. In this case, GCC seems to agree with the standard (as I understand it) as the first element of the iteration is always equal to the root directory, while Clang now strips the extra slashes from the first element.

  • POSIX specifications (which underpin all open specification unixes, including linux) say that multiple consecutive slashes in a path are considered the same as one slash i.e. /// is equivalent to /, so ///usr and /usr are equivalent (e.g. refer to the same directory). From that perspective, both gcc and clang's approach are "acceptable" - they still resolve as the same path. Commented Nov 16 at 23:59
  • @Peter POSIX (Pathname Resolution section) says that "If a pathname begins with two successive <slash> characters, the first component following the leading <slash> characters may be interpreted in an implementation-defined manner, although more than two leading <slash> characters shall be treated as a single <slash> character." Commented Nov 17 at 6:41

1 Answer

Interpretation of multiple leading slashes in a path is implementation-defined.

A path begins with an optional root-name element, followed by an optional root-directory separator and relative-path. The root-name element is OS-dependent and implementation-defined, and specifically multiple leading slashes are acknowledged as often used as part of root-names on some operating systems (e.g. to implement UNC paths on Windows).

So, given a "///a" path, the following interpretations are possible:

  1. "///a" is a supported root-name, in which case only root-name is present and all other path elements are missing.
  2. The implementation supports root-names starting with "//" and followed by any number of non-directory-separator characters. In this case, "//" is the root-name, "/" is the root-directory and "a" is the relative-path.
  3. Same as #2, but the root-name must have at least one non-directory-separator character after the leading "//". In this case, the root-name would be missing, the root-directory would be the leading "/", the following slashes would be ignored as duplicates, and "a" would be the relative-path.
  4. The implementation does not support any root-names with "//", so, again root-name is missing, "/" is a root-directory separator, the following slashes are ignored as duplicates, and "a" is the relative-path.

The above list is probably non-exhaustive, but it presents the most reasonable interpretations of the path.

Depending on the interpretation, parent_path() would return different results in each case:

  1. The same path "///a" since there is no relative-path.
  2. "///", where root-name = "//" and root-directory = "/".
  3. "/", where root-name is missing and root-directory = "/".
  4. Same as #3.

Note that 3 and 4 would give different results for a more typical UNC path like "//host/share".

While some implementations may implement different behaviors on different operating systems (e.g. recognize leading "//" as part of root-name on Windows but not on POSIX systems), other implementations may choose to implement the same behavior across all supported OSes to avoid interoperability issues. Both ways would be permitted by the C++ standard.
