
I have a simple problem -- I have multiple threads that are doing appends to a file, based upon a file name that the user can choose.

At first, I tried to synchronize on instances of Path, but after reading comments on Reddit, it appears that this is the wrong approach. Namely -- 2 different instances of String can produce 2 different instances of Path that are not equal according to Path.equals(), but DO point to the exact same file. I tested this myself, and it appears that they are right. For example, consider this snippet that I ran on my Windows 11 machine.

final Path a = Path.of("abc.java"); //find abc.java in the current directory
final Path b = Path.of("./abc.java"); //go to current directory, then find abc.java
System.out.println(a.equals(b)); //false, but they are the same file!

(And to be clear, the reason I care about Path.equals() is that it determines uniqueness in a Map<Path, V> or a Set<Path>, and I can then use those containers to always get the same instance back. After all, the synchronized(someObject) statement handles thread synchronization based on whether a == b, and NOT on a.equals(b). I just wanted to clarify, since my intentions weren't clear before.)

I need something to synchronize on. And it is clear that just synchronizing on an instance of Path that I put into my Map<Path, V> made via Path.of(sanitizedStringFromUser) isn't going to cut it. Sure, I could play "whack-a-mole", and try to find all of the possible ways that 2 different instances of String could resolve to the same file when served to Path.of(). But I am certain that I am reinventing the wheel here.

How should I resolve this, using idiomatic Java?

  • For *nix OSes, here's how to get the inode: stackoverflow.com/questions/53935601/… Commented Jun 13 at 0:06
  • It doesn’t matter whether equals returns true or not, synchronized doesn’t care about equality at all. All threads must use synchronized with the same object, i.e. a == b must be true, to have an effect. Commented Jun 16 at 10:40
  • @Holger Apologies, I should have clarified that I was storing these instances of Path as keys in a Map<Path, V>. So, the uniqueness traits of keys for instances of Map are taking care of that for me. But yes, you are 100% correct, and in fact, I am pretty sure I was thinking that it operated on Path::equals. I will edit my post shortly to clarify this point. EDIT -- fixed. Lmk if you think it should be improved further. Commented Jun 17 at 0:05
  • By the way, a path like b = Path.of("./abc.java") can be converted to Path.of("abc.java") by using b.normalize(), which is independent of the actual filesystem state. But, for example, Path.of("abc.java") and Path.of("../foo/abc.java") might point to the same file if the current directory has the name foo, which requires accessing the actual filesystem to find out. That is assuming you’re always using the default method of resolving against the current directory. You’re free to use arbitraryPath.resolve(b) instead, which may have an entirely different outcome. Commented Jun 17 at 11:17

2 Answers

I have a simple problem

No, you don't. For example, the answer to your question depends on how you care to define the word "the same". You have a complex problem that cannot be answered at all without a list of caveats.

I'd better get into those caveats, then.

It's a JVM, not an OS

Any other process on the system can, of course, write to those files too and cause all sorts of havoc.

One commonly employed fix is to take your "I append to a file" and stop right then and there: do not do that. If a file instead just poofs into existence fully formed, many things become a lot simpler. For example, there is no need to worry about some other process seeing that file, going "Swell! I can read it!", and then failing because you weren't done appending to it.

To do this, you create a temporary file (which you can create using a method invocation that ensures it truly is unique), append until you're happy with it, and then atomically rename it into the correct place. File systems do support atomic creation and renaming (meaning: If 2 processes simultaneously atomically rename some file into the same x.foo file, at least one of them will fail, guaranteed). They do not support atomic filling, i.e. you can't ask the OS: I want to write to this file for a while, can you make it look to other processes like it does not exist at all until I tell you I'm done with it?

Which is why you fake it: first 'give me a unique file, atomically and guaranteed' (filesystems can do that), and then 'move it to its final location atomically, i.e. only if it does not already exist, and if 2 processes attempt this simultaneously, all but one will fail' (filesystems can do that as well). To other processes there is then no file until all of a sudden there is a file, and it is in its complete, finished state.

If you think that is not appropriate: it is. You just need to redesign the systems that consume this file so that they work this way.

Unless, of course, you're talking about log files. Or rather, even then - having multiple separate systems that all try to make a sensible, consecutive log file is not possible either - instead each process should write its own log, and if you want, you can merge them later (either once all are done, or if these are forever-running processes, they should rotate their logs and you can thus merge all the rotated-out logs, as they are 'finished'). We're now back to the start: You have processes using atomic access to create unique files they definitely own and there is no risk.

But the JVM is one system!

So use the JVM's tools. synchronize on some logger object, have all things send their logs to that logger object and now this logger thing is the one and only bit of code that needs to open a file and write to it. Alternatively, have each part of the JVM write to a unique file, and merge them later.
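
As a minimal sketch of the "one object owns the file" idea, assuming a hypothetical class name SharedAppender and line-oriented UTF-8 output (neither comes from the question): every thread calls the same instance, and the synchronized method makes sure only one of them touches the file at a time.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

/** Hypothetical single owner of one output file; all threads append through it. */
final class SharedAppender {
    private final Path target;

    SharedAppender(Path target) {
        this.target = target;
    }

    /** Only one thread at a time can be inside this method for a given instance. */
    synchronized void appendLine(String line) throws IOException {
        // CREATE makes the file on first use, APPEND adds to the end afterwards.
        Files.write(target, List.of(line), StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }
}
```

This only protects against other threads that go through the same SharedAppender instance. It does nothing about other processes, or about code in the same JVM that opens the file some other way.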

Never mind all that, I just want an answer to my question

Well, what you want is impossible, which is why you need to go with the alternatives.

Take, for example, this 'trick':

touch foo.txt
ln foo.txt bar.txt

You now have 2 files - foo.txt and bar.txt. They are separate in all ways. No possible imagination of "path equality" would ever call these 2 things the same.

Nevertheless, write to one and you end up changing the other. Because they are hardlinked together. There is no canonical path here - even though I first created foo and then hardlinked it into bar, as far as the file system is concerned, foo and bar are peers. foo is not 'more canonical' than bar. Had you reversed the operation (create bar first, then hardlink it into foo), the bits on disk are identical in every way except, possibly, timestamps, which surely you don't want to look at, and which can be made equal trivially.

And yet, if within your JVM you decide to open an append stream on both of these, it'll be one heck of a mess. It is understandable that you'd want to avoid this, but you can't. At least, not in a way that Java supports, i.e. not in an OS-independent way.

If you merely want to get somewhat close, there's path.toRealPath(), which follows softlinks and resolves .. and . as well. This still does not give you a 100% guarantee that you won't end up with 2 appenders that congeal their output into a big old mess.
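
To make the difference between the lexical and the file-system-backed forms concrete, here is a small sketch. The class and method names (PathForms, lexical, real) are made up for illustration; the calls they wrap are the standard Path methods.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class PathForms {
    /** Purely lexical: removes "." and collapses ".." without touching the disk. */
    static Path lexical(Path p) {
        return p.toAbsolutePath().normalize();
    }

    /** Consults the file system: follows symbolic links, and throws if the file does not exist. */
    static Path real(Path p) throws IOException {
        return p.toRealPath();
    }
}
```

For an existing file, PathForms.real(Path.of("abc.java")) and PathForms.real(Path.of("./abc.java")) should compare equal, but two hardlinked names still produce two different real paths, which is exactly the gap described above.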

On presumably most systems, you could use Files.isSameFile. That method should return true if you give it 2 paths to different locations that are hardlinks of each other. The javadoc is rather vague, but as per a comment from Sweeper, it works on macOS, and therefore presumably on all posixy systems, at least. Note that Windows also has hardlinks these days, made with fsutil if memory serves; you should test this if you want to use it.

Presumably you'd have a list of all existing appenders, and anytime any code wants to make another appender, you'd check every item in the existing list with isSameFile; no lookup is possible here, you'd have to do this possibly relatively expensive operation. You might want to therefore cache the results of such an operation.
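
A rough sketch of such a list-and-scan approach, assuming a hypothetical FileLockRegistry that hands out one lock object per underlying file (the class, its Entry helper, and lockFor are illustrative names, not an existing API). It assumes the files already exist, because Files.isSameFile may need to access both files and can throw if one is missing.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical registry handing out one lock object per underlying file. */
final class FileLockRegistry {
    /** Pairs a path seen earlier with the lock that was created for it. */
    private static final class Entry {
        final Path path;
        final Object lock;

        Entry(Path path, Object lock) {
            this.path = path;
            this.lock = lock;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    /** Linear scan: no hash lookup is possible, so every known path is compared. */
    synchronized Object lockFor(Path candidate) throws IOException {
        for (Entry e : entries) {
            if (Files.isSameFile(e.path, candidate)) {
                return e.lock;
            }
        }
        Object lock = new Object();
        entries.add(new Entry(candidate, lock));
        return lock;
    }
}
```

Callers would then write synchronized (registry.lockFor(path)) { ... } around their append. Whether hardlinked names map to the same lock depends on what isSameFile reports on the platform in question, so this narrows the problem rather than closing it.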

But isSameFile doesn't let you write a guaranteed system. To get that guarantee, you have each appender atomically create a new file. Now it is not possible for them to clash, and it's guaranteed.

How do I do that?

To create a file in a way that you know, 100% guaranteed, there is no clash, you use:

try (var out = Files.newOutputStream(pathToFile, StandardOpenOption.CREATE_NEW)) {
 ... 
}

The only way to run into trouble here is if some other process (or some other code inside your JVM process) finds that file and decides to also write to it. At that point, it's 'pilot error'. You can't stop the user from tossing their computer in a blender either. The point is, if every 'appender' uses CREATE_NEW it is impossible to get a clash.

If you want to rename them into the right place, you use:

Files.move(pathToTempFile, pathToFinalFile, StandardCopyOption.ATOMIC_MOVE);

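The move itself is atomic, no matter how much resolving or unrolling of soft links, aliases, and .. / . has to be done, and that holds up even if 2 processes or threads attempt it simultaneously. One caveat: per the Files.move javadoc, what happens under ATOMIC_MOVE when pathToFinalFile already exists is implementation specific (the existing file is either replaced, or the call fails with an IOException). If you depend on 'only if it does not already exist', test that on the platforms you target.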

That just leaves 'how do I make a temp file'. Generally, you loop: pick a name with a random number appended, call Files.newOutputStream(..., CREATE_NEW), and repeat until it works, then use that file. You can use Java's built-in temp file generator, Files.createTempFile, for this if you must.
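
Putting the pieces together, here is a sketch of "write to a unique temp file, then move it into place". AtomicPublish and publish are hypothetical names, and the ".tmp" naming scheme is an assumption made for the example. The temp file is created next to the final file so the move stays within one file system, which ATOMIC_MOVE generally needs.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.ThreadLocalRandom;

final class AtomicPublish {
    /** Writes content to a uniquely named sibling of finalFile, then moves it into place. */
    static void publish(Path finalFile, String content) throws IOException {
        Path dir = finalFile.toAbsolutePath().getParent();
        Path temp;
        while (true) {
            String suffix = Long.toHexString(ThreadLocalRandom.current().nextLong());
            temp = dir.resolve(finalFile.getFileName() + "." + suffix + ".tmp");
            // CREATE_NEW fails if the name is taken, so a successful open means we own the file.
            try (OutputStream out = Files.newOutputStream(temp, StandardOpenOption.CREATE_NEW)) {
                out.write(content.getBytes(StandardCharsets.UTF_8));
                break;
            } catch (FileAlreadyExistsException e) {
                // Name collision: loop and try another random suffix.
            }
        }
        // If finalFile already exists, replace-or-fail is implementation specific under ATOMIC_MOVE.
        Files.move(temp, finalFile, StandardCopyOption.ATOMIC_MOVE);
    }
}
```

The sketch leaves the temp file behind if the write or the move fails; real code would probably want to delete it in that case.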


4 Comments

Files.isSameFile seems to handle hard links, at least on macOS. That could be an answer to the question in the title "how to determine if 2 instances of java.nio.file.Path are pointing to the same file" if we disregard the context.
I'm going to edit this answer and add it, that seems quite pertinent. Not quite sure how that's going to go in practice (anytime you add any file, you blast it past every already existing 'active appender', call isSameFile on the lot, and if you find one, then merge those 2 appenders JVM side, I guess?). Still, rather relevant here.
Thanks for posting this. I'm juggling some other stuff, so don't have the time to read through this all properly yet. I will get to it, and likely accept this answer once I get free.
I have a second to breathe, so I'll take it to say thanks for the solid answer, Reinier. I think what you are saying is right, and the only reason I went for an append instead of creating new files was the potential feature enhancement of inserting records into the file so that they are ordered by some attribute. But that's premature optimization -- I'll deal with those performance problems when I get there. For now, I'll go with what you suggested. I'll use the strategy of making a new file name by incrementing some number, since making a temp file and then moving it slows down performance. Ty vm

EDIT -- As addressed in the other answer, this answer is problematic. Once I get a second to breathe, I will fix this answer to avoid giving misinformation. But for now, PLEASE DISREGARD THIS ANSWER.


The term you are looking for is a Canonical Path. In fact, it would be good to read up on the term Canonical, as that will provide useful context.

So, if you want to know if 2 instances of Path are pointing to the same object, then find the Canonical Path to the file.

The way to find the Canonical Path to a file in Java is to use the method Path.toRealPath().

System.out.println(a.toRealPath().equals(b.toRealPath())); //true, as it should be!

3 Comments

Slightly dangerous answer; this does not address hardlinks.
…and the result can change over time, especially when you’re going from a path to a non-existing file to an existing file in the first append operation.
With the existence of Reinier's answer, does it make more sense to delete this one?
