
I want to update a single file from multiple processes in parallel. Each of these processes opens the file for writing at the same time.

Abbreviations used:

  • f : file,
  • p[i] : process i,
  • b[i] : buffer for FD i opened by process i.

Questions:

  1. When a file is opened and a stream is established, is the file path translated internally to an inode number? I have read that an inode number is unique only within a single partition (filesystem).
  2. When the same file is opened by several processes in parallel, how does Linux manage the writes?
  3. If b[1] is full, it will be flushed. Does this mean every p[i] will then see the changes in the file? This does not seem to happen, so where are the contents of the buffer flushed to? If copy-on-write (COW) is involved, does Linux create a copy of the dirty page on disk, or does it do something like MVCC? (I am assuming that only dirty pages are rewritten rather than all pages, since otherwise modifying a huge file would be troublesome.)
  4. As an experiment, I opened a file in the vi editor, deleted the file from a terminal, then added some text in the editor and saved. The file was recreated. In a second run I did not edit the file, and it no longer existed after I closed the editor. It looks like COW is at work. But since the file had already been deleted, did COW use in-memory pages of the file to recreate it? What if the file were 10 GB in size and could not fit in memory at once?
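The kernel side of item 4 can be approximated without an editor. The following is a minimal Python sketch, assuming Linux/POSIX `unlink` semantics. It removes a file's name while a descriptor is still open. The name disappears, but the data stays reachable through the descriptor until it is closed. The function name `unlink_while_open` is made up for illustration. This only shows kernel behaviour. An editor such as vi typically keeps its own copy of the text and writes a new file on save, which could explain the recreated file without any COW.

```python
import os
import tempfile


def unlink_while_open():
    """Delete a file's name while it is open; return what is still observable.

    Returns (identity, path_exists_after_unlink, data_read_via_fd).
    """
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"still here")

        # A file is identified by the pair (device, inode number),
        # which is why an inode number alone is unique only per filesystem.
        st = os.fstat(fd)
        identity = (st.st_dev, st.st_ino)

        os.unlink(path)                     # removes the directory entry only
        exists = os.path.exists(path)       # the name is gone
        data = os.pread(fd, 100, 0)         # the open descriptor still reaches the data
        return identity, exists, data
    finally:
        os.close(fd)                        # last reference dropped; storage can be freed


if __name__ == "__main__":
    print(unlink_while_open())
```

Nothing is copied here. The descriptor keeps referring to the same inode, so the file size does not matter for this part of the experiment.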
  • What is your real world problem? Commented Mar 2, 2021 at 11:51
  • I am trying to understand, in general, how Linux handles file opens and writes from multiple streams in parallel. Commented Mar 2, 2021 at 12:20
  • If you want all processes to see changes then open unbuffered. Commented Mar 2, 2021 at 12:22
  • @stark Does this mean that with buffered I/O, as soon as the buffer is flushed, the data goes to the file and other processes can see it? Suppose I have f open in p[1] and edit it at offset 101, while p[2] modifies f at offset 100, adds 10 bytes of data, and flushes before p[1]. Would my p[1] changes still appear at offset 101, or would my text have moved to offset 111? With unbuffered I/O, changes will be seen more quickly by other processes. Commented Mar 2, 2021 at 12:27
  • Stream buffers are local to the process. Flushing, or writing unbuffered, puts the data in the common page cache, where all processes will see it. I/O to nearby locations will still be a problem because of read-modify-write, and there can be race conditions. Commented Mar 2, 2021 at 14:27
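The last two comments can be checked with a short Python sketch, assuming a Linux/POSIX system. The helper names `buffered_visibility` and `overlapping_writes` are hypothetical. The first shows that data sitting in a process-local stream buffer is not visible through the filesystem until it is flushed. The second shows that a write at an offset overwrites bytes in place rather than inserting them, so data written at offset 101 stays at offset 101. For simplicity both run in one process with separate descriptors. Real concurrent writers would still need locking or some other coordination to avoid races.

```python
import os
import tempfile


def buffered_visibility():
    """Return the file size seen via stat() before and after flush()."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "wb") as f:          # buffered stream (user-space buffer)
            f.write(b"hello")                # stays in the process-local buffer
            before = os.stat(path).st_size   # kernel has not seen the data yet
            f.flush()                        # hands the data to the kernel
            after = os.stat(path).st_size
        return before, after
    finally:
        os.unlink(path)


def overlapping_writes():
    """Two descriptors write to nearby offsets; return (size, bytes 100..109)."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "wb") as f:
            f.write(b"." * 120)              # pre-fill so offsets 100+ exist

        fd1 = os.open(path, os.O_WRONLY)     # stands in for p[1]
        fd2 = os.open(path, os.O_WRONLY)     # stands in for p[2]
        try:
            os.pwrite(fd2, b"B" * 10, 100)   # p[2]: 10 bytes at offset 100
            os.pwrite(fd1, b"A" * 3, 101)    # p[1]: 3 bytes at offset 101
        finally:
            os.close(fd1)
            os.close(fd2)

        with open(path, "rb") as f:
            data = f.read()
        return len(data), data[100:110]
    finally:
        os.unlink(path)


if __name__ == "__main__":
    print(buffered_visibility())
    print(overlapping_writes())
```

In the second function the file length is unchanged and the later write wins on the overlapping bytes. With truly parallel writers, the order of those two writes is not guaranteed.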
