As the CPU and GPU run asynchronously to one another, previous reads of a buffer might not yet be done when the buffer gets changed. The simplest solution is to stall on the command that alters the buffer. As a stall is often the worst case for performance, this spawns a lot of topics about e.g. double-buffering for vertex streaming (or similar).
Reading these, I went down the rabbit hole. In my current understanding, how the data is transferred to the GPU is more or less up to the driver. Anything goes: from copying straight to GPU memory during the call to the related OpenGL function, to first copying within system memory, then copying again into pinned pages, and finally transferring to GPU memory via DMA.
However, this leaves me wondering. A stall is only necessary when the driver intends to copy to GPU memory directly, during the call, without allocating extra space. Otherwise there is already a copy, and that can simply sit there until all previous reads are done. Additionally, the driver has all the information. If a sync would be necessary, it could opt for "oh well, let's copy in RAM first".
This would push the issue on to the following reads. Should e.g. a following draw call use the buffer, it would have to wait for the completion of the previous read (as the buffer data copy is waiting for that). While that is still a wait, it would at least not block the CPU. The driver could, however, also automatically copy GPU-side, in effect orphaning the old storage, and everything seems solved?
Here is my current plan: I want to stream edits to a buffer, potentially every frame. As such, I might take two buffer handles and alternate between them, roughly speaking: "copy whole buffer A to B, orphan (invalidate) A, write to B, use B, repeat with A and B swapped for the next frame".
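To make the plan concrete, here is a minimal sketch of the alternating scheme in TypeScript. The interface `GlCopySubset`, the types `BufferEdit` and `StreamPair`, and the functions `createStreamPair` and `streamFrame` are names I made up for illustration; only `bindBuffer`, `bufferData`, `bufferSubData` and `copyBufferSubData` are actual WebGL2 calls. I am assuming `DYNAMIC_DRAW` is a reasonable usage hint and that re-specifying the storage with `bufferData` is how orphaning would be expressed; whether the driver really hands out fresh storage is up to the implementation.

```typescript
// Hypothetical minimal subset of the WebGL2 context used by this sketch.
// Declaring it structurally lets the logic run against a mock outside a browser.
interface GlCopySubset<B> {
  readonly COPY_READ_BUFFER: number;
  readonly COPY_WRITE_BUFFER: number;
  readonly DYNAMIC_DRAW: number;
  bindBuffer(target: number, buffer: B | null): void;
  bufferData(target: number, size: number, usage: number): void;
  bufferSubData(target: number, dstByteOffset: number, srcData: ArrayBufferView): void;
  copyBufferSubData(readTarget: number, writeTarget: number,
                    readOffset: number, writeOffset: number, size: number): void;
}

// Hypothetical helper types: one edit, and the pair of alternating buffers.
interface BufferEdit { byteOffset: number; data: ArrayBufferView; }
interface StreamPair<B> { front: B; back: B; byteSize: number; }

// Allocate storage of the same size for both buffers.
function createStreamPair<B>(gl: GlCopySubset<B>, a: B, b: B, byteSize: number): StreamPair<B> {
  for (const buf of [a, b]) {
    gl.bindBuffer(gl.COPY_WRITE_BUFFER, buf);
    gl.bufferData(gl.COPY_WRITE_BUFFER, byteSize, gl.DYNAMIC_DRAW);
  }
  gl.bindBuffer(gl.COPY_WRITE_BUFFER, null);
  return { front: a, back: b, byteSize };
}

// One frame of the plan: copy front -> back, orphan front, write edits to back,
// swap roles, and return the buffer that should be used for drawing this frame.
function streamFrame<B>(gl: GlCopySubset<B>, pair: StreamPair<B>, edits: BufferEdit[]): B {
  gl.bindBuffer(gl.COPY_READ_BUFFER, pair.front);
  gl.bindBuffer(gl.COPY_WRITE_BUFFER, pair.back);

  // 1. GPU-side copy of the whole previous state into the other buffer.
  gl.copyBufferSubData(gl.COPY_READ_BUFFER, gl.COPY_WRITE_BUFFER, 0, 0, pair.byteSize);

  // 2. Orphan (invalidate) the old buffer by re-specifying its storage.
  gl.bufferData(gl.COPY_READ_BUFFER, pair.byteSize, gl.DYNAMIC_DRAW);

  // 3. Apply this frame's edits on top of the copied state.
  for (const edit of edits) {
    gl.bufferSubData(gl.COPY_WRITE_BUFFER, edit.byteOffset, edit.data);
  }

  gl.bindBuffer(gl.COPY_READ_BUFFER, null);
  gl.bindBuffer(gl.COPY_WRITE_BUFFER, null);

  // 4. Swap roles for the next frame.
  const current = pair.back;
  pair.back = pair.front;
  pair.front = current;
  return current;
}
```

Per frame I would call `streamFrame(gl, pair, edits)` once and bind the returned buffer for the draw calls. This is the manual bookkeeping I am asking about.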
Do I really have to do this manually? Why isn't the driver guaranteed to do so automatically, when necessary?
PS: The buffer is changed once per frame, not more. Is the conventional wisdom of a GPU trailing multiple frames behind even still relevant? Don't many modern renderers use multiple passes and sync at least once per frame anyway?
While I am on WebGL2, I don't think the abstract concepts are specific to it. There are some special cases, e.g. persistently mapped buffers, which don't exist in WebGL2, but the stalls are most often described with e.g. bufferSubData as the example.
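For reference, these are the two update paths I keep seeing compared, written as a rough TypeScript sketch. `GlUploadSubset`, `updateInPlace` and `orphanThenUpload` are hypothetical names of mine; the WebGL2 calls themselves (`bindBuffer`, `bufferData`, `bufferSubData`) are real. I am assuming vertex data bound to `ARRAY_BUFFER` with a `DYNAMIC_DRAW` hint, and whether either path actually stalls is, as far as I understand, implementation-dependent.

```typescript
// Hypothetical minimal subset of the WebGL2 context, so the sketch can be
// exercised with a mock outside a browser.
interface GlUploadSubset<B> {
  readonly ARRAY_BUFFER: number;
  readonly DYNAMIC_DRAW: number;
  bindBuffer(target: number, buffer: B | null): void;
  bufferData(target: number, size: number, usage: number): void;
  bufferSubData(target: number, dstByteOffset: number, srcData: ArrayBufferView): void;
}

// Path 1: overwrite part of the existing storage. If earlier draws still read
// from it, the driver has to wait or stage the data somewhere itself.
function updateInPlace<B>(gl: GlUploadSubset<B>, buffer: B,
                          byteOffset: number, data: ArrayBufferView): void {
  gl.bindBuffer(gl.ARRAY_BUFFER, buffer);
  gl.bufferSubData(gl.ARRAY_BUFFER, byteOffset, data);
}

// Path 2: orphan first by re-specifying the storage, then upload the full
// contents. The old storage can stay alive for pending reads (driver's choice).
function orphanThenUpload<B>(gl: GlUploadSubset<B>, buffer: B,
                             data: ArrayBufferView): void {
  gl.bindBuffer(gl.ARRAY_BUFFER, buffer);
  gl.bufferData(gl.ARRAY_BUFFER, data.byteLength, gl.DYNAMIC_DRAW);
  gl.bufferSubData(gl.ARRAY_BUFFER, 0, data);
}
```

The second path only works when the whole buffer is rewritten, which is why I am considering the GPU-side copy for partial edits.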