The concept of a very long instruction word (VLIW) CPU architecture is straightforward enough. Wikipedia (https://en.wikipedia.org/wiki/Very_long_instruction_word) summarizes it as follows:
one VLIW instruction encodes multiple operations, at least one operation for each execution unit of a device. For example, if a VLIW device has five execution units, then a VLIW instruction for the device has five operation fields, each field specifying what operation should be done on that corresponding execution unit. To accommodate these operation fields, VLIW instructions are usually at least 64 bits wide, and far wider on some architectures.
It's clear how this works if every operation takes one cycle, as ALU operations may very well do.
But I have not been able to find any mention of what happens when some operations take multiple cycles. Take a simple case with three execution units: add, multiply and load. Say add and multiply take one cycle each. The first instruction specifies one of each operation...
But a load can take anywhere from a couple of cycles to a couple of hundred, depending on whether it hits cache or must go all the way to main memory. So what happens then? The add and multiply will be done, but the load is still in progress. If the CPU proceeds to the second, third, etc. instructions, it must keep track of an ever-increasing backlog of pending loads, negating the VLIW selling point of not having to spend any hardware resources on bookkeeping. If it instead waits until all operations in the current instruction are done, the add and multiply units will sit idle when they could have been proceeding with useful work (and would have, in an out-of-order processor).
How is this dealt with in actual VLIW CPUs?
For a more concrete example, suppose the operations we want to perform are:
r1 += r2, r10 = [r11]
r1 += r3
r1 += r4
r1 += r5
The subsequent additions won't fit in the first big instruction, and need to be in separate big instructions anyway because each depends on the previous addition. But they are all independent of the load. Does putting them in subsequent big instructions cause the CPU to treat them as if they depended on the load, so that they stall until the load is done?
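To make the two possibilities concrete, here is a rough Python sketch that computes the cycle at which each big instruction would issue under two hypothetical policies: "lock-step" (wait until everything already in flight has finished) and "stall-on-use" (wait only for registers the instruction actually reads or writes). This is a toy model of the question, not a description of how any real VLIW CPU behaves. The names Op, issue_cycles and BUNDLES, and the 100-cycle LOAD_LATENCY, are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    dest: str       # register written by the operation
    srcs: tuple     # registers read by the operation
    latency: int    # cycles until the result is available


def issue_cycles(bundles, stall_on_use):
    """Return the cycle at which each bundle (big instruction) issues.

    stall_on_use=False: lock-step model, a bundle waits for every
        operation that is still in flight.
    stall_on_use=True: a bundle waits only for pending results in
        registers that it reads or writes.
    """
    ready = {}      # register -> cycle at which its pending value is available
    earliest = 0    # at most one bundle issues per cycle, in program order
    issued = []
    for bundle in bundles:
        if stall_on_use:
            regs = {r for op in bundle for r in (op.dest, *op.srcs)}
            issue = max([earliest] + [ready.get(r, 0) for r in regs])
        else:
            issue = max([earliest] + list(ready.values()))
        for op in bundle:
            ready[op.dest] = issue + op.latency
        issued.append(issue)
        earliest = issue + 1
    return issued


LOAD_LATENCY = 100  # assumed cache-miss cost, purely illustrative

# The example from the question: one add plus one load, then three dependent adds.
BUNDLES = [
    [Op("r1", ("r1", "r2"), 1), Op("r10", ("r11",), LOAD_LATENCY)],
    [Op("r1", ("r1", "r3"), 1)],
    [Op("r1", ("r1", "r4"), 1)],
    [Op("r1", ("r1", "r5"), 1)],
]

print("lock-step:   ", issue_cycles(BUNDLES, stall_on_use=False))
print("stall-on-use:", issue_cycles(BUNDLES, stall_on_use=True))
```

In this model the lock-step policy issues the four instructions at cycles 0, 100, 101, 102, while the stall-on-use policy issues them at cycles 0, 1, 2, 3. The second policy needs the per-register `ready` table, which is exactly the kind of bookkeeping I understood VLIW was supposed to avoid.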