If arrays are value types and therefore get copied, then how are they not thread safe?

4.9k views Asked by At

Reading this I learn that:

Instances of value types are not shared: every thread gets its own copy.* That means that every thread can read and write to its instance without having to worry about what other threads are doing.

Then I was brought to this answer and its comment

and was told:

an array, which is not, itself, thread-safe, is being accessed from multiple threads, so all interactions must be synchronized.

& about every thread gets its own copy I was told

if one thread is updating an array (presumably so you can see that edit from another queue), that simply doesn't apply

that simply doesn't apply <-- Why not?

I initially thought all of this is happening because the array ie a value type is getting wrapped into a class but to my amazement I was told NOT true! So I'm back to Swift 101 again :D

3

There are 3 answers

5
Rob On BEST ANSWER

The fundamental issue is the interpretation of "every thread gets its own copy".

Yes, we often use value types to ensure thread safety by providing every thread its own copy of an object (such as an array). But that is not the same thing as claiming that value types guarantee every thread will get its own copy.

Specifically, using closures, multiple threads can attempt to mutate the same value-type object. Here is an example of code that shows some non-thread-safe code interacting with a Swift Array value type:

let queue = DispatchQueue.global()

var employees = ["Bill", "Bob", "Joe"]

queue.async {
    let count = employees.count
    for index in 0 ..< count {
        print("\(employees[index])")
        Thread.sleep(forTimeInterval: 1)
    }
}

queue.async { 
    Thread.sleep(forTimeInterval: 0.5)
    employees.remove(at: 0)
}

(You generally wouldn't add sleep calls; I only added them to manifest race conditions that are otherwise hard to reproduce. You also shouldn't mutate an object from multiple threads like this without some synchronization, but I'm doing this to illustrate the problem.)

In these async calls, you're still referring to the same employees array defined earlier. So, in this particular example, we'll see it output "Bill", it will skip "Bob" (even though it was "Bill" that was removed), it will output "Joe" (now the second item), and then it will crash trying to access the third item in an array that now only has two items left.

Now, all that I illustrate above is that a single value type can be mutated by one thread while being used by another, thereby violating thread-safety. There are actually a whole series of more fundamental problems that can manifest themselves when writing code that is not thread-safe, but the above is merely one slightly contrived example.

But, you can ensure that this separate thread gets its own copy of the employees array by adding a "capture list" to that first async call to indicate that you want to work with a copy of the original employees array:

queue.async { [employees] in
    ...
}

Or, you'll automatically get this behavior if you pass this value type as a parameter to another method:

doSomethingAsynchronous(with: employees) { result in
    ...
}

In either of these two cases, you'll be enjoying value semantics and see a copy (or copy-on-write) of the original array, although the original array may have been mutated elsewhere.

Bottom line, my point is merely that value types do not guarantee that every thread has its own copy. The Array type is not (nor are many other mutable value types) thread-safe. But, like all value types, Swift offer simple mechanisms (some of them completely automatic and transparent) that will provide each thread its own copy, making it much easier to write thread-safe code.


Here's another example with another value type that makes the problem more obvious. Here's an example where a failure to write thread-safe code returns semantically invalid object:

let queue = DispatchQueue.global()

struct Person {
    var firstName: String
    var lastName: String
}

var person = Person(firstName: "Rob", lastName: "Ryan")

queue.async {
    Thread.sleep(forTimeInterval: 0.5)
    print("1: \(person)")
}

queue.async { 
    person.firstName = "Rachel"
    Thread.sleep(forTimeInterval: 1)
    person.lastName = "Moore"
    print("2: \(person)")
}

In this example, the first print statement will say, effectively "Rachel Ryan", which is neither "Rob Ryan" nor "Rachel Moore". In short, we're examining our Person while it is in an internally inconsistent state.

But, again, we can use a capture list to enjoy value semantics:

queue.async { [person] in
    Thread.sleep(forTimeInterval: 0.5)
    print("1: \(person)")
}

And in this case, it will say "Rob Ryan", oblivious to the fact that the original Person may be in the process of being mutated by another thread. (Clearly, the real problem is not fixed just by using value semantics in the first async call, but synchronizing the second async call and/or using value semantics there, too.)

7
Alexander On

Because Array is a value type, you're guaranteed that it has a single direct owner.

The issue comes from what happens when an array has more than one indirect owner. Consider this example:

Class Foo {
    let array = [Int]()

    func fillIfArrayIsEmpty() {
        guard array.isEmpty else { return }
        array += [Int](1...10)
    }
}

let foo = Foo();

doSomethingOnThread1 {
    foo.fillIfArrayIsEmpty()
}

doSomethingOnThread2 {
    foo.fillIfArrayIsEmpty()
}

array has a single direct owner: the foo instance it's contained in. However, both thread 1 and 2 have ownership of foo, and transitively, of the array within it. This means they can both mutate it asynchronously, so race conditions can occur.

Here's an example of what might occur:

  • Thread 1 starts running

  • array.isEmpty evaluates to false, the guard passes, and execution will continue passed it

  • Thread 1 has used up its CPU time, so it's kicked off the CPU. Thread 2 is scheduled on by the OS

  • Thread 2 is now running

  • array.isEmpty evaluates to false, the guard passes, and execution will continue passed it

  • array += [Int](1...10) is executed. array is now equal to [1, 2, 3, 4, 5, 6, 7, 8, 9]

  • Thread 2 is finished, and relinquishes the CPU, Thread 1 is scheduled on by the OS

  • Thread 1 resumes where it left off.

  • array += [Int](1...10) is executed. array is now equal to [1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9]. This wasn't supposed to happen!

0
mfaani On

You have a wrong assumption. You think that whatever you do with structs a copy will always magically happen. NOT true. If you copy them they will be copied as simple as that.

class someClass{ 
var anArray : Array = [1,2,3,4,5]

func copy{
var copiedArray = anArray // manipulating copiedArray & anArray at the same time will NEVER create a problem
} 

func myRead(_ index : Int){
print(anArray[index])
}

func myWrite(_ item : Int){
anArray.append(item)
}
}    

However inside your read & write funcs you are accessing anArraywithout copying it, so race-conditions can occur if both myRead and myWrite functions are called concurrently. You have to solve (see here) the issue by using queues.