Why isn't Mutex unlocking?

I am trying to implement a global object pool for a large Obj type. Here is the code for POOL:

use std::sync::{Mutex, MutexGuard, Once, ONCE_INIT};

static mut POOL: Option<Mutex<Vec<Obj>>> = None;
static INIT: Once = ONCE_INIT;

pub struct Obj;

Here is how I am accessing and locking POOL:

fn get_pool<'a>() -> MutexGuard<'a, Vec<Obj>> {
    unsafe {
        match POOL {
            Some(ref mutex) => mutex.lock().unwrap(),
            None            => {
                INIT.call_once(|| {
                    POOL = Some(Mutex::new(vec![]));
                });
                get_pool()
            }
        }
    }
}

This is the code that is causing problems:

impl Drop for Obj {
    fn drop(&mut self) {
        println!("dropping.");
        println!("hangs here...");
        get_pool().push(Obj {});
    }
}

impl Obj {
    pub fn new() -> Obj {
        println!("initializing");
        get_pool().pop().unwrap_or(Obj {})
        // for some reason, the mutex does not get unlocked at this point...
    }
}

I think it has something to do with the lifetime 'a of the MutexGuard in the return value of get_pool. Frankly I'm probably a little confused about the way these lifetime parameters work.

Here is a link to a playground with a working example. Thanks for your help and merry Christmas.

1 answer

wimh (best answer)

The problem is located in this line:

get_pool().pop().unwrap_or(Obj {})

get_pool() locks the mutex and returns a MutexGuard. That guard is a temporary, so it is not dropped (and the mutex is not unlocked) until the end of the whole statement. However, the argument to unwrap_or() creates a new Obj eagerly. If the vec already contained an object, that new Obj is not used, so unwrap_or() drops it before returning, while the guard is still alive. Because drop() tries to lock the same mutex, you get a deadlock.

To fix this, split the expression into two statements, so that the guard is dropped (and the mutex unlocked) at the end of the first one:

let o = get_pool().pop();
o.unwrap_or(Obj {})
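
To watch this happen without actually hanging, here is a minimal sketch that swaps lock() for try_lock() inside drop() and counts how often the drop finds the mutex already held. It is not the asker's code: BLOCKED_DROPS, blocked_drops and the three take_* functions are hypothetical names, and the plain static Mutex assumes Rust 1.63 or newer. The third variant, unwrap_or_else, is an alternative I am adding for comparison, not part of the fix above.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;

// Hypothetical stand-ins for the question's POOL and Obj.
// `Mutex::new` in a static needs Rust 1.63 or newer.
static POOL: Mutex<Vec<Obj>> = Mutex::new(Vec::new());
// Number of drops that found POOL already locked.
static BLOCKED_DROPS: AtomicUsize = AtomicUsize::new(0);

struct Obj;

impl Drop for Obj {
    fn drop(&mut self) {
        // try_lock() never blocks; with lock() this is where the hang would be.
        if POOL.try_lock().is_err() {
            BLOCKED_DROPS.fetch_add(1, Ordering::SeqCst);
        }
    }
}

// Original form: the guard is still alive while unwrap_or() drops the unused default.
fn take_one_statement() -> Obj {
    POOL.lock().unwrap().pop().unwrap_or(Obj)
}

// The fix: the guard is dropped at the end of the `let` statement.
fn take_two_statements() -> Obj {
    let popped = POOL.lock().unwrap().pop();
    popped.unwrap_or(Obj)
}

// Alternative: the closure only builds a default when the pool is empty.
fn take_lazy_default() -> Obj {
    POOL.lock().unwrap().pop().unwrap_or_else(|| Obj)
}

// Seeds the pool with one object, runs `take`, and reports how many
// drops happened while the lock was held.
fn blocked_drops(take: fn() -> Obj) -> usize {
    POOL.lock().unwrap().push(Obj);
    let before = BLOCKED_DROPS.load(Ordering::SeqCst);
    let obj = take();
    let after = BLOCKED_DROPS.load(Ordering::SeqCst);
    drop(obj); // dropped with the lock free, so it is not counted
    after - before
}

fn main() {
    println!("one statement:  {}", blocked_drops(take_one_statement)); // 1
    println!("two statements: {}", blocked_drops(take_two_statements)); // 0
    println!("unwrap_or_else: {}", blocked_drops(take_lazy_default)); // 0
}
```

Only the one-statement form drops an Obj while the guard is alive, which is exactly the drop that would block forever with lock().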

As a related note, you can use lazy-static to avoid unsafe code:

#[macro_use]
extern crate lazy_static;

use std::sync::{Mutex, MutexGuard};

lazy_static! {
  static ref POOL: Mutex<Vec<Obj>> = Mutex::new(vec![]);
}

pub struct Obj;

fn get_pool<'a>() -> MutexGuard<'a, Vec<Obj>> {
    POOL.lock().unwrap()
}

impl Drop for Obj {
    fn drop(&mut self) {
        println!("dropping.");
        println!("hangs here...");
        get_pool().push(Obj {});
        println!("not here...");
    }
}

impl Obj {
    pub fn new() -> Obj {
        println!("initializing");
        let o = get_pool().pop();
        o.unwrap_or(Obj {})
    }
}

fn main() {
    Obj::new();
    Obj::new();
    println!("Now reaches this point.");
}
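
If you are on a recent toolchain, you may not need lazy-static at all. The following is a sketch under the assumption of Rust 1.63 or newer, where Mutex::new can be used directly in a static. It also returns MutexGuard<'static, _> instead of a free lifetime 'a, which makes it clearer that the lifetime is not what keeps the mutex locked. The final println! is only there to show the pool contents.

```rust
use std::sync::{Mutex, MutexGuard};

// Assumes Rust 1.63+, where Mutex::new and Vec::new are both const.
static POOL: Mutex<Vec<Obj>> = Mutex::new(Vec::new());

pub struct Obj;

// The guard borrows a static, so its lifetime is simply 'static.
fn get_pool() -> MutexGuard<'static, Vec<Obj>> {
    POOL.lock().unwrap()
}

impl Drop for Obj {
    fn drop(&mut self) {
        // The guard is a temporary of this statement and is released at the `;`.
        get_pool().push(Obj {});
    }
}

impl Obj {
    pub fn new() -> Obj {
        // Release the lock before unwrap_or() can drop the unused default.
        let o = get_pool().pop();
        o.unwrap_or(Obj {})
    }
}

fn main() {
    Obj::new();
    Obj::new();
    println!("pool size: {}", get_pool().len()); // prints "pool size: 2"
}
```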

Edit

As requested, here is how I diagnosed this:

  1. First I checked whether I could reproduce the problem using the sample you gave. I could, and the code was simple and clear enough. I only needed to add the line println!("not here..."); to be 100% sure it hangs at the statement above, and not at the end of the block.
  2. On a first scan, I noticed that Obj::new(); had to be called twice for the problem to happen. So the next goal was to find the difference between the two calls. (My knowledge of Rust is not yet good enough to spot this error just by reading the code.)
  3. Because POOL is not initialized in the first call, I added the initialization at the start of main (unsafe{INIT.call_once(||{POOL=Some(Mutex::new(vec![]));});}), but that did not change anything.
  4. Because an object is added to the pool when Obj is dropped, I added an object at the start of main (get_pool().push(Obj {});). Now it hangs at the first Obj::new();.
  5. I could further simplify it by calling get_pool().pop().unwrap_or(Obj {}); next in main.
  6. Now I could partly remove or split that line to determine exactly where it hangs. By doing that I saw I was able to fix it. Then I realized an extra Obj was created there. Note that the guard's scope is lexical: the temporary MutexGuard lives until the end of the statement, not just until its last use.
  7. In retrospect, I would have spotted this earlier if I had removed the line containing get_pool() in drop(), and counted how many times drop() was called. I did not realize it was called three times instead of twice.
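
The counting idea from the last step can be sketched roughly like this. It is a variation, not exactly what I ran: it keeps the push in drop() but uses the fixed two-statement Obj::new() so it terminates, and it assumes Rust 1.63+ for the plain static Mutex. DROPS and drops_for_two_calls are hypothetical names.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;

static POOL: Mutex<Vec<Obj>> = Mutex::new(Vec::new());
// Hypothetical counter: how many times Obj::drop has run.
static DROPS: AtomicUsize = AtomicUsize::new(0);

struct Obj;

impl Drop for Obj {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
        POOL.lock().unwrap().push(Obj);
    }
}

impl Obj {
    fn new() -> Obj {
        // Fixed form: the lock is released before unwrap_or() runs.
        let popped = POOL.lock().unwrap().pop();
        popped.unwrap_or(Obj)
    }
}

// Counts the drops caused by two `Obj::new();` statements on an empty pool.
fn drops_for_two_calls() -> usize {
    // Start from an empty pool; forget() avoids running Drop on old contents,
    // which would try to re-lock POOL while we hold it.
    std::mem::forget(std::mem::take(&mut *POOL.lock().unwrap()));
    let before = DROPS.load(Ordering::SeqCst);
    Obj::new(); // 1 drop: the returned object
    Obj::new(); // 2 drops: the unused default, then the returned object
    DROPS.load(Ordering::SeqCst) - before
}

fn main() {
    println!("drops: {}", drops_for_two_calls()); // prints "drops: 3"
}
```

Seeing three drops for two calls points straight at the extra Obj built for unwrap_or().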

As a general note, the title of this question is "Why isn't Mutex unlocking". This could be interpreted as a compiler bug, or a bug in the standard library. Most of the time (>99%) it is not. It is important to keep that in mind, and not focus on the wrong issue.

This problem is related to global shared state. Try to avoid that. (Yes I know that's not always possible).
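
For illustration only, here is one way a pool without global state might look. This is a sketch, not a drop-in replacement for the code above: Pool, Pooled, get and idle are all hypothetical names, and the returning-to-the-pool logic moves from Obj itself into a small wrapper so that dropping an Obj never needs the lock.

```rust
use std::sync::{Arc, Mutex};

// Stand-in for the large type from the question.
struct Obj;

// Hypothetical pool handle that is passed around explicitly instead of being global.
#[derive(Clone)]
struct Pool {
    free: Arc<Mutex<Vec<Obj>>>,
}

// Hypothetical wrapper that hands its Obj back to the pool when dropped.
struct Pooled {
    obj: Option<Obj>,
    pool: Pool,
}

impl Pool {
    fn new() -> Pool {
        Pool { free: Arc::new(Mutex::new(Vec::new())) }
    }

    fn get(&self) -> Pooled {
        // The guard is released at the end of this `let` statement.
        let reused = self.free.lock().unwrap().pop();
        Pooled {
            obj: Some(reused.unwrap_or_else(|| Obj)),
            pool: self.clone(),
        }
    }

    // Number of objects currently waiting in the pool.
    fn idle(&self) -> usize {
        self.free.lock().unwrap().len()
    }
}

impl Drop for Pooled {
    fn drop(&mut self) {
        // Move the object back into the pool instead of destroying it.
        if let Some(obj) = self.obj.take() {
            self.pool.free.lock().unwrap().push(obj);
        }
    }
}

fn main() {
    let pool = Pool::new();
    {
        let _a = pool.get();
        let _b = pool.get();
        println!("idle while in use: {}", pool.idle()); // 0
    }
    println!("idle after release: {}", pool.idle()); // 2
    let _c = pool.get();
    println!("idle after reuse: {}", pool.idle()); // 1
}
```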