RestartManager causes worker role to restart


Lately we have encountered the problem that our Azure Worker Role service restarts almost every day. This is a huge problem for us, since our service needs around 20 minutes to initialize and these restarts can cause downtime.
I logged in to the instances via RDP and looked in the event logs to figure out what was causing these seemingly random restarts. I came across a few entries that always preceded a restart:

(Screenshot of the event log entries; image not preserved.)

The service installed by the MsiInstaller is "Windows Azure Remote Forwarder". I assume this service gets installed because we enabled Remote Desktop in our worker role configuration. The interesting thing is that we have had RDP enabled for a long time (2 years or so), but the random restarts only started about 4 weeks ago.
But there are a few things that I don't quite understand:

  1. Why is this service installed or updated so frequently?
  2. I know that the RestartManager is responsible for installing/updating services without having to restart the machine by stopping other services that are blocking files.
    Is it possible that our service blocks some important files?
    Could it be a problem that we use a local disk storage for temporary files?
  3. Is it possible to tell the RestartManager to leave our Worker Role service alone?
  4. Is this just a coincidence, and the restarts are somehow triggered by our service, although no logs indicate errors on our side?

Any help is greatly appreciated.

Thanks,
Karsten


There is 1 answer

Stein Åsmul

Self-Repair: What you are seeing is most likely Windows Installer self-repair. This is a mechanism that puts files back in place if they have been unexpectedly modified, but it can cause a lot of problems and lead to endless repair loops, which is probably what has happened here. Very likely another product has been installed, and an unfixable error condition now exists that triggers repeated, failing attempts at MSI self-repair. The conflict must be identified through logging and Event Viewer debugging, and a suitable fix applied (real-world fixes).

Terse Explanation: Here is the most condensed explanation of what self-repair or "resilience" really is about that I have: Why does the MSI installer reconfigure if I delete a file?

Restart Manager: The Restart Manager feature is - as you say yourself (others might read) - simply a way for setups to restart applications instead of requiring a system reboot by "making applications capable of shutting themselves down and restarting in a controlled fashion".

  • What probably happens is that your service fails to shut down in a timely fashion using its native start / stop procedures, or the MSI does not attempt to restart the service with the built-in MSI service control mechanism. Your service either fails to stop in time or fails to stop altogether. Maybe. I suppose this could trigger Restart Manager events. It certainly could if REINSTALLMODE is set to "amus", which forces all files to be overwritten regardless of version.
  • Seeing as the people here are developers, maybe a technical sample of how to implement Restart Manager support in your application: How do I add support for Windows Restart Manager to my application? (Advanced Installer).
  • Lots of Restart Manager links and information (mid page)
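As a rough illustration of "stopping in a timely fashion" (not the Restart Manager API itself, and not your service's actual code), here is a minimal Python sketch of a worker that checks for a stop request between small slices of work, so a stop is honoured within a bounded time. The class name `Worker` and its parameters are hypothetical.

```python
import threading


class Worker:
    """Hypothetical long-running worker that can be asked to stop."""

    def __init__(self, poll_interval=0.05):
        self._stop_requested = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._poll_interval = poll_interval
        self.iterations = 0

    def _run(self):
        # Work in small slices so a stop request is noticed quickly.
        while not self._stop_requested.is_set():
            self.iterations += 1
            # wait() returns early as soon as a stop is requested.
            self._stop_requested.wait(self._poll_interval)

    def start(self):
        self._thread.start()

    def stop(self, timeout=2.0):
        """Request a stop; return True if the worker exited within timeout."""
        self._stop_requested.set()
        self._thread.join(timeout)
        return not self._thread.is_alive()
```

If `stop()` returns False, the worker did not exit in time, which is the kind of situation where an installer might have to resort to harsher measures.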

Default MSI Logging: One debugging starting point is to log all your MSI operations properly. With logging enabled, every install, reinstall or repair writes a log file to the temp directory (not always acceptable for some sysadmins). You can enable logging for all MSI installations by following the procedure in the "Globally for all setups on a machine" section in the above link.
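For reference, machine-wide MSI logging is normally switched on through the Windows Installer `Logging` policy value in the registry. The sketch below only builds the `reg add` argument list for that policy; it does not execute anything. The helper name `msi_logging_command` is made up for this example, and I am assuming the usual `voicewarmupx` flag string; check the linked procedure before applying it to a production instance.

```python
# Windows Installer policy key (machine-wide).
POLICY_KEY = r"HKLM\SOFTWARE\Policies\Microsoft\Windows\Installer"


def msi_logging_command(flags="voicewarmupx"):
    """Return the `reg add` arguments that set the MSI Logging policy.

    Hypothetical helper: it only builds the argument list. On a Windows
    machine it could be passed to subprocess.run() from an elevated prompt.
    """
    return [
        "reg", "add", POLICY_KEY,
        "/v", "Logging",    # value name
        "/t", "REG_SZ",     # string value
        "/d", flags,        # logging flags
        "/f",               # overwrite without prompting
    ]


if __name__ == "__main__":
    print(" ".join(msi_logging_command()))
```

Building the command separately from running it makes it easy to review what would be changed before touching the registry.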


Self-Repair in Detail: I have written a lot about unexpected self-repair before. More than anyone wants to know. It is a terribly silly problem that is nonetheless really expensive to resolve, since few people are familiar with the operation of Windows Installer:

  1. Self-repair - explained
  2. Self-repair - finding real-world solutions
  3. Self-repair - how to avoid it in your own package

Debugging: All the information below is available in the above answers, but here are some quick pointers:

  • You can determine the exact MSI component that triggers the repair using the following approach: http://www.installsite.org/pages/en/msifaq/a/1037.htm.
  • Open the Event Viewer and look in the "Application" log for warnings with event source "MsiInstaller": IDs 1001 and 1004.
  • Some recent installation of another package could trigger a constant error situation that cannot be resolved permanently during repair, and you must identify the source and eliminate it somehow. See the item two link above (repeated here: finding real-world solutions).
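If you export those MsiInstaller warnings as text, the product, feature and component involved can be pulled out with a few regular expressions. This is only a sketch: the function name is made up, the GUIDs and path in the sample are invented, and the message wording is assumed to follow the usual "Detection of product ..., feature ..., component ... failed" pattern, which may differ on localized systems.

```python
import re

_PRODUCT = re.compile(r"product '(\{[0-9A-Fa-f-]+\})'")
_FEATURE = re.compile(r"feature '([^']*)'")
_COMPONENT = re.compile(r"component '(\{[0-9A-Fa-f-]+\})'")
_RESOURCE = re.compile(r"resource '([^']*)'")


def parse_detection_message(message):
    """Extract the identifiers from an MsiInstaller 1001/1004 message.

    Fields that are not present in the message are returned as None.
    """
    def first(pattern):
        match = pattern.search(message)
        return match.group(1) if match else None

    return {
        "product": first(_PRODUCT),
        "feature": first(_FEATURE),
        "component": first(_COMPONENT),
        "resource": first(_RESOURCE),
    }


# Illustrative message only; GUIDs and path are invented.
SAMPLE_1004 = (
    "Detection of product '{11111111-2222-3333-4444-555555555555}', "
    "feature 'MainFeature', "
    "component '{AAAAAAAA-BBBB-CCCC-DDDD-EEEEEEEEEEEE}' failed. "
    r"The resource 'C:\Example\file.dll' does not exist."
)

if __name__ == "__main__":
    print(parse_detection_message(SAMPLE_1004))
```

The component GUID is the piece to look up in the offending MSI, and the resource (when present) is the key path that Windows Installer considers missing.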

Pending Reboots: How often is this machine rebooted? Many machines have a lot of pending reboots registered that are never completed, and problems can result. There are many registry locations that can be involved in triggering a reboot (warning). See Get-PendingReboot-Query and a similar PowerShell script.
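A minimal Python sketch of the same idea, checking three commonly cited locations under HKEY_LOCAL_MACHINE. The list is an assumption on my part and is not exhaustive (the scripts above check more places). The registry probes are passed in as parameters so the logic can be exercised without touching a real registry; the default probes use `winreg` and only work on Windows.

```python
# Commonly cited "pending reboot" markers under HKEY_LOCAL_MACHINE.
# Assumption: illustrative subset, not a complete list.
PENDING_REBOOT_KEYS = [
    r"SOFTWARE\Microsoft\Windows\CurrentVersion\Component Based Servicing\RebootPending",
    r"SOFTWARE\Microsoft\Windows\CurrentVersion\WindowsUpdate\Auto Update\RebootRequired",
]
PENDING_RENAME = (
    r"SYSTEM\CurrentControlSet\Control\Session Manager",
    "PendingFileRenameOperations",
)


def _key_exists(path):
    import winreg  # Windows only, imported lazily
    try:
        winreg.CloseKey(winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, path))
        return True
    except OSError:
        return False


def _value_exists(path, name):
    import winreg  # Windows only, imported lazily
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, path) as key:
            winreg.QueryValueEx(key, name)
        return True
    except OSError:
        return False


def pending_reboot_reasons(key_exists=_key_exists, value_exists=_value_exists):
    """Return the registry locations that indicate a pending reboot."""
    reasons = [path for path in PENDING_REBOOT_KEYS if key_exists(path)]
    path, name = PENDING_RENAME
    if value_exists(path, name):
        reasons.append(f"{path}\\{name}")
    return reasons
```

On a Windows instance, calling `pending_reboot_reasons()` with no arguments returns an empty list when none of these markers are present.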

Locking Problems: Just want to mention the problem of some applications locking resources in a very low-level way, for example anti-virus and malware protection suites.