BSOD due to DRIVER_POWER_STATE_FAILURE (9f) with virtual NIC driver

I am working on an NDIS 6 miniport virtual NIC driver that does not handle IRP_MJ_POWER or IRP_MN_SET_POWER. The driver does not register a DispatchPower function for IRP_MJ_POWER; it leaves that entry NULL. The driver does its power management through OID_PNP_CAPABILITIES and OID_PNP_SET_POWER.

Recently I observed a crash on a Windows 7 machine with bugcheck code DRIVER_POWER_STATE_FAILURE (9f). After some analysis, I found that the crash happened because the driver did not process the power IRP (IRP_MJ_POWER) for more than 10 minutes. This is the first time I have seen this problem.

I would also like to know more about how IRP_MJ_POWER/IRP_MN_SET_POWER is delivered to drivers. Is it mandatory for NDIS drivers to handle these IRPs? I have seen multiple miniport drivers that do not register a dispatch function for IRP_MJ_POWER. If it is not mandatory, and a driver does not need to register a dispatch function for IRP_MJ_POWER, under what conditions can such a problem happen?

Answer by Jeffrey Tippet:

NDIS miniports do not (and cannot) handle IRPs directly; NDIS handles them on behalf of the miniport. In general, when you see the system exhibit some WDM behavior (like a bugcheck that talks about IRPs), you need to mentally translate it into the equivalent NDIS concepts and callbacks. Unfortunately this is not always obvious at first; it takes a few years of working with NDIS to get the hang of it.

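As for 9F in particular: it's a very common bugcheck for a network adapter. About 70% of the time, it's caused by the miniport driver leaking a NET_BUFFER_LIST, that is, never completing the NBL back to NDIS. This is a common code bug because leaking an NBL has no other symptoms, and it's all too easy to overlook a small leak. The leak only surfaces at a power transition: NDIS pauses the miniport before setting it to a low-power state, the pause cannot finish while an NBL is still outstanding, and so NDIS never completes the power IRP.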
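To make the leak pattern concrete, here is a minimal user-mode sketch. FakeNbl, complete_to_ndis, ring_has_space, send_buggy and send_fixed are all hypothetical stand-ins: a real miniport would receive NET_BUFFER_LIST chains in MiniportSendNetBufferLists and hand them back with NdisMSendNetBufferListsComplete. The point it illustrates is that an early return on an error path (here, "ring full") drops the NBL without any visible failure, while the fix completes it with a failure status instead.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for NET_BUFFER_LIST; a real driver uses the NDIS type. */
typedef struct FakeNbl {
    int status;        /* final status reported back to "NDIS" */
    bool completed;    /* debug flag: has this NBL been handed back yet? */
} FakeNbl;

/* Number of NBLs handed back. Stands in for calls to
   NdisMSendNetBufferListsComplete (assumption for this sketch). */
static int g_completed;

static void complete_to_ndis(FakeNbl *nbl, int status)
{
    /* Double completion is the opposite bug; catch it too. */
    assert(!nbl->completed && "NBL completed twice");
    nbl->completed = true;
    nbl->status = status;
    g_completed++;
}

/* Hypothetical check for a free slot in the (virtual) transmit ring. */
static bool ring_has_space(const int *free_slots) { return *free_slots > 0; }

/* BUGGY: the "ring full" path returns without completing the NBL.
   Nothing fails visibly; the NBL is simply never given back. */
static void send_buggy(FakeNbl *nbl, int *free_slots)
{
    if (!ring_has_space(free_slots)) {
        return;                       /* leak */
    }
    (*free_slots)--;                  /* consume a ring slot (never refilled in this model) */
    complete_to_ndis(nbl, 0);         /* model: report success right away */
}

/* FIXED: every path ends in exactly one completion. */
static void send_fixed(FakeNbl *nbl, int *free_slots)
{
    if (!ring_has_space(free_slots)) {
        complete_to_ndis(nbl, -1);    /* hypothetical failure code, but still give it back */
        return;
    }
    (*free_slots)--;
    complete_to_ndis(nbl, 0);
}
```

With one free slot and three sends, send_buggy completes only one NBL and silently loses the other two, whereas send_fixed completes all three (one success, two failures).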

About 20% of the time, a 9F in networking is caused by something else getting stuck, like a stuck OID or a deadlock in MiniportPause. The remaining few percent of networking 9Fs are caused by bugs in filter drivers (which also tend to leak NBLs) and the occasional OS bug.
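Here is a rough model of how outstanding NBLs hold up a pause, assuming a reference-count design (one common way to write it, not the only one): the running adapter holds one extra reference, the pause drops it, and whoever drops the count to zero completes the pause. In a real miniport, MiniportPause would return NDIS_STATUS_PENDING in the "not yet" case and the last completion would call NdisMPauseComplete. Adapter, on_pause, on_send_accepted, on_send_completed and pause_completions are hypothetical names; pause_completions merely counts the would-be NdisMPauseComplete calls.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical adapter context; field names are illustrative only. */
typedef struct {
    /* Starts at 1: that extra reference belongs to the running adapter and
       is dropped by the pause. Whoever drops the count to 0 completes the pause. */
    atomic_int refs;
    atomic_int pause_completions;   /* stands in for NdisMPauseComplete calls */
} Adapter;

static void adapter_init(Adapter *a)
{
    atomic_init(&a->refs, 1);
    atomic_init(&a->pause_completions, 0);
}

static void finish_pause(Adapter *a) { atomic_fetch_add(&a->pause_completions, 1); }

/* An NBL was accepted from NDIS. (A real driver must reject or queue
   new sends once a pause has started; that is not modeled here.) */
static void on_send_accepted(Adapter *a) { atomic_fetch_add(&a->refs, 1); }

/* An NBL was handed back to NDIS. */
static void on_send_completed(Adapter *a)
{
    int before = atomic_fetch_sub(&a->refs, 1);
    assert(before > 0 && "more completions than sends");
    if (before == 1) {
        finish_pause(a);            /* last outstanding NBL finishes the pause */
    }
}

/* Models MiniportPause. Returns true if the pause finished synchronously,
   false if it "pends" (like returning NDIS_STATUS_PENDING). */
static bool on_pause(Adapter *a)
{
    int before = atomic_fetch_sub(&a->refs, 1);
    assert(before > 0);
    if (before == 1) {
        finish_pause(a);
        return true;
    }
    return false;
}
```

Because exactly one caller can observe the transition to zero, the pause is completed once and only once, even if the last send completion races with the pause. The flip side is the failure mode described above: if one NBL is leaked, the count never reaches zero and the pause never finishes.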

When debugging a 9F in a network driver, you should focus on (1) identifying what specifically isn't getting completed (NBL, OID, function call, etc.), and then (2) figuring out why it's not getting completed. Sometimes you get lucky, and the kernel debugger command !stacks 2 ndis! shows all the stacks in a neat little deadlock for you to unravel. Other times you need to add some diagnostics or counters to narrow down what happened to the NBLs.
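One way to add such diagnostics, sketched under assumptions: keep one counter per stage of the driver's own data path and move each NBL between stages as it progresses. The stage names, g_in_stage and the stage_* helpers are hypothetical; a real driver would define stages that match its own queues, and fprintf here stands in for DbgPrint or for reading the counters from the kernel debugger. After a 9F, whichever counter is nonzero tells you where the missing NBLs were last seen.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>

/* Hypothetical stages an NBL passes through in this driver. */
typedef enum {
    STAGE_SW_QUEUE,     /* waiting in a software queue */
    STAGE_HW_RING,      /* posted to the (virtual) hardware ring */
    STAGE_COMPLETING,   /* dequeued, completion in progress */
    STAGE_COUNT
} NblStage;

static const char *const kStageNames[STAGE_COUNT] = {
    "sw_queue", "hw_ring", "completing"
};

/* Number of NBLs currently in each stage (zero-initialized). */
static atomic_long g_in_stage[STAGE_COUNT];

static void stage_enter(NblStage s) { atomic_fetch_add(&g_in_stage[s], 1); }

static void stage_leave(NblStage s)
{
    long before = atomic_fetch_sub(&g_in_stage[s], 1);
    assert(before > 0 && "left a stage that was never entered");
}

/* Move one NBL from one stage to the next. */
static void stage_move(NblStage from, NblStage to)
{
    stage_leave(from);
    stage_enter(to);
}

/* Print every stage's count and return how many NBLs are still tracked.
   Call it at pause time; a nonzero total means NBLs are stuck somewhere. */
static long stage_dump(FILE *out)
{
    long total = 0;
    for (int s = 0; s < STAGE_COUNT; s++) {
        long n = atomic_load(&g_in_stage[s]);
        fprintf(out, "%-10s %ld\n", kStageNames[s], n);
        total += n;
    }
    return total;
}
```

For example, if two NBLs enter the software queue, both move to the hardware ring, and only one of them is ever completed, the dump reports one NBL still sitting in hw_ring, which points the investigation at the ring's completion path.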

Good luck.