I have a production issue with In-Proc session state.
Our application is built on ASP.NET MVC 3 and is integrated into our site, which runs Sitecore CMS.
Our users have been getting "Object reference not set to an instance of an object" errors at random points throughout the application flow.
After extensive logging and tracing, we concluded that this happens when the session object returns null.
Here are some details about what we found and what we know.
- The session ID persists for the same user and is passed all the way into the application correctly.
- I don't believe this is a code issue, because it only happens in production at random intervals and never on local, dev, or staging environments.
- There are two production servers running behind a load balancer.
- It is not a server-persistence issue: we tested this by taking one of the servers out of rotation and routing all traffic to the other. Our logs also show that users are hitting the same server, yet the session has become null.
- This doesn't seem to be a client issue either, because users are able to get through the application successfully even after they have encountered an error.
- This doesn't seem to be a traffic-load or server-load issue, because it happens throughout the day at random times and to random users.
- This doesn't seem to be caused by recycling the app pool.
- This doesn't seem to be caused by session timeout: we have set the timeout to two hours, and according to our logs users can experience this 5-10 minutes into the flow.
Side note: We must use In-Proc session state because of our Sitecore CMS, so changing the design is not an option.
I have a theory that it might have something to do with session locking, or with the session being corrupted by concurrent access attempts.
One place where we see this problem a lot is when the user is redirected by JavaScript (window.location).
Another is in areas where async Ajax calls are being made.
We have been scratching our heads over this for a while. Does anyone have any insight or theory about what the problem might be?
Thanks
Added Note:
@Mystere && @H27Studio, I've also discovered something related to the session ID or session reset issues. In some cases we found that a page redirect triggers two duplicate GET calls to the method. The first call is missing the session ID and gets routed to one of the servers at random (the load balancer's server persistence uses the client IP, session ID, and other header information to build a unique key that keeps a client on one server). This happens every time in the flow when our redirect page uses window.location.
This causes the "Object reference not set.." issue for the client if the bad call with no session ID hits the same server. (This is probably because the first bad call, having no session ID, makes the application create a new session that overrides the original session's object.) So even on the second call, where the correct session ID is passed into the application, we find that the session object contains null.
So I believe the duplicate call is clearing out the session object, but I'm not sure why, or what is causing the duplicate call to begin with.
Does anyone have a clue about this? Thanks
Update: We are planning to take these steps in the hope of resolving this issue.
- We have issues in areas where async Ajax calls are made, so we are planning to remove the async feature and let the Ajax calls run synchronously.
- We have issues where a window.location JavaScript redirect happens. We have created an alternative method using a postback in the hope of fixing the issue in this area.
- Other areas, which aren't related to either of the issues above, are still up in the air.
We will post the effect of these changes once we deploy them to production.
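As a rough illustration of the first step, here is a minimal sketch of one way to make Ajax calls run one at a time without blocking the browser the way synchronous requests do. This is not what we deployed; `createRequestQueue` and `fakeRequest` are hypothetical names invented for the example, and it assumes an environment where `Promise` is available.

```javascript
// Hypothetical helper: returns an enqueue() function that runs async tasks
// strictly one after another, in the order they were enqueued.
function createRequestQueue() {
  let tail = Promise.resolve();
  return function enqueue(task) {
    // Start this task only after the previous one has settled.
    const result = tail.then(() => task());
    // Swallow errors on the chain itself so one failed task
    // does not prevent later tasks from running.
    tail = result.catch(() => undefined);
    return result;
  };
}

// Hypothetical stand-in for an Ajax call: records when it starts and ends.
function fakeRequest(name, delayMs, log) {
  return () =>
    new Promise((resolve) => {
      log.push("start " + name);
      setTimeout(() => {
        log.push("end " + name);
        resolve(name);
      }, delayMs);
    });
}
```

Each real Ajax call would be wrapped in a function returning a promise and passed to `enqueue`, so that no two requests for the same session are in flight at once.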
Thanks for all the comments.
After months of searching and debugging, I think we have finally come to a conclusion. There seems to be a bug with the Sitecore Analytics robots session timeout. We first noticed that the random session loss was due to sessions timing out prematurely, and then that these sessions were being given a 1-minute timeout instead of 120 minutes.
After searching through all the config files, we noticed that Sitecore's Analytics.Robots.SessionTimeout was the only timeout value set to 1 minute.
Increasing this value solved our session timeout problem.
So the fundamental problem is that Sitecore Analytics is misidentifying some visitor sessions as robot sessions and reassigning their timeout to 1 minute. This is probably a bug to report.
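To make the effect concrete, here is a small model of the behavior described above. It is only a sketch: the function names are made up for illustration, the two timeout values are the ones from our configuration, and it assumes the session timeout is measured from the last request (idle time), which is how ASP.NET session state works.

```javascript
// Timeout values from the post: 120 minutes for normal visitors,
// 1 minute for sessions that Sitecore Analytics flags as robots.
const VISITOR_TIMEOUT_MINUTES = 120;
const ROBOT_TIMEOUT_MINUTES = 1;

// Hypothetical helper: which timeout applies to a session.
function effectiveTimeoutMinutes(flaggedAsRobot) {
  return flaggedAsRobot ? ROBOT_TIMEOUT_MINUTES : VISITOR_TIMEOUT_MINUTES;
}

// Hypothetical helper: does the session still exist after the user
// has been idle for the given number of minutes between two requests?
function sessionSurvivesIdleGap(idleMinutes, flaggedAsRobot) {
  return idleMinutes < effectiveTimeoutMinutes(flaggedAsRobot);
}

console.log(sessionSurvivesIdleGap(5, false)); // true
console.log(sessionSurvivesIdleGap(5, true)); // false
```

Under this model, a real user who is wrongly flagged as a robot loses the session as soon as they pause for a minute or more between requests, which fits errors appearing only minutes into the flow despite a two-hour timeout.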
Update: Response from Sitecore:
Sitecore CMS was designed to be used with ASP.NET WebForms technology. While using web forms, the bot detection relies on a control in the <head> of the page. It's natural that you can't use it in the ASP.NET MVC application, but there is an easy solution - put the following code inside the <head> element: