The Enigma of Anonymous Access

I had an interesting day troubleshooting an elusive issue with anonymous authentication for a SharePoint farm at a customer site. Their farm is a network load balanced (NLB) deployment with two web front-end (WFE) servers.

There are three web applications in the farm: Portal, MySite, and SSPAdmin. Portal is configured to allow anonymous access; MySite and SSPAdmin are configured for no anonymous access.

 The security architecture of Portal is designed so that it can be used by both domain and non-domain users.  Information at the top level is non-sensitive and can be viewed without requiring a login.  But as users drill deeper into the subwebs, the information becomes more sensitive, and at these points, users will be prompted for their credentials.

 The problem on this farm was that the anonymous settings sometimes did not seem to matter. Anonymous users would be prompted for login credentials, and domain users would be logged in automatically. The symptoms were intermittent and confusing, and frustrating for users who needed that information. No errors were logged in the server event logs or in SharePoint's logs.

 All the affected sites used the Enterprise Publishing Template, and so had the Publishing features enabled.

 I Googled the matter diligently and found little to indicate what was wrong.  We spent a few weeks trying to figure it out by ourselves, but ended up opening a support ticket with Microsoft.

 Before I tell you what the problem was, I will tell you what it wasn't.

 Not unsupported

My initial concern was that this configuration (a mixed authentication model for a single site collection) was simply beyond what Microsoft had intended for SharePoint. I asked them right away:

  1. Is it supported, or recommended as a best practice, to use both anonymous and NT authentication on a single site collection?
  2. Does it make sense to have generic anonymous content, with the option to elevate access when encountering protected content?
  3. Should we relocate each subsite into its own site collection (since site collections are natural security boundaries)?
  4. Should we set up a separate farm for anonymous users (and does it require MOSS-FIS)?

 The engineer replied that our current configuration is supported, and that MOSS is designed for our scenario.

 Not misconfigured

Microsoft reviewed our entire configuration and gave it a pass. For all intents and purposes, we were correctly configured for anonymous access.

 Not IIS-related

At one point, we overrode SharePoint's configuration by disabling Integrated Windows Authentication on both WFE IIS servers, so that all authentication was forced to Anonymous. Now, instead of prompting for credentials, attempting to load a problem page displayed a 401 UNAUTHORIZED error page.

 No pattern

Four main tabs were experiencing the problem. As we tested, the problem would appear and disappear on any of them. We never did find a pattern to it. The symptoms also changed depending on which machine you were logged into.

Not related to inheritance

Some of the subsites had broken security inheritance, but we eliminated that as a cause.

Not related to Publishing sites or workflow

At one point we thought that the issue might be that the subsites had unpublished pages, something the Microsoft engineer said can trigger a security prompt.  But we found this wasn't the case either. 

 Not related to load balancing

My suspicion was that one of the load balanced servers was misbehaving. Maybe when a user's session landed on one server, they experienced the problem, but if they connected to the other server, things would be fine?

 I figured that if we took down one server at a time, NLB would automatically route all traffic to the surviving server, and we could isolate the problem. So we suspended one node and tested the symptoms, then suspended the other node and ran the same tests. It did not matter which node was disabled; the symptoms did not change.

 Elusive

There were a few times when we thought we had it nailed, and the Microsoft engineer was chuckling as he prepared to close the ticket.  But then we'd see the symptoms crop up again.  It was maddening.

 At one point, we thought that simply going through all the affected sites and re-publishing them might fix the issue.  It did clear the symptoms momentarily, but when we moved to another machine and browsed from there, the symptoms persisted.

 Other times, when we thought we had isolated the problem to one subsite or another, we would make a change in one place and see momentary success, only to realize that we suddenly had access to several other subsites.

 Solution

Finally we were on the right track.  The Microsoft engineer was off Googling or something, and my colleague and I were testing some theories.  We had already determined what was not causing the problem, so what remained was to find out what differences there were between the pages that worked and the pages that did not.  

 On several of the pages, we had Document Library web parts displaying the "Created by" column. We also had the Contact Details web part on each page. It seemed to me that these components might need authenticated access to AD or another resource. After all, they linked to MySite and displayed presence information.

 So I proposed modifying the document library web part view to exclude the "Created by" column, and deleting the Contact Details web part off the page.

 When we deleted the Contact Details web part, everything began working! Sort of. We still had the intermittency of the problem, where some computers got a login prompt and others did not.

 But the Microsoft guy came back and said he had reproduced the problem in his lab. The issue was indeed the Contact Details web part, which does contact the associated MySite when the page loads.

 In our case, we have three web applications on our farm, and MySite is off in its own web application. So although the Portal web application has anonymous access enabled, the MySite web application does not, and that is where the login prompt is coming from.
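One way to confirm which web application is issuing the challenge is to request each URL with no credentials and look at the raw response. This is a minimal Python sketch, not something we used during this case; the function name probe_anonymous is hypothetical. It reports the HTTP status and any WWW-Authenticate schemes the server advertises.

```python
import urllib.error
import urllib.request


def probe_anonymous(url, timeout=10.0):
    """Request url with no credentials; return (status, WWW-Authenticate values)."""
    # ProxyHandler({}) ignores proxy environment variables, so the request goes
    # straight to the server. No auth handler is installed, so nothing is retried.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as resp:
            return resp.status, resp.headers.get_all("WWW-Authenticate") or []
    except urllib.error.HTTPError as err:
        # A 401 arrives here; its headers list the schemes the server will accept.
        challenges = err.headers.get_all("WWW-Authenticate") or []
        err.close()
        return err.code, challenges
```

If MySite is the source of the prompt, probing an anonymous Portal page should return 200 with no challenges, while probing the MySite host (substitute your own URLs) should return 401 with Windows schemes such as Negotiate and NTLM.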

How does this work again?

    SharePoint does not authenticate anyone.  It passes this off to other "authentication providers"… in this case, it's IIS.   IIS does the heavy lifting here: it prompts for a login and relays these credentials to Active Directory, which will validate the login.  Only after the authentication provider responds back with a "yes, I know this person" does SharePoint react: the user is then authorized to access whatever content they've been granted permission to see.

    When Anonymous access is enabled, IIS's default is to pass every request straight to SharePoint as the anonymous user. If SharePoint refuses anonymous access to the requested content, IIS falls back to the other authentication methods enabled for the site.

    This is why we have both the Anonymous and Integrated Windows Authentication checkboxes selected in IIS.

    It tries Anonymous first, then Integrated Windows next.
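The fallback described above can be modeled in a few lines. This is a simplified Python sketch, not how IIS is actually implemented; IisAuthSettings, first_response, and browser_outcome are hypothetical names. It assumes IIS advertises Negotiate and NTLM when Integrated Windows Authentication is enabled, and that domain-joined browsers send Windows credentials automatically (as Internet Explorer does by default for Intranet-zone sites).

```python
from dataclasses import dataclass


@dataclass
class IisAuthSettings:
    """Hypothetical model of the two IIS checkboxes for one web application."""
    name: str
    anonymous_enabled: bool  # "Anonymous access" checkbox
    windows_enabled: bool    # "Integrated Windows authentication" checkbox


def first_response(site, sharepoint_allows_anonymous):
    """Status and challenge schemes returned to a browser that sent no credentials.

    Simplification: the request runs as the anonymous user first; if that is not
    possible, the server answers 401 and advertises the other enabled methods.
    """
    if site.anonymous_enabled and sharepoint_allows_anonymous:
        return 200, []
    if site.windows_enabled:
        # Assumed challenge list for Integrated Windows Authentication
        return 401, ["Negotiate", "NTLM"]
    return 401, []  # nothing left to fall back to


def browser_outcome(status, challenges, domain_joined):
    """What the user sees, assuming domain browsers log on silently."""
    if status == 200:
        return "page shown anonymously"
    if not challenges:
        return "401 error page"
    if domain_joined:
        return "silent Windows logon"
    return "login prompt"
```

With both checkboxes on, a page SharePoint will not serve anonymously produces a 401 with Windows challenges: a silent logon for domain users and a login prompt for everyone else, which matches the symptoms we saw. With Integrated Windows Authentication turned off there is nothing to fall back to, so the model yields the 401 error page from our IIS experiment.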

     In our case, Portal and MySite are two separate web applications with different authentication settings.

     Even though we had already authenticated (as anonymous) to Portal and been authorized by SharePoint, the page was passing us off to MySite, which was configured for no anonymous access, and that caused the page to throw up another login prompt.
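The same reasoning extends to a page that pulls content from more than one web application: an anonymous visitor gets through without a prompt only if every request the page makes lands on a web application that allows anonymous access. Here is a hypothetical Python sketch of that rule; the class and function names are invented for illustration, and the list of requests a page makes is an assumption you would fill in from your own pages.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class WebApplication:
    name: str
    allows_anonymous: bool


@dataclass(frozen=True)
class PageRequest:
    """One request a page triggers while loading (hypothetical breakdown)."""
    source: str
    target: WebApplication


def anonymous_blockers(requests: List[PageRequest]) -> List[str]:
    """Sources whose request lands on a web application that refuses anonymous users.

    An anonymous visitor loads the page without a login prompt only if this is empty.
    """
    return [r.source for r in requests if not r.target.allows_anonymous]
```

Listing a Portal page's own content and a document library web part against Portal, and the Contact Details web part against MySite, anonymous_blockers returns ["Contact Details web part"]. Remove that entry and the list is empty, which mirrors the fix that worked for us.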

 But what about the intermittency?

Well, we never did figure out that part of it, but removing the Contact Details web part from all four of the subsites did resolve the issue on every machine we tested. 

 I chalk it up to latent sessions in IIS. Although we cleared the cache in our web browsers before every test, it's worth remembering that we were hitting two different web applications (Portal and MySite) on a two-server load balanced cluster. So I'm sure something left over from previous sessions caused the erratic symptoms.
