Need help understanding the behavior of a request in the IIS pipeline

Lately I have been working on a case of unwanted redirects in our Azure-hosted ASP.NET Core app, which uses IIS. We have a problem with requests containing two HTTP/HTTPS prefixes. I looked at the Failed Request Tracing logs and saw this (look at the RequestURL value):

[Two screenshots of the Failed Request Tracing log; images not available]

Which then later gets shortened to this: [screenshot; image not available]

Can anybody help me understand why this happens? It happens before the request enters the rewrite pipeline, and even then, we don't have any rewrite rules that could cause it.

This is the IIS rewrite rule after which the redirect happens:

<rule name="Add trailing slash" stopProcessing="true">
    <match url="(.*[^/])$"  ignoreCase="true"/>
    <conditions>
        <add input="{URL}" pattern="^(.+?)/$" negate="true"/>
        <add input="{REQUEST_FILENAME}" matchType="IsFile" negate="true" />
        <add input="{REQUEST_FILENAME}" matchType="IsDirectory" negate="true" />
        <add input="{REQUEST_FILENAME}" pattern="(.*?)\.html$" negate="true" />
        <add input="{REQUEST_FILENAME}" pattern="(.*?)\.xml$" negate="true" />
        <add input="{REQUEST_FILENAME}" pattern="(.*?)\.aspx$" negate="true" />
        <add input="{URL}" pattern="^/(.*?)?(/?)(siteassets|globalassets|contentassets|SysSiteAssets|bundles|api|papirfly|dist|Static/HouseProductDemo|mini-profiler-resources|custom-routes|auth0|dam.papirfly)\/" negate="true"/>
        <add input="{URL}" pattern="^/(.*?)?(/?)(robots.txt|favicon.ico|Sitemap.xml|sitemap-index.xml)" negate="true"/>
        <add input="{URL}" pattern="^/(.*?)?(/?)(modules/Protected|EPiServer|EPiServer/CMS/admin|util|modulesbin|IndexingService/IndexingService.svc)" negate="true"/>
        <add input="{URL}" pattern="^/(.*?)?(/?)(productsOverviewAsync|languageSelectorAsync)" negate="true"/>
        <add input="{URL}" pattern="^/(.*?)?(/?)(documentationListAsync|downloadAndOrderItemsAsync|videoListAsync|commerceDocumentLibraryAsync)" negate="true"/>
        <add input="{URL}" pattern="^/(.*?)?(/?)(GatedContentPageAsync)" negate="true"/>
        <add input="{URL}" pattern="^/.*/sitemap.xml$" negate="true"/>
        <add input="{URL}" pattern="^(https?:/[^/]+){2,}" negate="true" />
    </conditions>
    <action type="Redirect" redirectType="Permanent" url="{R:1}/" />
</rule>

Answer by Dai:

"We have a problem with request containing two HTTP/HTTPS prefixes"

  • First (and at the risk of being accused of pedantry), they're not called "HTTP/HTTPS prefixes"; they're called URI schemes.
  • Second, the request you logged does not actually contain two URI schemes. I'll explain...

But before that, please have a quick read through section 3 of RFC 3986, the current IETF specification that defines what a URI is, to learn the terminology for its syntactic components.

  • The /http:/wp.pl text in your log does look like the start of a new URI when taken out of context, but it was still interpreted as the path component of the incoming request, because that is how web servers parse the request target.
    • Fun fact: HTTP requests do not (normally) contain a fully-qualified URI for the request. A reverse proxy may pass the original URL along in a private request header, and that applies here because you're on Azure App Service (see the X-WAWS-Unencoded-URL header, although that one isn't fully qualified either).
      • Inside an HTTP request, there is no URI scheme or "protocol" substring: the server knows whether the request came in over http: or https: from the incoming port number and the TLS handshake.
      • The authority part of the URI is sent separately in the Host: header.
      • So only the path and the optional query are sent in the request line, i.e. the GET /path/goes/here part.
      • Any #fragment part is not sent to the server at all.
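
To make that split concrete, here is a minimal Python sketch using the standard library's urllib.parse.urlsplit. The example.com host, the query string, the fragment and the wire_form helper are made up for illustration; only the /http:/wp.pl path comes from the log in the question. It shows roughly what a client would put on the wire, not what IIS does internally.

```python
from urllib.parse import urlsplit


def wire_form(url):
    """Return the (request line, Host header) a client would send for url.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(url)
    # Only the path and the query go into the request target;
    # the scheme and the fragment never appear in the request.
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return f"GET {target} HTTP/1.1", f"Host: {parts.netloc}"


request_line, host_header = wire_form("https://example.com/http:/wp.pl?x=1#top")
print(request_line)  # GET /http:/wp.pl?x=1 HTTP/1.1
print(host_header)   # Host: example.com
```

Note that the "http:" inside the path survives untouched: to the parser it is just part of the first path segment.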

So in this case, the request you logged is perfectly valid, because there is no prohibition on colons (whether percent-encoded or not) in the path component of an absolute URI or a "rooted" path. Only a relative-path reference cannot contain a : in its first path segment, because that segment would then be mistaken for a scheme name.
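
The difference between a rooted path and a relative reference can be seen with the same kind of parser. The sketch below uses Python's urllib.parse.urlsplit purely to illustrate the RFC 3986 ambiguity; it is an assumption that it behaves like a generic URI parser here, and it is not necessarily how IIS or ASP.NET Core parse the request.

```python
from urllib.parse import urlsplit

# Rooted path: the leading "/" means the colon cannot be read as a scheme delimiter.
rooted = urlsplit("/http:/wp.pl")
print(repr(rooted.scheme), rooted.path)  # '' /http:/wp.pl

# Same text without the leading "/": the first segment is now read as a scheme.
relative = urlsplit("http:/wp.pl")
print(repr(relative.scheme), relative.path)  # 'http' /wp.pl
```

With the leading slash the whole string stays a path; without it, "http" is parsed as the scheme and only /wp.pl remains as the path.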