Google: SSL is not the problem

March 12, 2012 – tagged Google, Privacy, SEO, IT Security

Many webmasters are currently annoyed by Google's decision to switch completely to SSL for users who are logged in to their Google account (and we can expect this to apply to all users in the near future), because the switch makes it impossible to see the keywords a user searched for before arriving at a site. In this article, I will explain what the problem is: first in a simplified model, then in more detail.

Simplified model

Situation: A user searches for "zometool in der schule" at Google.

No SSL

Google workflow (simplified version), no SSL

  1. The user goes to http://www.google.de, enters "zometool in der schule" and hits Enter.

  2. He is redirected to something like http://www.google.de/search?channel=cs&ie=UTF-8&q=zometool+in+der+schule, the results page.

  3. He clicks on the first link, which brings him to http://www.vismath.eu/schule. In its HTTP request to www.vismath.eu, the user's browser will include a header field called "Referer" (see the Wikipedia article on why the word "referrer" is misspelt in the specification) that contains the previous URL. Tracking software such as Piwik or Google Analytics can therefore see from that URL's "q" parameter what the user was searching for.
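To make the last step concrete, here is a minimal Python sketch of how a tracking tool could read the search terms from the Referer value. The function name `search_terms_from_referer` is made up for this illustration; it is not how Piwik or Google Analytics actually implement it.

```python
from urllib.parse import urlparse, parse_qs


def search_terms_from_referer(referer):
    """Return the decoded "q" parameter of a referring URL, or None."""
    if not referer:
        return None
    # parse_qs decodes "+" and "%20" to spaces and drops empty values.
    params = parse_qs(urlparse(referer).query)
    values = params.get("q")
    return values[0] if values else None


referer = "http://www.google.de/search?channel=cs&ie=UTF-8&q=zometool+in+der+schule"
print(search_terms_from_referer(referer))
```

For the results-page URL from step 2, this prints the original search string with spaces restored.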

SSL

Google workflow (simplified version), SSL

  1. The user goes to https://encrypted.google.com (or https://www.google.de), enters "zometool in der schule" and hits Enter.

  2. He is redirected to something like https://encrypted.google.com/#hl=de&output=search&q=zometool+in+der+schule (or https://www.google.de/#hl=de&q=zometool+in+der+schule, respectively), the results page.

  3. He clicks on the first link, which brings him to http://www.vismath.eu/schule. In contrast to the non-SSL case, the following part of the HTTP specification (RFC 2616, section 15.1.3) kicks in:

    Clients SHOULD NOT include a Referer header field in a (non-secure) HTTP request if the referring page was transferred with a secure protocol.

    Thus, in its HTTP request to www.vismath.eu, the user's browser will not include the "Referer" header field. Tracking software such as Piwik or Google Analytics is therefore unable to see what the user was searching for, or even where he came from.

This post is called "SSL is not the problem" because you can get around this issue quite easily by switching your own website to SSL. In that case, the browser would send the referrer along: problem solved. However, it seems to be important to Google to tell non-SSL websites, too, that the visitor came from Google, so in fact they do something else.
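The referrer rule from the specification, and the reason why switching your own site to SSL helps, can be sketched as a small Python function. This models only the quoted rule; the function name `browser_sends_referer` is an assumption for this illustration, and real browsers may apply further logic.

```python
from urllib.parse import urlparse


def browser_sends_referer(referring_url, target_url):
    """Model of the quoted rule: no Referer when going from HTTPS to plain HTTP."""
    from_secure = urlparse(referring_url).scheme == "https"
    to_secure = urlparse(target_url).scheme == "https"
    return not (from_secure and not to_secure)


# HTTP results page -> HTTP site: Referer is sent.
print(browser_sends_referer("http://www.google.de/search", "http://www.vismath.eu/schule"))
# HTTPS results page -> HTTP site: Referer is dropped.
print(browser_sends_referer("https://www.google.de/", "http://www.vismath.eu/schule"))
# HTTPS results page -> HTTPS site: Referer is sent again.
print(browser_sends_referer("https://www.google.de/", "https://www.vismath.eu/schule"))
```

Only the HTTPS to HTTP transition suppresses the header, which is why an SSL-enabled target site would receive it again.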

The Real World

Let's again look at the situation from above and see what's really happening.

No SSL

Google workflow, no SSL

  1. The user goes to http://www.google.de, enters "zometool in der schule" and hits Enter.

  2. He is redirected to something like http://www.google.de/search?channel=cs&ie=UTF-8&q=zometool+in+der+schule, the results page.

  3. When clicking on the first link (which is displayed as http://www.vismath.eu/schule), the link target changes to something like http://www.google.de/url?sa=t&rct=j&q=zometool%20in%20der%20schule&source=web&cd=1&ved=0CDoQFjAA&url=http%3A%2F%2Fwww.vismath.eu%2Fschule&ei=...&usg=..., a page that immediately redirects to http://www.vismath.eu/schule (not in the HTTP Redirect sense, though).

  4. The user's browser will now, in its HTTP request to www.vismath.eu, include the URL of that intermediate page as the value of the Referer header field. Because Google is kind enough to include the user's search string in that URL's "q" parameter, tracking software knows what the user was searching for.
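The intermediate URL carries both the real target and the search string as percent-encoded query parameters. The following Python sketch takes such a URL apart; `decode_redirect_url` is a hypothetical helper name, and the example URL leaves out the "ei" and "usg" parameters, whose values are elided above.

```python
from urllib.parse import urlsplit, parse_qs


def decode_redirect_url(redirect_url):
    """Extract the final target and the search terms from a /url?... link."""
    # keep_blank_values=True so that an empty "q=" shows up as "" rather
    # than disappearing from the result.
    params = parse_qs(urlsplit(redirect_url).query, keep_blank_values=True)
    return {
        "target": params.get("url", [None])[0],
        "search_terms": params.get("q", [None])[0],
    }


redirect = (
    "http://www.google.de/url?sa=t&rct=j&q=zometool%20in%20der%20schule"
    "&source=web&cd=1&ved=0CDoQFjAA&url=http%3A%2F%2Fwww.vismath.eu%2Fschule"
)
print(decode_redirect_url(redirect))
```

The "url" parameter decodes to the page the user actually wanted, and "q" to the search string he typed.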

SSL via encrypted.google.com

In the past, encrypted SSL search was offered on a separate domain, as explained in this Google blog post, so that schools could block access to encrypted search. A search on encrypted.google.com works as follows:

Google workflow, SSL

  1. The user goes to https://encrypted.google.com, enters "zometool in der schule" and hits Enter.

  2. He is redirected to something like https://encrypted.google.com/#hl=de&output=search&q=zometool+in+der+schule, the results page.

  3. When clicking on the first link (which is displayed as http://www.vismath.eu/schule), the link target changes to something like https://encrypted.google.com/url?sa=t&rct=j&q=zometool%20in%20der%20schule&source=web&cd=1&ved=0CDgQFjAA&url=http%3A%2F%2Fwww.vismath.eu%2Fschule&ei=...&usg=..., a page that immediately redirects to http://www.vismath.eu/schule (not in the HTTP Redirect sense, though).

  4. As described above, because of the HTTPS → HTTP change, the browser will not send the Referer in its request to www.vismath.eu. Google Analytics and Piwik will display a direct site entry instead.

SSL via www.google.de

When users who are logged in to Google perform a search, they use the regular Google domain. Things work slightly differently here:

Google workflow, SSL

  1. The user goes to https://www.google.de, enters "zometool in der schule" and hits Enter.

  2. He is redirected to something like https://www.google.de/#hl=de&q=zometool+in+der+schule, the results page.

  3. When clicking on the first link (which is displayed as http://www.vismath.eu/schule), the link target changes to something like http://www.google.de/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cts=1331548858787&ved=0CDYQFjAA&url=http%3A%2F%2Fwww.vismath.eu%2Fschule&ei=...&usg=..., a page that immediately redirects to http://www.vismath.eu/schule.

    Looking carefully at that intermediate URL, one can notice two things. First, this URL is not an HTTPS URL! (It is when the actual target uses HTTPS.) By clicking on a link in the results page, the user leaves the SSL connection and switches to a plain HTTP connection before he leaves Google. This also means that the browser will send the Referer when it is redirected to the final location.

    However, the second thing to notice is that the "q" parameter is empty. The user's search term is deliberately removed from this intermediate URL.

  4. When the user arrives at http://www.vismath.eu/schule, the HTTP request does contain the Referer header, but as the "q" parameter is empty, tracking software cannot display the user's search terms; Google Analytics and Piwik will display "Keyword not provided" instead.
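The three real-world outcomes (keyword visible, direct entry, keyword not provided) can be summarised in a short Python sketch of what a tracking tool has to work with. The function `classify_visit` and its host check are assumptions made for this illustration, not the actual logic of Piwik or Google Analytics.

```python
from urllib.parse import urlparse, parse_qs


def classify_visit(referer):
    """Return (source, keyword) for the Referer value of an incoming request."""
    if not referer:
        # No Referer at all, e.g. after an HTTPS -> HTTP transition.
        return ("direct", None)
    parts = urlparse(referer)
    host = parts.hostname or ""
    # Assumed, simplified check for the Google hosts used in this article.
    if not (host.startswith("www.google.") or host == "encrypted.google.com"):
        return ("other", None)
    # parse_qs drops empty values, so "q=" yields no keyword.
    keyword = parse_qs(parts.query).get("q")
    if keyword:
        return ("google", keyword[0])
    return ("google", "Keyword not provided")


# Non-SSL search: the keyword is present in the intermediate URL.
print(classify_visit("http://www.google.de/url?sa=t&q=zometool%20in%20der%20schule&cd=1"))
# encrypted.google.com: the browser sends no Referer.
print(classify_visit(None))
# SSL search on www.google.de: Referer is present, but "q" is empty.
print(classify_visit("http://www.google.de/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1"))
```

The example URLs are shortened versions of the ones shown in the walkthroughs.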

As mentioned above, one reason for Google to do this is probably that they still want website owners to know that their visitors came from Google. On the other hand, they remove the search term in order to increase privacy for their users (it would not make much sense to provide SSL encryption for the search itself but still reveal the search term in unencrypted traffic).

Now this is in fact the real problem. Even if you switch your website to use SSL, the intermediate URL will not list the search query. While this is a step forward for privacy, it definitely makes it harder to optimize your site for certain keywords.