The Perils of Search Engine Spider Sessions

This is going to be another “learn from my mistakes so you don’t make them, too”-type post.

I got an email today from someone who had bought something through one of our web sites. She had gotten a confirmation email, had clicked the little “Check Order Status” link, and everything was cool. That was three days ago. She got another email today, but this one had someone else’s name and shipping information on it. The second order was very similar to the first order, so the person clicked through the link on this one and saw that the last-4 for the credit card didn’t match her own. She wasn’t panicked, as many customers would be, but she wanted to let me know that something was definitely screwy, as she had not placed the second order.

Of course, my first thought was “omg! hax0r!”. As soon as I stopped giggling at the thought of someone hacking into our online store to steal $5 worth of product, I moved on to thinking it was obviously some kind of session-hijacking. Intentional, accidental, or whatever, it needed to be looked into.

Now, we’d had a problem with session hijacking (or, more appropriately, session tailgating) last year when I first started at the company. Whoever had set up the Verity spider hadn’t bothered to ensure that CFID and CFTOKEN parameters weren’t being indexed. Thus, every time someone clicked on a search result, they tailgated into an existing session. Luckily, this never resulted in anything serious; it just ended up emptying a whole lot of shopping carts. (The online store here is crap. The cart abandonment is at like 87% because it’s just too hard to use. Magically emptying carts was just another nail in the coffin.) At any rate, I’d said a few choice words about how hiring a real programmer and not some quiche-eating n00b was always a good idea, and fixed the app.

Of course, that just ensured that Karma would mark down a point on the chalkboard and sit around waiting for me to do something stupid.

Back to today. I scoured through the log files to find the IP addresses of each of the two purchasers. I then removed all of the extraneous lines so that I could see how this second person had tailgated into the first person’s session:

  1. Check for CFID/CFTOKEN pairs (we log cookies, don’t you?) and whether or not they are consistent for the entire visit. Yep, in both cases.
  2. Check for CFID/CFTOKEN pairs hardcoded in URLs. Nope.
  3. How did they get to the site? What’s the referrer?

Grrr. There it was. Right at the top. Both had come in via Google searches. Google had the CFID/CFTOKEN pair in the link. Sure enough, when I did a search for the same things these people had searched for, there was the link. And there was the CFID/CFTOKEN pair. I had missed them in Step 2 above, because I hadn’t scrolled all the way to the top of the log files to see the very first hit. I had assumed it would just be the home page. Dumb, dumb, dumb.

So the scenario becomes:

  1. Google spider visits the site and gets a session.
  2. Google somehow manages to get a link with a CFID and CFTOKEN and puts that in its index.
  3. Countless people visit our site via Google every day.
  4. One of those people resists the 87% abandonment rate and actually creates an account and buys something.
  5. We don’t log people out after they buy something, so the user variables stayed in the Session scope.
  6. Three days later, another person does the same web search.
  7. User tailgates on the session and neglects to notice that at the top of every page is a big yellow block that says “WELCOME BACK, PERSON 1!”. Or, they are hoping to get free stuff on someone else’s card.
  8. Person 2 also buys something, overriding the ship-to information, but not the bill-to information. We do not keep credit cards on file, nor in memory, so they had to put their own in. The card was authorized even with the incorrect billing information.
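
Step 5 is probably the cheapest link in that chain to break. A minimal sketch, assuming the user details live under keys like Session.UserID, Session.ShipTo, and Session.BillTo (hypothetical names, not taken from the real store), would drop them on the order-confirmation page once the order has been saved:

```cfml
<!--- Hypothetical key names; substitute whatever the store keeps in the Session scope --->
<cflock scope="Session" type="Exclusive" timeout="10">
	<cfset StructDelete(Session, "UserID")>
	<cfset StructDelete(Session, "ShipTo")>
	<cfset StructDelete(Session, "BillTo")>
</cflock>
```

This does nothing about the shared CFID/CFTOKEN pair itself. It only means that whoever tailgates in next finds a session with no name or address attached to it.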

“But,” I hear you asking, “why did the session last 3 days?” We get the vast majority of our traffic from Google. The Session never had a chance to die, as there was always activity in it. But since we get literally fewer than one order every 2 hours, neither person had trouble with everyone else sitting in on their Session. The others were presumably just gawking at things randomly being added to their cart.
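
The session timeout only counts idle time, so steady traffic keeps resetting it. One possible guard, sketched here under assumptions rather than taken from the real app, is an absolute age limit stored in the session itself. The application name StoreApp, the keys Session.StartedAt, Session.UserID, and Session.Cart, and the 8-hour cap are all illustrative.

```cfml
<!--- Hypothetical application name; the 20-minute timeout measures idle time, not total lifetime --->
<cfapplication name="StoreApp" sessionmanagement="Yes" sessiontimeout="#CreateTimeSpan(0,0,20,0)#">

<cflock scope="Session" type="Exclusive" timeout="10">
	<cfif NOT IsDefined("Session.StartedAt")>
		<!--- First request in this session: remember when it began --->
		<cfset Session.StartedAt = Now()>
	<cfelseif DateDiff("h", Session.StartedAt, Now()) GTE 8>
		<!--- Older than the illustrative 8-hour cap: drop the user data and restart the clock --->
		<cfset StructDelete(Session, "UserID")>
		<cfset StructDelete(Session, "Cart")>
		<cfset Session.StartedAt = Now()>
	</cfif>
</cflock>
```

Like StructClear(Session), this only empties the shared session. It does not hand anyone a new CFID or CFTOKEN.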

So … how did I fix it?

The easy way would have been to look for that session and blacklist it. One step up from that would be to force-kill a Session if the referrer was from another site and the current URL included CFID/CFTOKEN information. But how do you do that? ColdFusion unfortunately does not include a StartNewSession() function. You can StructClear(Session), but that does not give you a new session. That will solve the problem of the dangling user credentials, but it won’t solve the cart-crosstalk. You have to get the user new CFID and CFTOKEN cookies. But how can you do that without tromping on someone else’s session?

The naive way would be to just randomly generate new CFID and CFTOKEN cookies or URL parameters yourself. Even if you are using GUIDs, can you guarantee that you won’t end up in someone else’s session? No. It’s an admittedly slim possibility, but I wasn’t too keen on jumping from one handbasket to another on my journey to Hell, if you catch my drift. No, I needed to just clear out the cookies and URL parameters so that the CF server could assign new ones.

It turns out that this is harder than you would think. Especially if you are stuck on CF5. This is the code I ended up with in our Application.cfm:

<!--- We need to make sure none of the requests coming in from the outside include a session key --->
	<cfset Request.RefererHost=CGI.HTTP_REFERER>
	<cfset Request.RefererHost=REReplaceNoCase(Request.RefererHost,"https?[:]//","","ONE")>
	<cfset Request.RefererHost=REReplaceNoCase(Request.RefererHost,"^([^/]+)/.*$","\1","ALL")>
	<cfset Request.RefererHost=REReplaceNoCase(Request.RefererHost,"[:][0-9]*$","","ALL")>
	<cfset Request.RefererHost=REReplaceNoCase(Request.RefererHost,"^[^@]*@","","ALL")>
	<cfset Request.RefererHost=REReplaceNoCase(Request.RefererHost,"^www[.]","","ONE")>
	<cfif REFindNoCase("(foo|bar|foobar|baz|foobaz)[.]com$",Request.RefererHost) LT 1>
		<!--- Assumption: the URL test is a simple check for either parameter in the query string --->
		<cfif REFindNoCase("(CFID|CFTOKEN)=",CGI.QUERY_STRING) GT 0>
			<cfcontent reset="Yes">
			<!--- <cfheader statuscode="302" statustext="No Session Tailgating Please"> --->
			<!--- Strip the session key but keep every other URL pair --->
			<cfset NewQS=CGI.QUERY_STRING>
			<cfset NewQS=REReplaceNoCase(NewQS,"[&]?(CFID|CFTOKEN)=[-%a-f0-9]*","","ALL")>
			<cfset NewQS=REReplace(NewQS,"^[&]+","","ONE")>
			<!--- Expire the cookies and drop whatever was in the tailgated session --->
			<cfcookie name="CFID" value="" expires="NOW">
			<cfcookie name="CFTOKEN" value="" expires="NOW">
			<cfset StructClear(Session)>
			<!--- IIS and Apache report PATH_INFO differently; reduce it to the part after the script name --->
			<cfset PathInfo=CGI.PATH_INFO>
			<cfif Left(PathInfo,Len(CGI.SCRIPT_NAME)) EQ CGI.SCRIPT_NAME>
				<cfif Len(PathInfo) GT Len(CGI.SCRIPT_NAME)>
					<cfset PathInfo=Mid(PathInfo,Len(CGI.SCRIPT_NAME) + 1,Len(CGI.PATH_INFO))>
					<cfif Left(PathInfo,1) EQ "/">
						<cfset PathInfo=Mid(PathInfo,2,Len(PathInfo))>
					</cfif>
				<cfelse>
					<cfset PathInfo="">
				</cfif>
			</cfif>
			<!--- Assumption: the rebuilt URL starts from the script name --->
			<cfset NewURL=CGI.SCRIPT_NAME>
			<cfif PathInfo NEQ ""><cfset NewURL=NewURL & "/" & PathInfo></cfif>
			<cfif NewQS NEQ ""><cfset NewURL=NewURL & "?" & NewQS></cfif>
<cfoutput><html>
<head>
<title>One moment, please.</title>
<meta http-equiv="refresh" content="0;url=#HTMLEditFormat(NewURL)#" />
<script language="JavaScript" type="text/javascript">
function cc() {
	document.cookie = 'CFID=; expires=Fri, 3 Aug 2001 20:47:11 UTC; path=/';
	document.cookie = 'CFTOKEN=; expires=Fri, 3 Aug 2001 20:47:11 UTC; path=/';
	return true;
}
cc();
</script>
</head>
<body onload="cc()">
<p>If your browser doesn't redirect you, <a href="#HTMLEditFormat(NewURL)#" onclick="cc()">click here to continue</a>.</p>
</body>
</html></cfoutput>
			<cfabort>
		</cfif>
	</cfif>

The first part strips the HTTP Referer down to just a host name. I could do it with one big regular expression, but it seems so much more readable, logical, and especially maintainable this way. Next we look to see if the host is not one of our home domains. Our sites link to each other quite a bit, so we didn’t want to lose our sessions in one of those links.
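
To make the stripping concrete, here is the same sequence of replacements traced on a made-up referer. The variable name Host and the sample URL are only for illustration; the comments show the value after each step.

```cfml
<!--- Sample value for illustration only --->
<cfset Host="http://www.google.com/search?q=widgets">
<cfset Host=REReplaceNoCase(Host,"https?[:]//","","ONE")>
<!--- www.google.com/search?q=widgets --->
<cfset Host=REReplaceNoCase(Host,"^([^/]+)/.*$","\1","ALL")>
<!--- www.google.com --->
<cfset Host=REReplaceNoCase(Host,"[:][0-9]*$","","ALL")>
<!--- www.google.com (no port to remove) --->
<cfset Host=REReplaceNoCase(Host,"^[^@]*@","","ALL")>
<!--- www.google.com (no user info to remove) --->
<cfset Host=REReplaceNoCase(Host,"^www[.]","","ONE")>
<!--- google.com --->
<cfif REFindNoCase("(foo|bar|foobar|baz|foobaz)[.]com$",Host) LT 1>
	<cfoutput>#Host# is not one of ours</cfoutput>
</cfif>
```

An empty referer passes through every step unchanged and also fails the domain match, so bookmarked or hand-typed URLs are treated as coming from outside as well.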

We then check for the CFID or CFTOKEN in the URL. If we find it (the cfcontent is there just for paranoia), we then jump through a bunch more hoops to preserve all of the other URL pairs. We set empty CFID and CFTOKEN cookies that are also expired, attempting to delete any the browser might already have. We clear out the session, which may screw someone else up, but hopefully not. Better safe than sorry.

Then there’s a bit of a two-step involving figuring out what the Path Info really is, as IIS and Apache report it differently. (And even CF has a grubby hand in munging it on some Unix systems.) We then build our new URL and …

We can’t just cflocation to redirect to it.

Some of you older CF gurus may know why. It’s the same reason the cfheader tag is commented out above. Here’s a hint: CF5. On CF5, any and all header information is lost on a redirect — this includes cookies, which are set in the HTTP headers. I tried cheating by setting my own status code and Location headers with cfcontent and cfheader instead of cflocation tags, but it doesn’t work. It turns out that CF5 converts all attempts at redirects into the same cflocation-like result. Thus, no matter how you do it, CF5 recognizes what you are trying to do and “helps” you do it correctly. And, oh yeah, discards any cookie headers you might have tried to set.
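
For reference, the straightforward version would look something like the sketch below, assuming NewURL has been built as in the code above. On CF5 the redirect goes out, but the cookie headers set just before it do not.

```cfml
<!--- Does NOT do what it looks like on CF5: the cookie headers are dropped along with the redirect --->
<cfcookie name="CFID" value="" expires="NOW">
<cfcookie name="CFTOKEN" value="" expires="NOW">
<!--- addtoken="No" keeps cflocation from appending CFID/CFTOKEN to the target URL --->
<cflocation url="#NewURL#" addtoken="No">
```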

Back to the drawing board.

That last part is then the ugliest hack ever. We spit out a generic HTML document that does its best to delete the cookies. We’ve already tried to delete them in the headers with the cfcookie tags, but this JavaScript then tries 3 times to delete them again. (In the head, in the body onload, and again when the user clicks the link.) If that doesn’t do it, we throw in the towel.

So, yeah, that was fun.

And while it technically wasn’t my fault, I still feel bad that I’ve now been with the company for a bit over a year and never noticed the problem before. I’m supposed to see these things and scoff at my predecessors. That’s just how it works.

Watch your CFID and CFTOKEN parameters, folks. You could end up in much hotter water.

By Rick Osborne

I am a web geek who has been doing this sort of thing entirely too long. I rant, I muse, I whine. That is, I am not at all atypical for my breed.