How Gmail Works

By now you’ve learned how to use Gmail with some flair, and
you can change the way it looks to a certain extent. Now
you have to look into exactly how it works. You already
know that the majority of the Gmail functionality is enacted
client-side—that is, on the browser, rather than at the server—
and is done with JavaScript. This chapter describes exactly how
this works and how you can exploit it.
What the Devil Is Going On?
Before revealing just what’s happening, let’s recap. In Chapter 4
you used the DOM inspector inside Firefox to help you dissect
the HTML, and this will help you again. So, as before, open up
Gmail in Firefox, and open the DOM inspector.
You already know that the main document is made of two frames,
the first made of many subframes and the second one with nothing
but a huge chunk of JavaScript. Figure 5-1 shows you that in
the DOM inspector.
Using the DOM inspector’s right-click menu Copy as XML
function, you can grab the text of the script and copy it to a text
editor. Ordinarily, I would include this code as a listing right
here, but when I cut and pasted it into the manuscript of this
book, it added another 120 pages in a single keystroke. This does
not bode well, especially as Google has tried as hard as it can to
format the JavaScript as tightly as possible. This saves bandwidth
but doesn’t help anyone else read what Google is doing. We’ll
reach that problem in a page or two.
In this chapter
- Getting at the code
- The interface
- XMLHttpRequest
- Packet sniffing
- Probing the interface
- Decoding the data
54 Part II — Getting Inside Gmail
FIGURE 5-1: The location of the
Gmail JavaScript shown with
the DOM inspector
Back to the browser, then, and you find you have a very complicated page seemingly
made up of close to 250KB of JavaScript, one iFrame you can see, and
apparently ten or more that don’t appear on the screen. Furthermore, the eagle-eyed
in our midst will have noticed that the Gmail URL doesn’t change very
much when you’re moving around the application. Changing from Inbox to All
Mail for the subset of your mail you want to see on the screen changes the page
but not the URL. For anyone used to, say, Hotmail, this is all very puzzling.
Preloading the Interface
What is actually happening is this: Gmail loads its entire interface into the one
single HTML page. When you move around the application, you’re not loading
new pages, but triggering the JavaScript to show you other parts of the page you
have already in your browser’s memory. This is why it is so fast: There’s no network
connection needed to bring up the Compose window, or show the Settings
page, as you’ve already loaded it. You can see this inside the DOM inspector.
Figure 5-2 shows the section of the page with the various divs, each containing
part of the interface.
You’ll remember from Chapter 4 that the div d_tlist contains the majority of
the interface for the Inbox. Well, further inspection shows that d_comp holds the
Compose window, and d_prefs holds the Settings window, and so on.
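The show-and-hide idea can be sketched in a few lines. This is not Gmail's own code: showPanel is a hypothetical helper, and the plain objects below merely stand in for the real div elements so that the snippet runs outside a browser.

```javascript
// Hypothetical sketch: switch "pages" by toggling which preloaded div is
// visible. The objects below stand in for DOM elements; in a browser you
// would fetch the real ones with document.getElementById("d_comp") and so on.
function showPanel(panels, activeId) {
  Object.keys(panels).forEach(function (id) {
    panels[id].style.display = (id === activeId) ? "block" : "none";
  });
  return panels[activeId];
}

var panels = {
  d_tlist: { style: { display: "block" } }, // Inbox list
  d_comp:  { style: { display: "none" } },  // Compose window
  d_prefs: { style: { display: "none" } }   // Settings
};

// "Open" the Compose window: no network request, just a style change.
showPanel(panels, "d_comp");
```

Nothing is fetched when the view changes, which is the whole point: the markup is already sitting in memory.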
This is all very interesting, but it doesn’t really show how the application works. If
anything, it asks a difficult question: if the page never refreshes, how does it send
or receive any messages? The answer to this is in the JavaScript, and the use of one
very clever function, XMLHttpRequest.
Chapter 5 — How Gmail Works 55
FIGURE 5-2: The main interface divs
Introducing XMLHttpRequest
I like to think of this as quite a romantic story. JavaScript, you see, has had a bad
rap over the years: it’s commonly misconceived as a scrappy language for dodgy
website effects circa 1999, and up there with the <blink> tag as something to be
avoided by the truly righteous web developer. This is, of course, utter rot: Modern
JavaScript is a rich and powerful language, and is rapidly regaining momentum.
Perhaps since IE5 was launched, and certainly since Mozilla and Safari became
mainstream, the majority of browsers have been capable of doing some very clever
things in JavaScript. It’s just that no one bothered to look.
One such function is XMLHttpRequest. Invented by Microsoft and now universally
implemented, it allows a JavaScript program to communicate with a server in
the background, without refreshing the page. This is crucial for Gmail. It means
that the JavaScript code can, upon a button push or any other trigger, send a tiny
request to the Gmail server, parse the response, and throw it onto the screen,
entirely without refreshing the page or causing any more traffic than is really necessary.
It’s blazingly fast, especially if you have a server optimized for just such a
thing. Google, naturally, does.
Using XMLHttpRequest Yourself
To get an idea of just what is going on, it’s a good idea to use XMLHttpRequest
yourself. In this section you’ll use it to create a little application of your own. You
can skip this section if you’re not interested in a deep understanding, but it’s pretty
cool stuff to play with anyway.
First, open up a directory on a website. You’ll need to access it via a proper
domain, you see. Create the directory, and make sure your browser can see it. In
that directory, place a text file, called Listing.txt, and put the exclamation
“Horrible!” inside the file. Bear with me.
Then create an HTML file, containing the code in Listing 5-1, and save this file
to the directory you created earlier.
Listing 5-1: Listing.html—Showing XMLHttpRequest
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
<head>
<style></style>
<script type="text/javascript">
// Create the request object, trying each browser's flavor in turn.
var xmlhttp = false;
try {
  xmlhttp = new ActiveXObject("Msxml2.XMLHTTP");
} catch (e) {
  try {
    xmlhttp = new ActiveXObject("Microsoft.XMLHTTP");
  } catch (E) {
    xmlhttp = false;
  }
}
if (!xmlhttp && typeof XMLHttpRequest != 'undefined') {
  xmlhttp = new XMLHttpRequest();
}
// Fetch Listing.txt in the background and show its contents.
function Listing1() {
  xmlhttp.open("GET", "Listing.txt", true);
  xmlhttp.onreadystatechange = function() {
    if (xmlhttp.readyState == 4) {
      alert(xmlhttp.responseText);
    }
  };
  xmlhttp.send();
}
</script>
</head>
<body>
<h1>My Dog Has No Nose.</h1>
<a href="/" onclick="Listing1();return false;">How does it smell?</a>
</body>
</html>
Open Listing.html in a browser and it should appear very much like Figure 5-3.
FIGURE 5-3: Ready to click on the link?
And when you click on the link, you should get a pop-up alert box similar to
Figure 5-4.
FIGURE 5-4: The result of an XMLHttpRequest function call
What has happened here? Well, the link in the code doesn’t go anywhere, but
clicking it sets the JavaScript going. Have a look at the first half of the code again:
<script type="text/javascript">
var xmlhttp = false;
try {
  xmlhttp = new ActiveXObject("Msxml2.XMLHTTP");
} catch (e) {
  try {
    xmlhttp = new ActiveXObject("Microsoft.XMLHTTP");
  } catch (E) {
    xmlhttp = false;
  }
}
if (!xmlhttp && typeof XMLHttpRequest != 'undefined') {
  xmlhttp = new XMLHttpRequest();
}
Stepping through this from the beginning, you set up a variable called xmlhttp
and set it to false. You use this variable to help check which browser you’re using.
The XMLHttpRequest object is called different things in different browsers.
(Technically speaking, it’s not a standard part of the JavaScript specification, so
different people call it different things. Ho hum.) In Microsoft browsers, it’s an
ActiveX object called Msxml2.XMLHTTP or Microsoft.XMLHTTP, whereas in
Mozilla, Safari, and others, it’s a native JavaScript object called
XMLHttpRequest.
So the first half of the code goes through the alternatives, trying to define xmlhttp
as an XMLHttpRequest object by calling each of the possible functions in
turn. First it tries Msxml2.XMLHTTP, then Microsoft.XMLHTTP, and finally
defaults to XMLHttpRequest. (Usually, of course, there’s another test for
no-JavaScript-support-at-all, but we’ll skip that here for the sake of brevity.)
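If the nested try blocks are hard on the eyes, the same fallback can be written as a loop. What follows is only a sketch: createRequest and its env parameter are hypothetical names, with env standing in for the browser's global object so that the function can be tried with fakes outside a browser.

```javascript
// Sketch: try each way of creating a request object in turn.
// `env` is a hypothetical stand-in for the browser's global object
// (you would pass `window` in a real page).
function createRequest(env) {
  var progIds = ["Msxml2.XMLHTTP", "Microsoft.XMLHTTP"];
  if (typeof env.ActiveXObject !== "undefined") {
    for (var i = 0; i < progIds.length; i++) {
      try {
        return new env.ActiveXObject(progIds[i]);
      } catch (e) {
        // This control is not available; try the next one.
      }
    }
  }
  if (typeof env.XMLHttpRequest !== "undefined") {
    return new env.XMLHttpRequest();
  }
  return null; // no support at all
}

// A fake environment resembling Mozilla or Safari: no ActiveX, native object.
function FakeXHR() { this.kind = "native"; }
var request = createRequest({ XMLHttpRequest: FakeXHR });
```

Returning null when nothing works gives the caller a single place to handle the no-support case.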
Now, go line by line through the second half of the code:
function Listing1() {
  xmlhttp.open("GET", "Listing.txt", true);
  xmlhttp.onreadystatechange = function() {
    if (xmlhttp.readyState == 4) {
      alert(xmlhttp.responseText);
    }
  };
  xmlhttp.send();
}
The first line defines the name of the function: Listing1.
The second line calls the open method of the XMLHttpRequest object you’ve
placed in the xmlhttp variable. XMLHttpRequest has six possible methods
to call, as you’ll see later. Here the open method takes three parameters: the HTTP
method (such as GET or POST), the URL, and a flag of true or false to indicate if
the request is asynchronous (set to true) or not (set to false). Asynchronous in
this context means that the script continues processing while it is waiting for the
server to reply. In this listing it’s not a big deal, but in others this is very important:
If you set the flag to false, and the server takes a long time to get back to
you, you can lock up the browser in the meantime.
The third line solves this problem. It sets up an onreadystatechange event
handler, which waits for the XMLHttpRequest object’s state to change before
running the function it has defined. The possible values for readyState
are in Table 5-2, but in the meantime know that readyState==4 means
that the request has completed. So, lines 3 and 4 mean
“Each time the state changes, check whether it has changed to ‘complete’;
if so, do the following; if not, keep waiting.”
Line 5 runs when the test on line 4 succeeds. It displays an alert box containing the
value of the responseText property, which holds the contents of Listing.txt.
Lines 6 and 7 close off the functions prettily, and line 8 triggers the communication
itself. Note the order this all comes in: You’ve set up the request ready to go.
You’ve set up an event handler, watching for any request to come back and say
it’s done, and only then do you fire off the request itself.
So, now you’ve got a page with JavaScript code that can go out, fetch another file,
and do something with its contents, all without refreshing the HTML. In our
listing, it’s a file with plain text, but it can be just about anything: XML, for
example.
Before moving on to using this new knowledge to look into Gmail’s code, have a
look at Tables 5-1 and 5-2, which serve as a reference of the XMLHttpRequest
functions, methods, and suchlike.
Table 5-1 XMLHttpRequest Object Methods
Method                              Description
abort()                             Stops the current request.
getAllResponseHeaders()             Returns complete set of headers (labels and values) as a string.
getResponseHeader("headerLabel")    Returns the string value of a single header label.
open("method", "URL"[, asyncFlag[, "userName"[, "password"]]])
                                    Assigns the method, the URL, and the other optional attributes of a pending request.
send(content)                       Sends the request, with an optional postable string or bit of DOM object data.
setRequestHeader("label", "value")  Assigns a label/value pair to the header to be sent with a request.
Table 5-2 contains some of the XMLHttpRequest object properties you’ll likely
need to use.
Table 5-2 Common XMLHttpRequest Object Properties
Property            Description
onreadystatechange  Event handler that fires whenever the state changes.
readyState          Object status integer:
                      0 = uninitialized
                      1 = loading
                      2 = loaded
                      3 = interactive
                      4 = complete
responseText        The data returned from the server, as a string.
responseXML         The data returned from the server, as a DOM-compatible document object.
status              Numeric HTTP status code returned by the server, such as 404 for “Not Found” or 200 for “OK.”
statusText          Any string message accompanying the status code.
You should now feel confident that you understand how a simple HTML and
JavaScript document can request data from a server in the background. There’s no
need for the page to reload in the browser for you to retrieve new information.
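The properties in Table 5-2 work together. As a rough sketch, a handler can wait for readyState to reach 4 and then look at status before trusting responseText. The names makeHandler, onSuccess, and onFailure are made up for this example, and the fake object stands in for a real request so that the code runs without a server.

```javascript
// Sketch: an onreadystatechange handler that also checks the HTTP status.
// makeHandler, onSuccess and onFailure are hypothetical names.
function makeHandler(xhr, onSuccess, onFailure) {
  return function () {
    if (xhr.readyState !== 4) {
      return; // not complete yet; keep waiting
    }
    if (xhr.status === 200) {
      onSuccess(xhr.responseText);
    } else {
      onFailure(xhr.status, xhr.statusText);
    }
  };
}

// A fake request object, so the handler can be tried without a server.
var fake = { readyState: 1, status: 0, statusText: "", responseText: "" };
var log = [];
fake.onreadystatechange = makeHandler(
  fake,
  function (text) { log.push("ok: " + text); },
  function (code, message) { log.push("failed: " + code + " " + message); }
);

fake.onreadystatechange(); // still loading: nothing is logged
fake.readyState = 4;
fake.status = 200;
fake.responseText = "Horrible!";
fake.onreadystatechange(); // complete with status 200: the success path runs
```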
Finding XMLHttpRequest within the Gmail code
Don’t take the presence of XMLHttpRequest within Gmail on trust. You can see
this in action in Gmail’s own code. Go back to the DOM inspector and open the
second frame (the one with all of the JavaScript in it). Copy the entire script
into a text editor and save it, as you’re going to refer to it a lot in this section.
Once you’ve done that, search for the string xmlhttp. You’ll find the function in
Listing 5-2.
Listing 5-2: Gmail’s XMLHttpRequest Function
function zd(){var R=null;if(da){var
vN=lJ?"Microsoft.XMLHTTP":"Msxml2.XMLHTTP";try{R=new
ActiveXObject(vN)}catch(f){C(f);alert("You need to enable active scripting and activeX controls.")}}else{R=new
XMLHttpRequest();if(!R){;alert("XMLHttpRequest is not supported on this browser.")}}return R}
As with all of the Gmail JavaScript, this is compressed and slightly confusing.
Reformatted, it looks like Listing 5-3.
Listing 5-3: Gmail’s XMLHttpRequest Function, Tidied
function zd(){
  var R=null;
  if(da){
    var vN=lJ?"Microsoft.XMLHTTP":"Msxml2.XMLHTTP";
    try{R=new ActiveXObject(vN)}
    catch(f){
      C(f);alert("You need to enable active scripting and activeX controls.")}
  }else{
    R=new XMLHttpRequest();
    if(!R){
      ;alert("XMLHttpRequest is not supported on this browser.")}
  }
  return R}
This listing does much the same thing you did earlier: depending on the flag da, it
creates either one of the Microsoft ActiveX controls or the more standard
XMLHttpRequest object and, if that fails, bails with an error message. For future
reference, and remember this because you’ll need it later, the XMLHttpRequest
object in the Gmail code is called R.
Sniffing the Network Traffic
So now that you understand how XMLHttpRequest works, you’re led to some further
questions: What is being sent and received using the XMLHttpRequest functions,
and what are the URLs? Once you know the answers to these questions,
you can write your own code to spoof these requests, and can then interface
directly with the Gmail system. The rest of the book relies on this idea.
To find out what Gmail is saying to the browser, use a new tool: the packet sniffer.
This is a generic term for a range of applications that can listen to raw network
traffic, display it on the screen, log it, analyze it, and so on. What you’re interested
in is watching what your browser is doing in the background: what it is sending,
where it is sending it to, and then the replies it is getting.
My packet sniffer of choice for this job is Jeremy Elson’s Tcpflow, available at
www.circlemud.org/~jelson/software/tcpflow/.
I use Marc Liyanage’s OS X package, which you can download from
www.entropy.ch/software/macosx/#tcpflow.
Tcpflow is available under the GPL, and can be compiled on most proper computing
platforms. Windows users will need to look elsewhere, but the following
techniques remain the same.
Firing Up Tcpflow
Install Tcpflow, and set it running inside a terminal window, monitoring port 80.
On my machine, that means typing the following:
sudo tcpflow -c port 80
Then open a browser and request a page. Any will do: Figure 5-5 shows the start
of a typical result.
As you can see from the figure and your own screen, Tcpflow captures all of the
traffic flowing backward and forward across Port 80—all your web traffic, in
other words. It shows the requests and the answers: headers, content, and all.
Tcpflow is perfect for the job. But there’s a snag. Open up Gmail, and let it sit
there for a while. After it settles down, you will notice that Tcpflow regularly
burps up new traffic looking very similar to Listing 5-4. This is Gmail’s heartbeat:
checking for new mail. But it’s very odd looking.
FIGURE 5-5: The start of a Tcpflow session
Listing 5-4: Gmail Checking for New Mail
216.239.057.107.00080-192.168.016.050.59607: HTTP/1.1 200 OK
Set-Cookie: SID=AfzuOeCbwFixNvWd6vNt7bUR2DpPxRz-
YhOB54dzyYwHeLIHjVq_eeHH5s6MYQbPE0hVUK_LMROFuRWkMhfSR-U=;
Domain=.google.com;Path=/;Expires=Tue, 06-Jan-2015 00:12:12 GMT
Set-Cookie: GBE=; Expires=Fri, 07-Jan-05 00:12:12 GMT; Path=/
Cache-control: no-cache
Pragma: no-cache
Content-Type: text/html; charset=utf-8
Content-Encoding: gzip
Transfer-Encoding: chunked
Server: GFE/1.3
Date: Sat, 08 Jan 2005 00:12:12 GMT
a
..........
216.239.057.107.00080-192.168.016.050.59607: 2c8
R...A{[uj...*..lQ...D.M.”.h...}...”G...RD..7../}.c...K
H$g.....U.........M-.J
4......Y.......&....M.(..=.b..t...t.M.*...S!.....dZ.r.........
..w..iy....RQ.T.....n.....n.*.sqK.0.e.Y.m..g...h....{.k[i.k...
..,d!....X..”...Y.a..v......;...J.f29.4....E...Q..,.gA.D.<....
l....r...n0X..z.]0...~g>o1.. x1,...U..f.VK....R++.6.
.YG......Q...Y......V.O...v
Oh7.D.M.X..3{%f.6].N...V*j.....+.J....2z@..n..)8..?Z./o....j*o
.........3..
!=*.a.v.s..........”\..i{.;o..nh....K+q.\||...G.3]....x.;h.].r
...+..U?,...c........s..PF.%!....i2...}..’+.zP._.
....M...a35u]9.........-A...2.].F|.=..eQK
..5k.qt.....Wt..@Wf{.y.I..
X..*;.D...<*.r.E>...?.uK9p...RC..c..C.~.<..<..0q..9..I.pg.>...
.
...x$..........
The headers are understandable enough, but the content is very strange indeed.
This is because your browser is taking advantage of Gzip encoding. Most modern
web servers can serve content encoded with the Gzip algorithm, and most modern
browsers are happy to decode it on the fly. Human brains, of course, cannot, so
you need to force Gmail to send whatever it is sending over unencoded.
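To see for yourself why a gzipped body looks like line noise in the trace, you can compress a string and inspect the bytes. This sketch assumes Node.js and its built-in zlib module rather than a browser, and the sample body is just a short string chosen for the demonstration.

```javascript
// Sketch (Node.js, not browser code): why a gzipped body looks like noise.
var zlib = require("zlib");

var body = "<script>var loaded=true;</script>";
var compressed = zlib.gzipSync(Buffer.from(body, "utf8"));

// Every gzip stream starts with the magic bytes 0x1f 0x8b, which is one way
// to recognize compressed content in a capture.
var looksGzipped = compressed[0] === 0x1f && compressed[1] === 0x8b;

// Decompressing recovers the original text exactly.
var restored = zlib.gunzipSync(compressed).toString("utf8");
```

Turning the encoding off in the browser, as described next, is simply less work than decompressing every response in the log by hand.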
In the first few chapters of this book, you’ve been using Firefox, so return to that
browser again now. In the address bar, type the URL about:config.
You should see a page looking like Figure 5-6.
FIGURE 5-6: The Firefox secret settings page
This page allows you to change the more fundamental browser settings. You need
to change only one. Scroll down to network.http.accept-encoding and click
on the string. By default it reads gzip,deflate. Just delete that, and leave it
blank, as shown in Figure 5-7.
FIGURE 5-7: The changed HTTP setting
Empty Firefox’s cache to prevent a strange bug, and restart the browser for good
measure. Now go back to Gmail and watch for the heartbeat. It will now look like
Listing 5-5.
Listing 5-5: Gmail’s Heartbeat, Unencoded
192.168.016.050.59622-216.239.057.107.00080: GET
/gmail?ik=344af70c5d&view=tl&search=inbox&start=0&tlt=1014fb79
f15&fp=54910421598b5190&auto=1&zx=24c4d6962ec6325a216123479
HTTP/1.1
Host: gmail.google.com
User-Agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O;
en-GB; rv:1.7.5) Gecko/20041110 Firefox/1.0
Accept:
text/xml,application/xml,application/xhtml+xml,text/html;q=0.9
,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-gb,en;q=0.5
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Referer:
http://gmail.google.com/gmail?ik=344af70c5d&search=inbox&view=
tl&start=0&zx=24c4d6962ec6325a116384500
Cookie: GV=101014fb09ab5-af53c8c5457de50bec33d5d6436e82c6;
PREF=ID=2dfd9a4e4dba3a9f:CR=1:TM=1100698881:LM=1101753089:GM=1
:S=nJnfdWng4uY7FKfO; SID=AcwnzkuZa4aCDnqVeiG6-
pM487sZLlfXBz2JqrHFdjIueLIHjVq_eeHH5s6MYQbPE4wm3vinOWMnavqPWq3
SNNY=; GMAIL_AT=e6980e93d906d564-1014fb09ab7;
S=gmail=h7zPAJFLoyE:gmproxy=bnNkgpqwUAI; TZ=-60
216.239.057.107.00080-192.168.016.050.59622: HTTP/1.1 200 OK
Set-Cookie:
SID=AbF6fUKA6tCIrC8Hv0JZuL5cLPt3vlO6qonGit87BAlMeLIHjVq_eeHH5s
6MYQbPE-F6IjzxJjnWuwgSIxPn3GQ=;Domain=.google.com;Path=/
Cache-control: no-cache
Pragma: no-cache
Content-Type: text/html; charset=utf-8
Transfer-Encoding: chunked
Server: GFE/1.3
Date: Sat, 08 Jan 2005 00:31:09 GMT
62
<script>var
loaded=true;</script><script>try{top.js.L(window,29,'18fd02c90a
');}catch(e){}</script>
This you can recognize: The heartbeat had my browser requesting the following
URL:
/gmail?ik=344af70c5d&view=tl&search=inbox&start=0&tlt=1014fb79f15&
fp=54910421598b5190&auto=1&zx=24c4d6962ec6325a216123479
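A URL like this is easier to reason about once it is split into its parameters. Here is a small sketch using the standard URLSearchParams object; parseQuery is a hypothetical helper, and what each parameter actually means is still to be worked out.

```javascript
// Sketch: split the heartbeat URL into its query parameters.
var heartbeat = "/gmail?ik=344af70c5d&view=tl&search=inbox&start=0" +
  "&tlt=1014fb79f15&fp=54910421598b5190&auto=1&zx=24c4d6962ec6325a216123479";

function parseQuery(path) {
  var params = {};
  var mark = path.indexOf("?");
  if (mark === -1) {
    return params; // no query string at all
  }
  new URLSearchParams(path.slice(mark + 1)).forEach(function (value, name) {
    params[name] = value;
  });
  return params;
}

var heartbeatParams = parseQuery(heartbeat);
```

Comparing the resulting objects for two different requests makes it obvious which parameters change from one request to the next and which stay the same.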
Likewise, the heartbeat had my browser passing the following cookie:
Cookie: GV=101014fb09ab5-af53c8c5457de50bec33d5d6436e82c6;
PREF=ID=2dfd9a4e4dba3a9f:CR=1:TM=1100698881:LM=1101753089:GM=1:S=n
JnfdWng4uY7FKfO; SID=AcwnzkuZa4aCDnqVeiG6-
pM487sZLlfXBz2JqrHFdjIueLIHjVq_eeHH5s6MYQbPE4wm3vinOWMnavqPWq3SNNY
=; GMAIL_AT=e6980e93d906d564-1014fb09ab7;
S=gmail=h7zPAJFLoyE:gmproxy=bnNkgpqwUAI; TZ=-60
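The cookie header can be pulled apart in the same spirit. The sketch below uses a hypothetical parseCookieHeader helper and, for brevity, only the last three cookies from the header above. Note that it splits each pair on the first equals sign only, because several of the values contain equals signs of their own.

```javascript
// Sketch: turn a Cookie request header into a name/value map.
function parseCookieHeader(header) {
  var cookies = {};
  header.replace(/^Cookie:\s*/i, "").split(/;\s*/).forEach(function (pair) {
    var eq = pair.indexOf("="); // split on the FIRST "=" only
    if (eq > 0) {
      cookies[pair.slice(0, eq)] = pair.slice(eq + 1);
    }
  });
  return cookies;
}

var jar = parseCookieHeader(
  "Cookie: GMAIL_AT=e6980e93d906d564-1014fb09ab7; " +
  "S=gmail=h7zPAJFLoyE:gmproxy=bnNkgpqwUAI; TZ=-60"
);
```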
The browser then received a new cookie:
SID=AbF6fUKA6tCIrC8Hv0JZuL5cLPt3vlO6qonGit87BAlMeLIHjVq_eeHH5s6MYQ
bPE-F6IjzxJjnWuwgSIxPn3GQ=;Domain=.google.com;Path=/
Along with the new cookie, my browser also received a snippet of JavaScript as
the contents of the page:
<script>var
loaded=true;</script><script>try{top.js.L(window,29,'18fd02c90a
');}catch(e){}</script>
What can you tell from all of this? Well, you now know how Gmail on your
browser communicates with the server, and you know how to listen in on the conversation.
Two things remain in this chapter, therefore: collecting as many of these
phrases as possible and then working out what they mean.
Prodding Gmail to Hear It Squeak
The technique to further learn Gmail’s secrets is obvious. Use it—sending mail,
receiving mail, and so on—and watch what it does in the background. From
these clues, and the JavaScript listing you already have, you can piece together a
complete picture of the Gmail server’s interface. And it’s that interface that you
ultimately want to deal with directly.
To get a clear idea of what is going on, you need to capture everything that happens
when Gmail is loaded, when it sits idle, and when you perform the common
actions with it.
Preparing to Watch the Gmail Boot Sequence
To start the process with gusto, open up Firefox again, and clear all of the caches.
Then open up a terminal window, and set Tcpflow running, and save its output to
a text file, like so:
sudo tcpflow -c '(port 80 or 443)' >> login_capture.txt
This records everything that goes over HTTP or HTTPS. Then log in to Gmail
until you get to a nice, calm, idle Inbox like the one shown in Figure 5-8.
FIGURE 5-8: A nice, calm Inbox at the end of the boot sequence
You’ll be referring back to this figure in a page or two.
Now, stop the Tcpflow application with a judicious Control+c and open up the
login_capture.txt file.
Cleaning Up the Log
Before looking through the log properly, it needs to be cleaned up a bit. There’s a
lot of information that you don’t need. For instance, every request sent by my
browser has this code, which is superfluous to your needs:
User-Agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O;
en-GB; rv:1.7.5) Gecko/20041110 Firefox/1.0
Accept:
text/xml,application/xml,application/xhtml+xml,text/html;q=0.9
,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-gb,en;q=0.5
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Search for this code and replace it with a single new line. Next, toward the end
(around line 1862 in my working version) is a whole collection of requests and
responses for image files. You’re not interested in these at all, so you can reduce
each of them until it looks like so:
192.168.016.053.64150-216.239.057.106.00080: GET /gmail/help/images/logo.gif
216.239.057.106.00080-192.168.016.053.64150: HTTP/1.1 200 OK
This makes things much more readable. Now, between lines 394 and 1712 (more
or less, it may be slightly different in your log file) is the serving of the one enormous
JavaScript file. Strip the code out, and replace it with your own comment.
Finally, right at the beginning, are a few pages going backward and forward that
seem to be made of utter nonsense. These are encrypted. So, again, strip them out
and replace them with a comment.
You should now have around 500 lines of traffic between your browser and Gmail.
It’s time to step through it and see what is going on. To see the entire boot
sequence log, flip to Appendix A and look through Listing A-3.
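If you would rather not do the first of those search-and-replace passes by hand, a few lines of script will do it. This is a sketch only: stripNoise is a hypothetical helper, it assumes each header occupies a single line in the real capture (the wrapping you see in the listings is an artifact of the printed page), it drops the lines outright rather than leaving a blank line behind, and the request line in the sample is abbreviated.

```javascript
// Sketch: drop the request headers that add nothing to the analysis.
var NOISE = /^(User-Agent|Accept|Accept-Language|Accept-Charset|Keep-Alive|Connection):/i;

function stripNoise(logText) {
  return logText.split("\n").filter(function (line) {
    return !NOISE.test(line);
  }).join("\n");
}

// An abbreviated sample in the style of the Tcpflow output.
var sample = [
  "192.168.016.050.59622-216.239.057.107.00080: GET /gmail?view=tl HTTP/1.1",
  "Host: gmail.google.com",
  "User-Agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-GB; rv:1.7.5) Gecko/20041110 Firefox/1.0",
  "Accept-Language: en-gb,en;q=0.5",
  "Keep-Alive: 300",
  "Connection: keep-alive"
].join("\n");

var cleaned = stripNoise(sample);
```

The Host, Referer, and Cookie lines are left alone on purpose, since those are the ones you will want to study.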
Stepping Through the Gmail Boot Sequence
To be able to write an API, you need to know how the login works, so we shall start
there. In all of the following, my machine has the IP address 192.168.016.053.
Logging In
Start by requesting the page gmail.google.com, whereupon Gmail replies with an
HTTP 302 redirect to
https://gmail.google.com/?dest=http%3A%2F%2Fgmail.google.com%2Fgmail, which the
browser automatically follows, switching to encrypted traffic:
192.168.016.053.64142-216.239.057.106.00080: GET / HTTP/1.1
Host: gmail.google.com
216.239.057.106.00080-192.168.016.053.64142: HTTP/1.1 302
Moved Temporarily
Location:
https://gmail.google.com/?dest=http%3A%2F%2Fgmail.google.com%2
Fgmail
Cache-control: private
Content-Length: 0
Content-Type: text/html
Server: GFE/1.3
Date: Sun, 16 Jan 2005 17:11:18 GMT
192.168.016.053.64143-216.239.057.106.00443
LOTS OF ENCRYPTED TRAFFIC CLIPPED OUT FROM THIS SECTION
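The Location header in that trace packs two facts into one string: the switch to HTTPS and, percent-encoded in the dest parameter, the page Gmail intends to send you back to afterward. A quick sketch with the standard URL object pulls them apart (the variable names are mine):

```javascript
// Sketch: inspect the redirect target from the 302 response above.
var locationHeader =
  "https://gmail.google.com/?dest=http%3A%2F%2Fgmail.google.com%2Fgmail";

var redirect = new URL(locationHeader);

// The redirect moves the conversation from HTTP to HTTPS...
var switchesToHttps = redirect.protocol === "https:";

// ...and carries the eventual destination; searchParams.get() undoes the
// percent-encoding for us.
var destination = redirect.searchParams.get("dest");
```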
Because the login page is encrypted—the traffic flows over HTTPS not HTTP—
you can’t follow what it does using the log. You need to use a script to follow the
URLs until you get back to the trace. I used the following snippet of Perl code to
pretend to be a browser to see what is going on:
#!/usr/bin/perl -w
use LWP::UserAgent;
use HTTP::Request;
use Crypt::SSLeay;

# Pretend to be a browser by sending a browser-like User-Agent string.
my $ua = LWP::UserAgent->new();
$ua->agent("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)");

# Request the login page; LWP follows GET redirects automatically.
my $request = HTTP::Request->new(GET => 'https://gmail.google.com/');
my $result = $ua->request($request);

# Print the page on success, or the HTTP status line on failure.
if ($result->is_success) {
    print $result->content;
} else {
    print $result->status_line;
}
This Is Going to Break
During the writing of this book, the Gmail login sequence has changed at least three times. Not
massively so, it must be said, but enough to break code until I worked out just what had
changed. This section, and the chapters following, therefore, must be taken as guides to reverse
engineering the thing yourself, and not as a definitive reference to the Gmail login sequence. If
what I describe here no longer matches reality completely, I apologize. Take solace in the fact
that I have no idea what Google is up to either.
You can infer from actually doing it, or by using a script like the one above, that
the page continues with another redirect (or perhaps more than one), finally
ending up at
www.google.com/accounts/ServiceLogin?service=mail&continue=http%3A%2F%2Fgmail.google.com%2Fgmail,
as you can see in Figure 5-9.
FIGURE 5-9: The Gmail login screen
Viewing source on this page shows you two important things: first, the
username and password form itself, and second, some JavaScript that sets a cookie.
Deal with the form first. Listing 5-6 gives a cleaned-up version of the code, with
the styling removed.
Listing 5-6: The Gmail Login Form
<form action="ServiceLoginAuth" id="gaia_loginform" method="post">
  <input type="hidden" name="continue" value="http://gmail.google.com/gmail">
  <input type="hidden" name="service" value="mail">
  Username: <input type="text" name="Email" value="" size="18">
  Password: <input type="password" name="Passwd" autocomplete="off" size="18">
  <input type="checkbox" name="PersistentCookie" value="yes">
  Don't ask for my password for 2 weeks.
  <input type="submit" name="null" value="Sign in">
</form>
From this we can see that the URL the page POSTs to in order to log in is produced
as follows, split here for clarity.
https://www.google.com/accounts/ServiceLoginBoxAuth?continue=https://gmail.google.com/gmail
&service=mail
&Email=XXXXX
&Passwd=XXXXX
&PersistentCookie=yes
&null=Sign%20in
You will need this later on, but for now, on to the cookie setting.
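As a sketch of how you might assemble those fields in code, the standard URLSearchParams object will do the encoding for you. The field names and fixed values come from Listing 5-6; buildLoginBody is a hypothetical helper, and the snippet only builds the string. It does not send anything, and nothing here promises that the login endpoint still accepts it.

```javascript
// Sketch: assemble the login form fields as a URL-encoded string.
function buildLoginBody(email, password) {
  var fields = new URLSearchParams();
  fields.append("continue", "http://gmail.google.com/gmail");
  fields.append("service", "mail");
  fields.append("Email", email);
  fields.append("Passwd", password);
  fields.append("PersistentCookie", "yes");
  fields.append("null", "Sign in");
  return fields.toString();
}

// Placeholder credentials, as in the text.
var loginBody = buildLoginBody("XXXXX", "XXXXX");
```

One small difference from the hand-written version: URLSearchParams encodes the space in "Sign in" as a plus sign rather than %20. Both forms are valid for form data.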
The First Cookie
The relevant sections of the JavaScript listing inside the login page appear in
Listing 5-7.
Listing 5-7: Cookie-Setting Code from the Gmail Login
function SetGmailCookie(name, value) {
  document.cookie = name + "=" + value + ";path=/;domain=google.com";
}

// This is called when the user logs in to gmail.
// We set a GMAIL_LOGIN2 cookie with the initial timings.
// The first letter "T" in the cookie value means that the login is not
// completed yet. The main JS will complete logging the timings and update
// the GMAIL_LOGIN2 cookie. See main.js
function lg() {
  var now = (new Date()).getTime();
  // use start_time as a place holder for login_box_time until we've
  // completely rolled out html-only login
  var cookie = "T" + start_time + "/" + start_time + "/" + now;
  SetGmailCookie("GMAIL_LOGIN2", cookie);
}

var login_box_time;
function IframeOnLoad() {
  if (!login_box_time) {
    login_box_time = (new Date()).getTime();
  }
}

function el(id) {
  if (document.getElementById) {
    return document.getElementById(id);
  }
  return null;
}

var ONE_PX = "https://gmail.google.com/gmail/images/c.gif?t=" +
  (new Date()).getTime();

function LogRoundtripTime() {
  var img = new Image();
  var start = (new Date()).getTime();
  img.onload = GetRoundtripTimeFunction(start);
  img.src = ONE_PX;
}

function GetRoundtripTimeFunction(start) {
  return function() {
    var end = (new Date()).getTime();
    SetGmailCookie("GMAIL_RTT2", (end - start));
  }
}

function OnLoad() {
  var form = document.getElementById("gaia_loginform");
  form.onsubmit = lg;
  CheckBrowser();
  LogRoundtripTime();
}
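This JavaScript sets two cookies. The first, GMAIL_LOGIN2, is set with a value of
Tstart_time/start_time/now, where now is the time at which the form was submitted,
in milliseconds as returned by getTime(), and start_time is a timestamp defined
outside this excerpt, standing in for login_box_time. As you can see from the
comments in the code, Google intends to replace this placeholder in the future.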
The second cookie is called GMAIL_RTT2 and contains the time it takes to retrieve
a 1-pixel image file from the Gmail servers. RTT, presumably, stands for Round
Trip Time.
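The layout of the login cookie is simple enough to build and take apart by hand. In this sketch the function and field names are my own, and the sample value is the GMAIL_LOGIN cookie that appears in the Inbox trace later in this chapter, which follows the same T/start/start/now pattern.

```javascript
// Sketch: build and parse a value in the "T<start>/<start>/<now>" layout.
function makeLoginCookie(startTime, now) {
  return "T" + startTime + "/" + startTime + "/" + now;
}

function parseLoginCookie(value) {
  var times = value.slice(1).split("/").map(Number);
  return {
    flag: value.charAt(0),       // "T" means the login is not completed yet
    startTime: times[0],
    loginBoxTime: times[1],      // currently a copy of startTime
    submitTime: times[2],
    elapsedMs: times[2] - times[0]
  };
}

var parsed = parseLoginCookie("T1106068635841/1106068635841/1106068648645");
```

For that sample, elapsedMs works out to 12804, or a little under 13 seconds between the page starting up and the form being submitted.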
You won’t look at it in this book, but the rest of the JavaScript code on that page
presents a very nice listing of a browser check that removes the login window if
the browser isn’t capable of using Gmail.
If you watch the Gmail login sequence from your own browser, you will see that it
goes through more redirects before it settles into HTTP again, and you can see
what is going on from the Tcpflow trace file.
Hitting stop on the browser at just the right time (and that is, to quote the fine
words of my editor, a total crapshoot), gives you this URL:
https://www.google.com/accounts/CheckCookie?continue=http%3A%2F
%2Fgmail.google.com%2Fgmail%3F_sgh%3D8a6d8ffbb159f1c7c9246bd4f4
9e78a1&service=mail&chtml=LoginDoneHtml
Viewing source on that page gives you Listing 5-8.
Listing 5-8: The Gmail Cookie Check
<html>
<head>
<title>Redirecting</title>
<meta content="0; url=http://gmail.google.com/gmail?_sgh=8a6d8ffbb159f1c7c9246bd4f49e78a1" http-equiv="refresh"></head>
<body alink="#ff0000" text="#000000" vlink="#551a8b" link="#0000cc" bgcolor="#ffffff">
<script type="text/javascript" language="javascript"><!--
location.replace("http://gmail.google.com/gmail?_sgh=8a6d8ffbb159f1c7c9246bd4f49e78a1")
//--> </script>
</body>
</html>
This HTML forces you onto the next page, in this case
gmail.google.com/gmail?_sgh=8a6d8ffbb159f1c7c9246bd4f49e78a1.
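A script following the login by hand has no browser to obey that redirect, so it has to dig the target out of the page itself. Here is a minimal sketch with a hypothetical findRefreshTarget helper. The regular expression assumes the attribute order and double quotes seen in Listing 5-8, so treat it as fragile.

```javascript
// Sketch: pull the redirect target out of a meta refresh tag.
// Assumes content="..." uses double quotes, as in Listing 5-8.
function findRefreshTarget(html) {
  var match = /<meta[^>]*content="\d+;\s*url=([^"]+)"/i.exec(html);
  return match ? match[1] : null;
}

var page = '<meta content="0; url=http://gmail.google.com/gmail?_sgh=' +
  '8a6d8ffbb159f1c7c9246bd4f49e78a1" http-equiv="refresh">';

var target = findRefreshTarget(page);
```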
You have seen this sort of URL before: Look back again at Listing A-3, after the
second excised block of encrypted code. So now you know that between the form
submission and the page you get in Listing 5-8, something else happens. You can
also guess that something happens to the cookie you set on the first page—it is
being checked for something. Considering that those cookies do not contain anything
but the time they were set, I am guessing that this step is to ensure that the
connection is current and not the result of caching from someone’s browser. It’s to
ensure a good, fresh session with Gmail on the part of the browser application and
the user himself. Or so I would guess.
Either way, the boot sequence continues from here automatically, with everything
in standard HTTP. You will see within the trace that the boot sequence loads the
Inbox next. So that’s what the next section considers.
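A script that wants to follow the same path as the browser has to pull the destination out of a page like Listing 5-8 for itself. What follows is a minimal Python sketch of one way to do that, using the standard library HTML parser to read the meta refresh tag. The names RefreshFinder and find_refresh_target are made up for this example, and the sample page is a trimmed copy of Listing 5-8; the pages Gmail really serves may differ.

```python
from html.parser import HTMLParser


class RefreshFinder(HTMLParser):
    """Remember the target of the first <meta http-equiv="refresh"> tag seen."""

    def __init__(self):
        super().__init__()
        self.target = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.target is not None:
            return
        attributes = dict(attrs)
        if (attributes.get("http-equiv") or "").lower() != "refresh":
            return
        # The content attribute looks like "0; url=http://..."
        _delay, _, rest = (attributes.get("content") or "").partition(";")
        rest = rest.strip()
        if rest.lower().startswith("url="):
            self.target = rest[4:].strip()


def find_refresh_target(html):
    """Return the redirect URL named in a meta refresh tag, or None."""
    finder = RefreshFinder()
    finder.feed(html)
    return finder.target


# A trimmed copy of Listing 5-8, used here only as sample input.
COOKIE_CHECK_PAGE = """<html>
<head>
<title>Redirecting</title>
<meta content="0; url=http://gmail.google.com/gmail?_sgh=8a6d8ffbb159f1c7c9246bd4f49e78a1" http-equiv="refresh"></head>
<body></body>
</html>"""

print(find_refresh_target(COOKIE_CHECK_PAGE))
```

The page also carries the same address inside a location.replace() call, so a regular expression over the script text would be an alternative if the meta tag ever disappeared.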
Loading the Inbox
As you come to the end of the boot sequence you have nothing to do but load in the
Inbox and address book. This section deals specifically with the Inbox loading. The
output from the Tcpflow program earlier in this chapter doesn’t contain enough
mail to be of use in this regard, but if you do the trace again, only this time with a
few more messages in the Inbox, you can see what is going on. Figure 5-10 shows
the new Inbox, loaded with messages.
FIGURE 5-10: Gmail with some new, unread messages
Listing 5-9 shows the new trace.
A Summary of the Login Procedure
As I have said before, the login procedure for Gmail seems to be changing on a very regular
basis. Check with the libraries examined in Chapter 6 for the latest news on this. Basically, however,
the login procedure goes like this, with each step moving on only if the previous was
reported successful.
1. Request the Gmail page.
2. Set the two cookies.
3. Send the contents of the form.
4. Request the cookie check page.
5. Request the Inbox.
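The shape of that procedure, where each step runs only if the one before it succeeded, can be sketched in a few lines of Python. This is an illustration only: run_login_sequence is a hypothetical helper, and the actions are placeholders that return True or False instead of making any HTTP requests.

```python
def run_login_sequence(steps):
    """Run (name, action) pairs in order, stopping at the first failure.

    Returns (completed_names, failed_name); failed_name is None when
    every step reported success.
    """
    completed = []
    for name, action in steps:
        if not action():
            return completed, name
        completed.append(name)
    return completed, None


# Placeholder actions: a real client would make the HTTP requests here.
demo_steps = [
    ("request the Gmail page", lambda: True),
    ("set the two cookies", lambda: True),
    ("send the contents of the form", lambda: True),
    ("request the cookie check page", lambda: False),  # simulated failure
    ("request the Inbox", lambda: True),
]

completed, failed = run_login_sequence(demo_steps)
print("completed:", completed)
print("stopped at:", failed)
```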
Listing 5-9: The Inbox with More Messages Within
192.168.016.051.59905-064.233.171.107.00080: GET
/gmail?ik=&search=inbox&view=tl&start=0&init=1&zx=vzmurwe44cpx
6l HTTP/1.1
Host: gmail.google.com
User-Agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O;
en-GB; rv:1.7.5) Gecko/20041110 Firefox/1.0
Accept:
text/xml,application/xml,application/xhtml+xml,text/html;q=0.9
,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-gb,en;q=0.5
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Referer: gmail.google.com/gmail/html/hist2.html
Cookie: GV=1010186d43b2b-b6b21a87a46b00d1bc5abf1a97357dd7;
PREF=ID=0070250e68e17190:CR=1:TM=1106068639:LM=1106068639:S=O1
Nivj_xqk7kvdGK;
GMAIL_LOGIN=T1106068635841/1106068635841/1106068648645;
SID=DQAAAGoAAAC06FIY2Ix4DJlCk7ceaOnWPvpK4eWn9oV6xpmOT4sNhdBPkZ
2npQE8Vi8mWY9RybWVwJet9CHeRBw99oUdRqQHvBb8IWxhLcurTBFZJstXoUbW
FDZTmxZKt55eUxnspTHLanel9LsAU1wqHcHhlHI7;
GMAIL_AT=5282720a551b82df-10186d43b2e;
S=gmail=WczKrZ6s5sc:gmproxy=UMnFEH_hYC8; TZ=-60
064.233.171.107.00080-192.168.016.051.59905: HTTP/1.1 200 OK
Set-Cookie:
SID=DQAAAGoAAAC06FIY2Ix4DJlCk7ceaOnWPvpK4eWn9oV6xpmOT4sNhdBPkZ
2npQE8Vi8mWY9RybWVwJet9CHeRBw99oUdRqQHvBb8IWxhLcurTBFZJstXoUbW
FDZTmxZKt55eUxnspTHLanel9LsAU1wqHcHhlHI7;Domain=.google.com;Pa
th=/
Cache-control: no-cache
Pragma: no-cache
Content-Type: text/html; charset=utf-8
Transfer-Encoding: chunked
Server: GFE/1.3
Date: Tue, 18 Jan 2005 17:17:36 GMT
936
<html><head><meta content=”text/html; charset=UTF-8”
http-equiv=”content-type”></head><script>D=(top.js&&top.js.init)?fu
nction(d){top.js.P(window,d)}:function(){};if(window==top){top
.location=”/gmail?ik=&search=inbox&view=tl&start=0&init=1&zx=v
zmurwe44cpx6l&fs=1”;}</script><script><!--
D([“v”,”15b3e78585d3c7bb”,”33fc762357568758”]
);
D([“ud”,”ben.hammersley@gmail.com”,”{\”o\”:\”OPEN\”,\”/\”:\”SE
ARCH\”,\”\\r\”:\”OPEN\”,\”k\”:\”PREV\”,\”r\”:\”REPLY\”,\”c\”:\
”COMPOSE\”,\”gc\”:\”GO_CONTACTS\”,\”gd\”:\”GO_DRAFTS\”,\”p\”:\
”PREVMSG\”,\”gi\”:\”GO_INBOX\”,\”m\”:\”IGNORE\”,\”a\”:\”REPLYA
LL\”,\”!\”:\”SPAM\”,\”f\”:\”FORWARD\”,\”u\”:\”BACK\”,\”ga\”:\”
GO_ALL\”,\”j\”:\”NEXT\”,\”y\”:\”REMOVE\”,\”n\”:\”NEXTMSG\”,\”g
s\”:\”GO_STARRED\”,\”x\”:\”SELECT\”,\”s\”:\”STAR\”}”,”344af70c
5d”,”/gmail?view=page&name=contacts&ver=50c1485d48db7207”]
);
D([“su”,”33fc762357568758”,[“l”,”/gmail/help/images/logo.gif”,
”i”,”Invite a friend to Gmail”,”j”,”Invite PH_NUM friends to
Gmail”]
]
);
D([“p”,[“bx_hs”,”1”]
,[“bx_show0”,”1”]
,[“bx_sc”,”0
064.233.171.107.00080-192.168.016.051.59905: “]
,[“bx_pe”,”1”]
,[“bx_ns”,”1”]
]
);
D([“ppd”,0]
);
D([“i”,6]
);
D([“qu”,”1 MB”,”1000 MB”,”0%”,”#006633”]
);
D([“ft”,”Search accurately with <a style=color:#0000CC
target=_blank
href=\”/support/bin/answer.py?ctx=gmail&answer=7190\”>operator
s</a> including <b>from:</b> &nbsp;<b>to:</b>
&nbsp;<b>subject:</b>.”]
);
D([“ds”,2,0,0,0,0,16,0]
);
D([“ct”,[[“Heads”,0]
,[“Knees”,0]
,[“Shoulders”,0]
,[“Toes”,0]
]
]
);
D([“ts”,0,50,3,0,”Inbox”,”10186d450f9”,3,]
);
//--></script><script><!--
D([“t”,[“101865c04ac2427f”,1,0,”<b>4:06pm</b>”,”<span
id=\’_user_ben@benhammersley.com\’><b>Ben
Hammersley</b></span>”,”<b>&raquo;</b>&nbsp;”,”<b>This is the
third message</b>”,,[]
,””,”101865c04ac2427f”,0,”Tue Jan 18 2005_7:06AM”]
,[“101865b95fc7a35a”,1,0,”<b>4:05pm</b>”,”<span
id=\’_user_ben@benhammersley.com\’><b>Ben
Hammersley</b></span>”,”<b>&raquo;</b>&nbsp;”,”<b>This is the
second message</b>”,,[]
,””,”101865b95fc7a35a”,0,”Tue Jan 18 2005_7:05AM”]
,[“101480d8ef5dc74a”,0,1,”Jan 6”,”<span
id=\’_user_ben@benhammersley.com\’>Ben
Hammersley</span>”,”<b>&raquo;</b>&nbsp;”,”Here\’s a nice
message.”,,[“^t”,”Heads”]
,””,”101480d8ef5dc74a”,0,”Thu Jan 6 2005_4:44AM”]
]
);
D([“te”]);
//--></script><script>var
fp=’341d292f3e55766f’;</script><script>var
loaded=true;D([‘e’]);</script><script>try{top.js.L(window,45,’
cb803471f1’);}catch(e){}</script>
What to make of these traces? First, you can see that to call the contents of the
Inbox, the browser requests two URLs. This one comes first:
/gmail?ik=&search=inbox&view=tl&start=0&init=1&zx=z6te3fe41hmsjo
And then this one:
/gmail?ik=&search=inbox&view=tl&start=0&init=1&zx=781ttme448dfs9
The two differ only in the value of zx. Second, it appears that the real workings
of the Inbox are contained in the JavaScript call that starts D(["t", as Listings
5-10 and 5-11 show.
Listing 5-10: With One Message
D(["t",["101480d8ef5dc74a",0,0,"Jan 6","<span id=\'_user_ben@benhammersley.com\'>Ben Hammersley</span>","<b>&raquo;</b>&nbsp;","Here\'s a nice message.",,[]
,"","101480d8ef5dc74a",0,"Thu Jan 6 2005_4:44AM"]
]
);
Listing 5-11: With Three Messages
D(["t",["101865c04ac2427f",1,0,"<b>4:06pm</b>","<span id=\'_user_ben@benhammersley.com\'><b>Ben Hammersley</b></span>","<b>&raquo;</b>&nbsp;","<b>This is the third message</b>",,[]
,"","101865c04ac2427f",0,"Tue Jan 18 2005_7:06AM"]
,["101865b95fc7a35a",1,0,"<b>4:05pm</b>","<span id=\'_user_ben@benhammersley.com\'><b>Ben Hammersley</b></span>","<b>&raquo;</b>&nbsp;","<b>This is the second message</b>",,[]
,"","101865b95fc7a35a",0,"Tue Jan 18 2005_7:05AM"]
,["101480d8ef5dc74a",0,1,"Jan 6","<span id=\'_user_ben@benhammersley.com\'>Ben Hammersley</span>","<b>&raquo;</b>&nbsp;","Here\'s a nice message.",,["^t","Heads"]
,"","101480d8ef5dc74a",0,"Thu Jan 6 2005_4:44AM"]
]
);
From looking at these listings, you can deduce that the Inbox structure consists of
one or more of the following arrays (I’ve added in line breaks for clarity):
[
"101480d8ef5dc74a",
0,
0,
"Jan 6",
"<span id=\'_user_ben@benhammersley.com\'>Ben Hammersley</span>",
"<b>&raquo;</b>&nbsp;",
"Here\'s a nice message.",
,[]
,""
,"101480d8ef5dc74a"
,0
,"Thu Jan 6 2005_4:44AM"
]
After further experiments, in which I sent different types of e-mail to Gmail and
watched what it did (I’ll omit the details here for the sake of brevity, but you
should have the idea), you can see that the array consists of the following:
[
"101480d8ef5dc74a", -> The message ID
0, -> Unread=1, Read=0
0, -> Starred=1, plain=0
"Jan 6", -> The date displayed
"<span id=\'_user_ben@benhammersley.com\'>Ben Hammersley</span>", -> Who sent it
"<b>&raquo;</b>&nbsp;", -> The little icon in the inbox
"Here\'s a nice message.", -> The subject line
,[] -> Labels
,"" -> Attachments
,"101480d8ef5dc74a" -> The message ID
,0 -> Unknown
,"Thu Jan 6 2005_4:44AM" -> The full date and time
]
You now know how to decode the Gmail mail listing. You can also see how to
request this data structure: call the URL and parse the JavaScript call that
comes back. You can do this with simple regular expressions, a topic explored
in Chapter 7.
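As a rough illustration of that decoding step, here is a minimal Python sketch that pulls the D(["t", ...]) call out of a page and turns each row into a dictionary. It is not how Gmail or any published library does it. The field names in ROW_FIELDS are labels invented for this example, based on the positions decoded above, and the conversion from JavaScript array to JSON is deliberately naive: it assumes no message text contains a doubled comma or a comma directly before a closing bracket. Note that the raw data has an empty slot between the subject and the labels (the two commas in a row), which the sketch keeps as None.

```python
import json
import re

# Labels invented for this sketch, one per position in a row.
ROW_FIELDS = (
    "id", "unread", "starred", "date", "sender_html", "icon_html",
    "subject", "empty_slot", "labels", "attachments", "id_again",
    "unknown", "full_date",
)

# Listing 5-10 with straight quotes, plus the closing D(["te"]) call.
SAMPLE = r"""D(["t",["101480d8ef5dc74a",0,0,"Jan 6","<span id=\'_user_ben@benhammersley.com\'>Ben Hammersley</span>","<b>&raquo;</b>&nbsp;","Here\'s a nice message.",,[]
,"","101480d8ef5dc74a",0,"Thu Jan 6 2005_4:44AM"]
]
);
D(["te"]);"""


def js_array_to_python(literal):
    """Convert one JavaScript array literal into Python lists via JSON."""
    text = literal.replace("\\'", "'")           # JSON has no \' escape
    text = re.sub(r",\s*\]", "]", text)          # drop trailing commas
    text = re.sub(r",(?=\s*,)", ",null", text)   # empty slots become null
    return json.loads(text)


def thread_rows(page):
    """Return one dict per row found in any D(["t", ...]) call."""
    rows = []
    for match in re.finditer(r"D\((\[.*?\])\s*\);", page, re.DOTALL):
        array = js_array_to_python(match.group(1))
        if array and array[0] == "t":
            rows.extend(dict(zip(ROW_FIELDS, row)) for row in array[1:])
    return rows


inbox = thread_rows(SAMPLE)
for row in inbox:
    state = "unread" if row["unread"] else "read"
    print(row["id"], state, row["subject"])
```

A proper JavaScript tokenizer would be sturdier than these substitutions, but for traces like the ones in this chapter the regular expressions are enough to see the structure.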
Storage Space
The detail of the mail in the Inbox isn’t the only information sent when you
request that URL. Look above the mail function and you can see the following:
D(["qu","1 MB","1000 MB","0%","#006633"]
This line of data sent from Gmail’s servers clearly corresponds to the display at
the bottom of the screen giving your mailbox usage statistics:
• D(["qu",: The name of the Gmail function that deals with the usage
information.
• "1 MB",: The amount of storage used.
• "1000 MB",: The maximum amount available.
• "0%",: The percentage used.
• "#006633": The hex value for a nice shade of green.
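Reading those values in a script takes only a few lines. The following Python sketch assumes the call looks exactly as it does in the trace above; parse_quota and the dictionary keys are names chosen for this example.

```python
import json
import re


def parse_quota(page):
    """Return the fields of the D(["qu", ...]) call, or None if it is absent."""
    match = re.search(r'D\((\["qu",[^\]]*\])', page)
    if match is None:
        return None
    _name, used, total, percent, colour = json.loads(match.group(1))
    return {"used": used, "total": total, "percent": percent, "colour": colour}


sample = 'D(["qu","1 MB","1000 MB","0%","#006633"]\n);'
quota = parse_quota(sample)
print(quota)
```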
Labels
In Figure 5-10 I have added some labels to the Gmail system. Spotting them in
the Tcpflow trace is easy:
D(["ct",[["Heads",0],["Knees",0],["Shoulders",0],["Toes",0]]]);
You can deduce straight away that the call starting with D(["ct" contains the
name of each label together with a value whose meaning is unknown (perhaps it’s
a Boolean flag, perhaps it’s a count; you don’t know as yet). You can more easily
harvest this data when you come to write your own API.
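Harvesting the labels can be sketched the same way. In this Python example parse_labels is a hypothetical helper, and the sample is the label call as it appears in the trace, line breaks included; the second value in each pair is passed through untouched because its meaning is still unknown.

```python
import json
import re


def parse_labels(page):
    """Return (name, value) pairs from the D(["ct", ...]) call, if present."""
    match = re.search(r'D\((\["ct",.*?\])\s*\);', page, re.DOTALL)
    if match is None:
        return []
    _name, pairs = json.loads(match.group(1))
    return [(label, value) for label, value in pairs]


sample = """D(["ct",[["Heads",0]
,["Knees",0]
,["Shoulders",0]
,["Toes",0]
]
]
);"""

labels = parse_labels(sample)
print([name for name, _value in labels])
```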
Reading an Individual Mail
Fire up Tcpflow again, and click one of the messages in the Inbox in Figure 5-10.
The trace resulting from this action is shown in Listing 5-12.
Listing 5-12: Trace from Reading a Message
192.168.016.051.59936-064.233.171.105.00080: GET
/gmail?ik=344af70c5d&view=cv&search=inbox&th=101865c04ac2427f&
lvp=-1&cvp=0&zx=9m4966e44e98uu HTTP/1.1
Host: gmail.google.com
User-Agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O;
en-GB; rv:1.7.5) Gecko/20041110 Firefox/1.0
Accept:text/xml,application/xml,application/xhtml+xml,text/htm
l;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-gb,en;q=0.5
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Referer:
http://gmail.google.com/gmail?ik=&search=inbox&view=tl&start=0
&init=1&zx=iv37tme44d1tx5
Cookie: GV=1010186dcc455-ce01891ce232fa09b7f9bcfb46adf4e7;
PREF=ID=0070250e68e17190:CR=1:TM=1106068639:LM=1106068659:GM=1
:S=3jNiVz8ZpaPf0GW0; S=gmail=WczKrZ6s5sc:gmproxy=UMnFEH_hYC8;
TZ=-60; SID=DQAAAGoAAACm_kF5GqnusK0rbFcAlLKoJUx26l6np-
H5Een1P_hN--yWqycLWSJUZt3G9Td_Cgw_ZK1naS891aWxZ6IkbNiBFN1J4lmO
COTvOn7r3bnYjWlOqB6netb06ByuEf56Cd12ilfgika0MxmuamO3FWzw;
GMAIL_AT=29a3f526e2461d87-10186dcc456; GBE=d-540-800
064.233.171.105.00080-192.168.016.051.59936: HTTP/1.1 200 OK
Set-Cookie: SID=DQAAAGoAAACm_kF5GqnusK0rbFcAlLKoJUx26l6np-
H5Een1P_hN--yWqycLWSJUZt3G9Td_Cgw_ZK1naS891aWxZ6IkbNiBFN1J4lmO
COTvOn7r3bnYjWlOqB6netb06ByuEf56Cd12ilfgika0MxmuamO3FWzw;Domai
n=.google.com;Path=/
Set-Cookie: GBE=; Expires=Mon, 17-Jan-05 18:00:37 GMT; Path=/
Cache-control: no-cache
Pragma: no-cache
Content-Type: text/html; charset=utf-8
Transfer-Encoding: chunked
Server: GFE/1.3
Date: Tue, 18 Jan 2005 18:00:37 GMT
4d5
<html><head><meta content=”text/html; charset=UTF-8”
http-equiv=”content-type”></head><script>D=(top.js&&top.js.init)?fu
nction(d){top.js.P(window,d)}:function(){};if(window==top){top
.location=”/gmail?ik=344af70c5d&view=cv&search=inbox&th=101865
c04ac2427f&lvp=-
1&cvp=0&zx=9m4966e44e98uu&fs=1”;}</script><script><!--
D([“v”,”15b3e78585d3c7bb”,”33fc762357568758”]
);
D([“i”,6]
);
D([“qu”,”1 MB”,”1000 MB”,”0%”,”#006633”]
);
D([“ft”,”Compose a message in a new window by pressing
\”Shift\” while clicking Compose Mail or Reply.”]
);
D([“ds”,1,0,0,0,0,16,0]
);
D([“ct”,[[“Heads”,0]
,[“Knees”,0]
,[“Shoulders”,0]
,[“Toes”,0]
]
]
);
D([“cs”,”101865c04ac2427f”,”This is the third message”,”This
is the third message”,””,[“^i”]
,[]
,0,1,”h3ttlgu1hqiz9324trq5kp5qo7wa96s”,,”101865c04ac2427f”]
);
D([“mi”,0,1,”101865c04ac2427f”,0,”0”,”Ben
Hammersley”,”ben@benhammersley.com”,”me”,”4:05pm (2&frac12;
hours ago)”,[“Ben Hammersley <ben.hammersley@gmail.com>”]
,[]
,[]
,[
064.233.171.105.00080-192.168.016.051.59936: ]
,”Tue, 18 Jan 2005 16:05:17 +0100”,”This is the third
message”,””,[]
,1,,,”Tue Jan 18 2005_7:05AM”]
);
D([“mb”,”3rd! THREE! THIRD!<br><br>”,0]
);
D([“ce”]);
//--></script><script>var
loaded=true;D([‘e’]);</script><script>try{top.js.L(window,70,’
1
ab915da64’);}catch(e){}</script>
First things first: the URL. Requesting this message caused Gmail to load this
URL:
/gmail?ik=344af70c5d&view=cv&search=inbox&th=101865c04ac2427f&lvp=-1&cvp=0&zx=9m4966e44e98uu
Or, to put it more understandably:
/gmail?
ik=344af70c5d
&view=cv
&search=inbox
&th=101865c04ac2427f
&lvp=-1
&cvp=0
&zx=9m4966e44e98uu
As you can see, th is the message ID of the message I clicked on. But the others
are mysterious at the moment.
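Taking a URL like this apart, or putting one back together, is a job for the standard library in most languages. Here is a short Python sketch; split_gmail_url and thread_url are names invented for this example, and the parameters whose meaning is still unknown (ik, lvp, cvp, and zx) are simply copied from the trace as opaque values.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit


def split_gmail_url(url):
    """Return the query parameters of a Gmail URL as a dict."""
    return dict(parse_qsl(urlsplit(url).query, keep_blank_values=True))


def thread_url(ik, thread_id, zx):
    """Rebuild a request URL of the shape seen in Listing 5-12."""
    params = [
        ("ik", ik),
        ("view", "cv"),
        ("search", "inbox"),
        ("th", thread_id),
        ("lvp", "-1"),   # copied from the trace; meaning unknown
        ("cvp", "0"),    # copied from the trace; meaning unknown
        ("zx", zx),
    ]
    return "/gmail?" + urlencode(params)


traced = ("/gmail?ik=344af70c5d&view=cv&search=inbox"
          "&th=101865c04ac2427f&lvp=-1&cvp=0&zx=9m4966e44e98uu")
params = split_gmail_url(traced)
print(params["th"])
print(thread_url(params["ik"], params["th"], params["zx"]) == traced)
```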
At this point in the proceedings, alarms went off in my head. Why, I was thinking,
is the variable for the message ID called th, when that probably stands for
thread? So, I sent a few mails back and forth to create a thread, and loaded the
Inbox and the message back up again under Tcpflow. Listing 5-13 shows the
resulting trace. It is illuminating.
Listing 5-13: Retrieving a Thread, Not a Message
THE INBOX LOADING:
D([“t”,[“10187696869432e6”,1,0,”<b>9:00pm</b>”,”<span
id=\’_user_ben@benhammersley.com\’>Ben</span>, <span
id=\’_user_ben.hammersley@gmail.com\’>me</span>, <span
id=\’_user_ben@benhammersley.com\’><b>Ben</b></span>
(3)”,”<b>&raquo;</b>&nbsp;”,”<b>This is the third
message</b>”,,[]
,””,”10187696869432e6”,0,”Tue Jan 18 2005_12:00PM”]
,[“101865b95fc7a35a”,1,0,”<b>4:05pm</b>”,”<span
id=\’_user_ben@benhammersley.com\’><b>Ben
Hammersley</b></span>”,”<b>&raquo;</b>&nbsp;”,”<b>This is the
second message</b>”,,[]
,””,”101865b95fc7a35a”,0,”Tue Jan 18 2005_7:05AM”]
,[“101480d8ef5dc74a”,0,1,”Jan 6”,”<span
id=\’_user_ben@benhammersley.com\’>Ben
Hammersley</span>”,”<b>&raquo;</b>&nbsp;”,”Here\’s a nice
message.”,,[“^t”,”Heads”]
,””,”101480d8ef5dc74a”,0,”Thu Jan 6 2005_4:44AM”]
]
);
D([“te”]);
THE GETTING MESSAGE EXCHANGE
192.168.016.051.61753-216.239.057.105.00080: GET
/gmail?ik=344af70c5d&view=cv&search=inbox&th=10187696869432e6&
lvp=-1&cvp=0&zx=24lfl9e44iyx7g HTTP/1.1
Host: gmail.google.com
User-Agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O;
en-GB; rv:1.7.5) Gecko/20041110 Firefox/1.0
Accept:
text/xml,application/xml,application/xhtml+xml,text/html;q=0.9
,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-gb,en;q=0.5
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Referer:
http://gmail.google.com/gmail?ik=&search=inbox&view=tl&start=0
&init=1&zx=cs149e44iu4pd
Cookie: GV=101018770f6a0-36b4c5fcaa4913584af2219efa21740e;
SID=DQAAAGoAAACTZryXzUYHgTI4VWtHGXDY5J8vchRrqp_Ek4XjEgdZYQwBUE
pXOuyokCt-EOOmsaL8J8_bQ3jkrMfskffoH8Mb6GvEJJPAhS6noKP8IjnREcWN8MTvIPeqOYYoxE52oLva00EWdOrsGhtCy18RphU;
GMAIL_AT=aa5dcfedda2d8658-1018770f6a2; S=gmail=pl14BJCt_
4:gmproxy=c9z4V0uxx2o; TZ=-60; GMAIL_SU=1;
PREF=ID=e38a980ef675b953:TM=1106078936:LM=1106078936:GM=1:S=T0
D_V1EFUHr7faSw; GBE=d-540-800
216.239.057.105.00080-192.168.016.051.61753: HTTP/1.1 200 OK
Set-Cookie:
SID=DQAAAGoAAACTZryXzUYHgTI4VWtHGXDY5J8vchRrqp_Ek4XjEgdZYQwBUE
pXOuyokCt-EOOmsaL8J8_bQ3jkrMfskffoH8Mb6GvEJJPAhS6noKP8IjnREcWN8MTvIPeqOYYoxE52oLva00EWdOrsGhtCy18RphU;
Domain=.google.com
;Path=/
Set-Cookie: GBE=; Expires=Mon, 17-Jan-05 20:12:34 GMT; Path=/
Set-Cookie: GMAIL_SU=; Expires=Mon, 17-Jan-05 20:12:34 GMT;
Path=/
Cache-control: no-cache
Pragma: no-cache
Content-Type: text/html; charset=utf-8
Transfer-Encoding: chunked
Server: GFE/1.3
Date: Tue, 18 Jan 2005 20:12:34 GMT
b23
<html><head><meta content=”text/html; charset=UTF-8”
http-equiv=”content-type”></head><script>D=(top.js&&top.js.init)?fu
nction(d){top.js.P(window,d)}:function(){};if(window==top){top
.location=”/gmail?ik=344af70c5d&view=cv&search=inbox&th=101876
96869432e6&lvp=-
1&cvp=0&zx=24lfl9e44iyx7g&fs=1”;}</script><script><!--
D([“su”,”33fc762357568758”,[“l”,”/gmail/help/images/logo.gif”,
”i”,”Invite a friend to Gmail”,”j”,”Invite PH_NUM friends to
Gmail”]
]
);
D([“v”,”15b3e78585d3c7bb”,”33fc762357568758”]
);
D([“i”,6]
);
D([“qu”,”1 MB”,”1000 MB”,”0%”,”#006633”]
);
D([“ft”,”Automatically <span style=\”color:#0000CC;text-decoration:
underline;cursor:pointer;cursor:hand;white-space:no
wrap\” id=\”prf_d\”><b>forward</b></span> your Gmail messages
to another email account. &nbsp; <a style=color:#0000CC
target=_blank
href=\”/support/bin/answer.py?ctx=gmail&answer=10957\”>Learn&n
bsp;more</a>”]
);
D
216.239.057.105.00080-192.168.016.051.61753:
([“ds”,1,0,0,0,0,16,0]
);
D([“ct”,[[“Heads”,0]
,[“Knees”,0]
,[“Shoulders”,0]
,[“Toes”,0]
]
]
);
D([“cs”,”10187696869432e6”,”This is the third message”,”This
is the third message”,””,[“^i”]
,[]
,0,3,”g6yz3b2a3jhoga7fql7qx3yo6l9gvyf”,,”10187696869432e6”]
);
D([“mi”,2,1,”101865c04ac2427f”,0,”0”,”Ben
Hammersley”,”ben@benhammersley.com”,”me”,”4:05pm (5 hours
ago)”,[“Ben Hammersley <ben.hammersley@gmail.com>”]
,[]
,[]
,[]
,”Tue, 18 Jan 2005 16:05:17 +0100”,”This is the third
message”,”3rd! THREE! THIRD!”,[]
,1,,,”Tue Jan 18 2005_7:05AM”]
);
//--></script><script><!--
D([“mi”,2,2,”101876847addcbd1”,0,”0”,”Ben
Hammersley”,”ben.hammersley@gmail.com”,”Ben”,”8:59pm (13
minutes ago)”,[“Ben Hammersley <ben@benhammersley.com>”]
,[]
,[]
,[“Ben Hammersley <ben.hammersley@gmail.com>”]
,”Tue, 18 Jan 2005 20:59:13 +0100”,”Re: This is the third
message”,”And this is a reply back On Tue, 18 Jan 2005
16:05:17 +0100, Ben Hammersley &lt;...”,[]
,1,,,”Tue Jan 18 2005_11:59AM”]
);
D([“mi”,0,3,”10187696869432e6”,0,”0”,”Ben
Hammersley”,”ben@benhammersley.com”,”me”,”8:59pm (12 minutes
ago)”,[“Ben Hammersley <ben.hammersley@gmail.com>”]
,[]
,[]
,[]
,”Tue, 18 Jan 2005 20:59:40 +0100”,”Re: This is the third
message”,””,[]
,1,,,”Tue Jan 18 2005_11:59AM”]
);
D([“mb”,”And this is another reply back yet again<br>”,1]
);
D([“mb”,”<div><div class=ea><span id=e_10187696869432e6_1>-
Show quoted text -</span></div><span class=e
216.239.057.105.00080-192.168.016.051.61753:
id=q_10187696869432e6_1><br>On 18 Jan 2005, at 20:59, Ben
Hammersley wrote:<br><br>&gt; And this is a reply
back<br>&gt;<br>&gt;<br>&gt; On Tue, 18 Jan 2005 16:05:17
+0100, Ben Hammersley<br>&gt; &lt;<a onclick=\”return
top.js.OpenExtLink(window,event,this)\”
href=\”mailto:ben@benhammersley.com\”>ben@benhammersley.com</a
>&gt; wrote:<br>&gt;&gt; 3rd! THREE!
THIRD!<br>&gt;&gt;<br>&gt;&gt;<br><br></span></div>”,0]
);
D([“ce”]);
//--></script><script>var
loaded=true;D([‘e’]);</script><script>try{top.js.L(window,32,’
9
36bba732b’);}catch(e){}</script>
As you can deduce, th does indeed stand for thread. In Gmail, it turns out, you do
not just retrieve single messages. Rather, you retrieve the requested message and
also the entire set of headers for the rest of the messages in the thread. You can see
this quite clearly in the example above. The lines in bold type (the three calls
beginning D(["mi") show the headers for all three messages, and the whole thing
finishes with the entire content of the requested message.
You then allow the JavaScript code to wrangle the interface afterward. This is a
clever trick: it allows the interface to be very quick at the point the user wants it to
be (when you’re reading through a thread) instead of loading each message
individually.
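To see the split between headers and body in a script, you can collect the D(["mi", ...]) and D(["mb", ...]) calls separately. The Python sketch below does that for a trimmed copy of Listing 5-13, with the middle message and the quoted-text body left out to keep it short. The meanings attached to each position in the mi array (message ID at 3, sender at 6 and 7, date at 14, subject at 15, snippet at 16) are guesses from reading the trace, not anything Gmail documents, and the conversion to JSON is the same naive one used for the Inbox rows.

```python
import json
import re

# Trimmed from Listing 5-13: two of the three headers and one body.
SAMPLE = r"""D(["mi",2,1,"101865c04ac2427f",0,"0","Ben Hammersley","ben@benhammersley.com","me","4:05pm (5 hours ago)",["Ben Hammersley <ben.hammersley@gmail.com>"]
,[]
,[]
,[]
,"Tue, 18 Jan 2005 16:05:17 +0100","This is the third message","3rd! THREE! THIRD!",[]
,1,,,"Tue Jan 18 2005_7:05AM"]
);
D(["mi",0,3,"10187696869432e6",0,"0","Ben Hammersley","ben@benhammersley.com","me","8:59pm (12 minutes ago)",["Ben Hammersley <ben.hammersley@gmail.com>"]
,[]
,[]
,[]
,"Tue, 18 Jan 2005 20:59:40 +0100","Re: This is the third message","",[]
,1,,,"Tue Jan 18 2005_11:59AM"]
);
D(["mb","And this is another reply back yet again<br>",1]
);
D(["ce"]);"""


def js_array_to_python(literal):
    """Convert one JavaScript array literal into Python lists via JSON."""
    text = literal.replace("\\'", "'")           # JSON has no \' escape
    text = re.sub(r",\s*\]", "]", text)          # drop trailing commas
    text = re.sub(r",(?=\s*,)", ",null", text)   # empty slots become null
    return json.loads(text)


def read_thread(page):
    """Return (headers, bodies) gathered from the mi and mb calls in a page."""
    headers, bodies = [], []
    for match in re.finditer(r"D\((\[.*?\])\s*\);", page, re.DOTALL):
        array = js_array_to_python(match.group(1))
        if array[0] == "mi":
            headers.append({
                "message_id": array[3],
                "from_name": array[6],
                "from_address": array[7],
                "sent": array[14],
                "subject": array[15],
                "snippet": array[16],   # empty for the message shown in full
            })
        elif array[0] == "mb":
            bodies.append(array[1])
    return headers, bodies


headers, bodies = read_thread(SAMPLE)
for header in headers:
    print(header["message_id"], header["subject"], repr(header["snippet"]))
print(bodies)
```

In this sample the header with an empty snippet is the one whose body arrives in full, which fits the pattern described above: summaries for the rest of the thread, full content for the message you asked for.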
So, you now know how to ret


