Retrieving data you posted in a browser
Scenario
The other day, I posted a message on a Cornell Yahoo Group about EnergyTeachers.org. The moderator wrote me back asking me to repost the message with my name and alumnus info, their policy being not to post without that info. I wanted to re-post the exact same phrases, but hadn't saved the text. Yahoo Groups offers no way to access my posts if they're not accepted.
Process
I thought my web browser, Safari, might have cached the text. I went to the posting form, typed the letter G in the Subject field, and Safari filled in the rest. If it remembered that, there was a chance it also remembered the text I had put in the other field.
Going to the next field, I tried typing the first letter of the message, but I didn't remember what it was. It turns out that even if I had, AutoFill doesn't fill in large text areas, only the briefer parts of a form. So there was no guarantee that Safari remembered what I typed at all, but I wanted to hunt for it. I learned that Safari keeps a cache of recent information in a file at ~/Library/Caches/com.apple.Safari/Cache.db. I opened that file in HexEdit, and the header told me in plain English that it was a SQLite format database.
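A hex editor isn't strictly needed for that header check. Here is a minimal sketch in Python (my choice for illustration, not part of the original process) that reads the first 16 bytes of a file and compares them with the magic string that SQLite 3 database files begin with. The function name looks_like_sqlite is made up for this example.

```python
from pathlib import Path

# Every SQLite 3 database file begins with this 16-byte magic string.
SQLITE_MAGIC = b"SQLite format 3\x00"


def looks_like_sqlite(path):
    """Return True if the file at `path` starts with the SQLite 3 header."""
    with open(Path(path).expanduser(), "rb") as f:
        return f.read(len(SQLITE_MAGIC)) == SQLITE_MAGIC


# Usage, with the path from the text (it may differ between Safari versions):
# looks_like_sqlite("~/Library/Caches/com.apple.Safari/Cache.db")
```

The check only looks at the header, so it says nothing about whether the rest of the file is intact.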
I opened Terminal (Mac OS terminal program) and then opened the file in SQLite with this command:
sqlite3 ~/Library/Caches/com.apple.Safari/Cache.db
Inside sqlite3, I found out what tables there were in this db:
sqlite> .tables
cfurl_cache_blob_data cfurl_cache_schema_version cfurl_cache_response
The first table (cfurl_cache_blob_data) has all sorts of information about each page visit in Safari. The last table (cfurl_cache_response) has date info that might be useful. The first field in each of those two tables has an event ID that helps correlate the information. If you turn on headers, you can see what each field is called. To see the headers along with an example row, select a single row:
sqlite> .headers ON
sqlite> SELECT * FROM cfurl_cache_blob_data LIMIT 1;
You'll see that fields are separated by a pipe (|). So I searched for rows whose request_object field contained a unique part of the URL and that were POST requests rather than GETs, POST being the way data-rich forms are sent:
sqlite> SELECT request_object FROM cfurl_cache_blob_data WHERE request_object LIKE '%>POST<%' AND request_object LIKE '%big_red_bulletin%';
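The same search can be run from a script. The sketch below uses Python's built-in sqlite3 module and reuses the table name, column name and LIKE patterns from the query above. The function name find_post_requests is hypothetical. It assumes the cache still has this layout, which may not hold for other Safari versions.

```python
import sqlite3


def find_post_requests(db_path, url_fragment):
    """Return request_object values of cached POST requests mentioning url_fragment."""
    query = (
        "SELECT request_object FROM cfurl_cache_blob_data "
        "WHERE request_object LIKE '%>POST<%' "
        "AND request_object LIKE ?"
    )
    # Note: '_' and '%' inside url_fragment act as LIKE wildcards,
    # exactly as they do in the hand-typed query.
    con = sqlite3.connect(db_path)  # db_path must be a full path; '~' is not expanded
    try:
        return [row[0] for row in con.execute(query, ("%" + url_fragment + "%",))]
    finally:
        con.close()


# Usage (hypothetical path to a copy of the cache file):
# rows = find_post_requests("/tmp/Cache-copy.db", "big_red_bulletin")
```

Working on a copy of Cache.db rather than the live file seems prudent, since Safari may have it open.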
Inside that row of data, Safari keeps a tantalizing bit of information in XML:
cmVmZXJlcj0lMkZncm91cCUyRmJpZ19yZWRfYnVsbGV0aW4lMkYmbWVzc2Fn
ZU51bT0wJnljYj1mJTJGa2Ewa2p5a1JmJmZyb209JnN1YmplY3Q9R3JlZW4r
...(leaving out several lines)
dGlvbi4lMEQlMEFodHRwJTNBJTJGJTJGZW5lcmd5dGVhY2hlcnMub3JnJTJG
Z3JlZW5kb2xsaG91c2VjaGFsbGVuZ2UucGhwJmxhbmc9MDAmc2VuZD1TZW5k
That looked like data compressed for transmission, not unlike how images are compressed into text characters for transmission through e-mail. Here's where my process got sticky. I had no idea how to decompress it. Then I saw these lines nearby in the request:
<key>Accept-Encoding</key>
<string>gzip, deflate</string>
<key>Content-Type</key>
<string>application/x-www-form-urlencoded</string>
I thought maybe I could save the data in a file, give it a .gz file extension, and my Mac would automatically try to decompress it when I double-clicked the file in the Finder. Well, that didn't work, nor did trying gzip on the command line. From the words 'Accept-Encoding' I realized that Safari was just telling the server that it was happy to receive compressed info. That header says nothing about the data Safari sent, so it was not relevant to this project.
I looked through dozens of pages on the internet trying to find out how form data were encoded for transmission. Finally, I was looking at a PHP.net page about base64 decoding and noticed that an example of encoded text looked like what I had, so I used PHP in the Terminal to decode it, lickety-split:
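One quick way to rule out gzip without trial and error is to look at the first two bytes: gzip streams always start with the bytes 0x1f 0x8b. The Python sketch below (an illustration, not something from the original process) applies that check to the first line of the cached data shown above. The helper name is_gzip is made up.

```python
GZIP_MAGIC = b"\x1f\x8b"  # the first two bytes of every gzip stream


def is_gzip(data):
    """Return True if `data` (bytes) starts with the gzip magic number."""
    return data[:2] == GZIP_MAGIC


# First line of the cached data shown above, as raw bytes.
sample = b"cmVmZXJlcj0lMkZncm91cCUyRmJpZ19yZWRfYnVsbGV0aW4lMkYmbWVzc2Fn"
print(is_gzip(sample))  # False: it starts with the ASCII letters "cm"
```

A negative result only says the data is not gzip. It does not say what the data is.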
php -r 'echo base64_decode("cmVmZX...1TZW5k");'
- About the above command
- php is the command-line interpreter, and it must be installed on your computer.
- -r means run the code that follows, without needing a script file.
- echo means output the following text, in this case back to the terminal so you can read it.
- Notice that I use single quotes for the entire command, so I can use double quotes for the data to be decoded and the shell won't get confused.
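For comparison, the same decoding step can be sketched in Python with the standard base64 module. This example uses only the first two lines of the cached data shown earlier (the rest is elided in the text), so it recovers just the start of the request. The function name decode_base64_lines is made up for illustration.

```python
import base64

# The first two lines of the cached data shown earlier. The remaining lines
# are elided in the text, so only the start of the request is decoded here.
chunks = [
    "cmVmZXJlcj0lMkZncm91cCUyRmJpZ19yZWRfYnVsbGV0aW4lMkYmbWVzc2Fn",
    "ZU51bT0wJnljYj1mJTJGa2Ewa2p5a1JmJmZyb209JnN1YmplY3Q9R3JlZW4r",
]


def decode_base64_lines(lines):
    """Join Base64 text that was wrapped over several lines, then decode it."""
    joined = "".join(line.strip() for line in lines)
    return base64.b64decode(joined).decode("utf-8")


print(decode_base64_lines(chunks))
# referer=%2Fgroup%2Fbig_red_bulletin%2F&messageNum=0&ycb=f%2Fka0kjykRf&from=&subject=Green+
```

The output is still URL-encoded, and the subject field begins with the same G that Safari completed in the form.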
The resulting text was URL-encoded, something like this:
EnergyTeachers.org%2C+a+nonprofit+started+by+a+
So I replaced the funny parts with their original equivalents. For example, a plus sign means a space, and %2C means a comma. But there's also a function in PHP to decode that, so now I realize that I can get the original post this way:
php -r 'echo urldecode(base64_decode("cmVmZX...1TZW5k"));'
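A rough Python counterpart of that one-liner is sketched below, using base64 and urllib.parse from the standard library. Here unquote_plus plays the role of PHP's urldecode, and parse_qsl goes one step further by splitting the body into separate form fields. The function name decode_form_body is hypothetical, and the Base64 input in the usage comment is just the first two lines of the data shown earlier.

```python
import base64
from urllib.parse import parse_qsl, unquote_plus


def decode_form_body(b64_text):
    """Base64-decode a cached form body and split it into (field, value) pairs."""
    # Drop any line breaks or spaces left over from wrapping before decoding.
    raw = base64.b64decode("".join(b64_text.split())).decode("utf-8")
    # parse_qsl splits on '&' and '=', turns '+' into a space and
    # %XX into the matching character. Empty fields are kept.
    return parse_qsl(raw, keep_blank_values=True)


# The URL-encoded sample from the text, decoded directly:
print(repr(unquote_plus("EnergyTeachers.org%2C+a+nonprofit+started+by+a+")))
# 'EnergyTeachers.org, a nonprofit started by a '

# Usage with the first two lines of the cached data:
# decode_form_body("cmVmZXJlcj0l...\nZU51bT0w...")  -> list of (field, value) tuples
```

Splitting into fields makes it easy to pick out just the message text instead of reading the whole decoded request.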
PHP can even read SQLite databases, so I could automate the whole process if I liked, but I'm happy that I got back my lost post.
Similar work is done by the program File Juicer. It goes through Safari's cache and retrieves files and images there, and also many other Mac apps' caches.
References
Wikipedia article on Base64 http://en.wikipedia.org/wiki/Base64
PHP encoding functions http://www.php.net/manual/en/ref.url.php