Thursday, January 24, 2013

My GCIA Gold paper was published on sans.org. Also, SANS watermarks

I completed my GCIA Gold paper and it was accepted and published this past week.  Link below:

http://www.sans.org/reading_room/whitepapers/detection/watermarks-prevent-leaks_34087

I wrote it on watermarks and it took me the better part of the past year to complete.  Boy, I really underestimated how difficult it would be.  When I read other people's papers I saw a lot of poor grammar and spelling, so I didn't take it very seriously and figured I could just knock it out in a leisurely few months.  I couldn't have been more wrong.  I spent well over a hundred hours on it, and almost all of the original draft got scrapped.  If you're considering a "gold" level paper for any of the GIAC certifications, be aware it isn't going to be easy.  But when you get it, it's good resume fodder.

I picked the topic of the paper because of the watermarks I noticed on the GCIA practice tests I was taking.  Below are details about SANS watermarks, included here simply because they were dropped from the paper.  The information was passed along to SANS many months ago so they could act on it if they wanted.  None of this could be exploited in a profitable way, but I found it interesting.

If you look at the monitor at an angle while taking a SANS practice test you can see the watermarks.  On further inspection you may notice it's the exact same number as the exam ID.  You might also notice that the exam ID numbers you get if you purchase multiple tests are sequential.  I collected dozens of numbers from other users and found that this incrementing number is sequential for all users.  You could spawn practice "math tests" to ensure the software worked, and those numbers were also sequential from the same autoincrementing field in their database.  This indicates an inference attack is possible.

Now the purpose of the watermark is almost certainly to trace the source of copyright infringement.  The number is unique on a per-test basis, and if the test questions were found on p2p networks with the watermark number carried over, it could be used as evidence against the student that number pointed to.  Since SANS testing costs about as much as a used car, they probably care more about piracy than the average cert vendor.

The problem is that the watermark number is semi-predictable and easily altered.  It isn't apparent that your exam ID is sensitive information and could be used against you as evidence, and I didn't realize it myself when I was asking people for theirs.  If you wanted to frame someone for piracy, it's as simple as asking them for their practice exam ID.

Since the IDs are sequential, you could also spawn many math tests and infer someone else's exam ID numbers when they purchase one.  Example:

Exam ID - description
1 - my math test
2 - my math test
3 - my math test
4 - Other person's practice test
5 - Other person's practice test
6 - Other person's exam
7 - my math test
8 - my math test

On my account I would notice a gap between 3 and 7 in my math test IDs and I would know that someone purchased a set of three tests (which is how it goes when they buy the full training, afaik), and that #4 and #5 are the practice tests.  Then if I were an awful person I could insert a "4" watermark and release the test question into the wild.  This is a very hit-or-miss way of framing someone because their practice test is most likely for a different certification than your test question and you can't tell.
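
For what it's worth, the gap check itself is trivial. Here is a minimal sketch in Python, using the toy numbers from the table above. The function name and the ID list are made up for illustration; nothing here touches any real system, it only looks at a list of integers I already own.

```python
def missing_runs(my_ids):
    """Return (first, last) pairs for each run of IDs missing between my own IDs."""
    runs = []
    ordered = sorted(set(my_ids))
    # Compare each of my IDs with the next one; a jump of more than 1
    # means somebody else was issued the IDs in between.
    for prev, curr in zip(ordered, ordered[1:]):
        if curr - prev > 1:
            runs.append((prev + 1, curr - 1))
    return runs


# Toy data from the table above: my math tests got IDs 1, 2, 3, 7 and 8.
my_math_test_ids = [1, 2, 3, 7, 8]
print(missing_runs(my_math_test_ids))  # [(4, 6)]
```

The run (4, 6) is three IDs long, which is what leads to the guess that it was a full training purchase.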

Aside from that, one could also infer purchasing statistics from this autoincrement field.  By spawning math tests over time, one could infer how many tests are purchased by observing how many exam IDs were generated by other people.  Clusters of 3 are most likely full training purchases, and 1 indicates a single test purchase or a math test.  The math test function is rather obscure on their website and I think it's fair to say that it's rarely used.  SANS doesn't release any information (afaik) about how many tests or trainings are purchased, only how many students pass.  This information isn't very profitable but it is not public either.
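
The counting can be sketched the same way. This is only an illustration under my own assumptions: the IDs below are invented, the function names are hypothetical, and a gap between two of my probes could just as easily be several purchases back to back (a gap of 4 might be 3 + 1), so the tally is a rough guess at best.

```python
from collections import Counter


def gap_sizes(my_ids):
    """Tally how many IDs are missing between each pair of my consecutive IDs."""
    ordered = sorted(set(my_ids))
    sizes = [curr - prev - 1 for prev, curr in zip(ordered, ordered[1:])]
    # Ignore pairs of IDs that are directly adjacent (nothing in between).
    return Counter(size for size in sizes if size > 0)


def ids_issued_to_others(my_ids):
    """Count IDs between my first and last probe that were not issued to me."""
    ordered = sorted(set(my_ids))
    if not ordered:
        return 0
    return (ordered[-1] - ordered[0] + 1) - len(ordered)


# Invented probe IDs, collected by spawning math tests over time.
probes = [100, 104, 106, 110, 111]
print(gap_sizes(probes))             # Counter({3: 2, 1: 1})
print(ids_issued_to_others(probes))  # 7
```

With these made-up numbers there are two clusters of 3 and one single, seven IDs in total that went to other people between my first and last probe.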

Now with all of that said, the real core of this weakness is that they're using their autoincrement field for something important.  It's a convenient field to use because you know it'll be unique and it's always going to be an index and easily searchable.  But it will always be weak to inference attacks, and that class of attacks is very hard to protect against because most of the time it isn't obvious what you could discover with that data.
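
One common way around this, sketched below, is to keep the autoincrement field as an internal key and hand out a separate random value for anything that matters. To be clear, this is my own illustration and not how SANS or anyone else actually does it; the class, the method names and the in-memory dictionary are all hypothetical stand-ins for a real database.

```python
import secrets


def new_watermark_token(nbytes=16):
    """Return a random hex token that reveals nothing about issue order."""
    return secrets.token_hex(nbytes)


class ExamRegistry:
    """Toy in-memory stand-in for an exam table (hypothetical)."""

    def __init__(self):
        self._next_id = 1    # internal autoincrement key, never used as the watermark
        self._by_token = {}  # watermark token -> (exam_id, owner)

    def create_exam(self, owner):
        exam_id = self._next_id
        self._next_id += 1
        token = new_watermark_token()
        self._by_token[token] = (exam_id, owner)
        return exam_id, token

    def lookup(self, token):
        """Trace a leaked watermark back to its exam, or None if unknown."""
        return self._by_token.get(token)


registry = ExamRegistry()
exam_id, token = registry.create_exam("student_a")
print(registry.lookup(token))  # (1, 'student_a')
```

The exam ID can stay sequential and searchable, but knowing someone's exam ID (or guessing it from a gap) no longer tells you their watermark.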

This attack was inspired by an inference attack I found/reported against a video game years ago.  The account ID was autoincremented and easily obtainable so I could infer secret data like account age and some overall userbase statistics.  In the context of that game it was useful for cheating, so beware of using that field.

And if you ever plan to do evil stuff with this knowledge, this is all I have to say to you:  spawning tons of math tests on the SANS exam portal will get you noticed.
