Cryptography for CloudStor – encryption and timestamping
CloudStor update from AARNet’s Director eResearch Guido Aben
Short story for the time deficient
We’re adding two features to our CloudStor file transfer service: timestamping (the curious can already demo a barebones version) and end-to-end encryption, which we’ve got working in code that’s fit to be productised. A public beta for the encryption feature is expected in Q3 2014.
Longer story for the merely time poor
For the past few months, we’ve been busy adding features to CloudStor that fall into the category of ‘cryptography’. Because we realise this is a subject that is both important (at least to us it is!) and confusing, we thought we’d offer a few words of explanation here about what we’re doing and why we think it may be relevant to the way you transfer big files.
First of all, why are we doing this?
When we launched CloudStor, the service provided a subset of researchers with the first practical means they’d ever had to share their original (often very large) research products – full datasets, or preprint articles with full-res pictures and still-editable figures in them (huge!).
New methods engender new realisations, and in this case the realisation often was that digital copies of original research are indeed completely indistinguishable from the original work. Not an earth-shattering observation perhaps, but one that acquires stark relevance when one is suddenly put in that position.
Consequently, we started receiving requests for more control around the sharing mechanism, and on closer inspection these requests fell into two broad categories:
a) assertion of origin
In other words, a desire to be able to assert with authority that a particular dataset, or more generally a particular file, was in the possession of the sender at a certain time. As a corollary, this gives the sender the ability to show that they had the dataset before the receiver was issued with a copy.
b) restriction of audience
In the existing CloudStor system, it is true that the recipient is issued with a personal download URL; but if that link accidentally gets forwarded to someone else (who hasn’t found themselves inadvertently forwarded nuggets in a long email thread?), or if that link is phished from the recipient’s email system, then the linked data is exposed.
Particularly in the case of privacy-sensitive or commercial-in-confidence data, tighter controls are needed to ensure the recipient is truly the only one able to view data (malicious behaviour on the part of the recipient excluded).
These requests seemed fair and warranted to us, and in fact we’d been toying with some ideas that we thought might provide solutions to these requests.
Second, what are we doing?
We set out to answer these requests; and the plan we had was to make use of cryptography.
a) assertion of origin
So how are we attempting to answer a)? We’re using a trick called cryptographic timestamping. This allows one to add a digital “seal” to a file.
This seal consists of an essentially-unique fingerprint (technically, a “hash”) of the file, linked mathematically to a record of accurate time-of-day at the moment of transaction.
You, the sender, are issued with this seal; it proves that at a certain point in time, the file as you own it existed. If at some later time someone tries to refute your claim that you possessed the file at that time, you can again generate that fingerprint; if it “checks out” (through cryptographic/mathematical methods) with the seal, that proves that the file you possess existed at the time embedded within the stamp.
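The idea of a seal that ties a fingerprint to a time can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how CloudStor or VANguard implement it: the names `AUTHORITY_KEY`, `issue_seal` and `check_seal` are hypothetical, and a keyed hash (HMAC) stands in for the signature that a real timestamping authority would produce with its own private key.

```python
import hashlib
import hmac
import json

# Hypothetical stand-in for the timestamping authority's signing key.
# A real authority signs with a private key that the sender never sees.
AUTHORITY_KEY = b"demo-authority-key"


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 fingerprint (hash) of the file contents."""
    return hashlib.sha256(data).hexdigest()


def issue_seal(data: bytes, timestamp: str) -> dict:
    """Bind the file's fingerprint to a time and authenticate the pair."""
    record = {"sha256": fingerprint(data), "time": timestamp}
    payload = json.dumps(record, sort_keys=True).encode()
    record["tag"] = hmac.new(AUTHORITY_KEY, payload, hashlib.sha256).hexdigest()
    return record


def check_seal(data: bytes, seal: dict) -> bool:
    """Regenerate the fingerprint and see whether it checks out with the seal."""
    record = {"sha256": fingerprint(data), "time": seal["time"]}
    payload = json.dumps(record, sort_keys=True).encode()
    expected = hmac.new(AUTHORITY_KEY, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, seal["tag"])


lab_book = b"this is the lab book of John Doe\nobservation 1: ..."
seal = issue_seal(lab_book, "2014-05-01T09:30:00Z")

print(check_seal(lab_book, seal))       # True: file and seal are untouched

backdated = dict(seal, time="2013-01-01T00:00:00Z")
print(check_seal(lab_book, backdated))  # False: the time was altered after issue
```

Because the fingerprint and the time are authenticated together, nobody can later swap in an earlier date without invalidating the seal.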
Some procedural hygiene is needed for this scheme to work:
First of all, you need to ensure you retain the file, exactly as it was when you stamped it, for eternity if necessary. If you change a single bit, the fingerprint changes too, and won’t check out against the stamp.
Second, if you want to take it further and prove not just existence of the file, but also that it was yours, the file itself needs to say so. This can be as simple as a declaration at the beginning of the file (“this is the lab book of John Doe”); the “author” field in many office documents would do a fine job, too. In any case, something needs to be asserted.
Third, it’s important to realise this proves the existence of the file at the time of sending only; it does not prove that your addressee ever was in receipt of the file, nor does it rule out that someone else possessed it before you did.
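The first point of hygiene is easy to try out for yourself. The short sketch below assumes SHA-256 as the fingerprint function (the hash actually used by the timestamping service is not stated here) and flips a single bit in a small piece of sample text.

```python
import hashlib

original = b"this is the lab book of John Doe"

# Flip the lowest bit of the first byte; everything else is identical.
altered = bytes([original[0] ^ 0x01]) + original[1:]

h_original = hashlib.sha256(original).hexdigest()
h_altered = hashlib.sha256(altered).hexdigest()

# Count how many of the 64 hex digits differ between the two fingerprints.
differing = sum(a != b for a, b in zip(h_original, h_altered))

print(h_original)
print(h_altered)
print(f"{differing} of {len(h_original)} hex digits differ")
```

The two fingerprints share no useful resemblance, which is why the stamped file has to be kept byte-for-byte as it was.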
The federal government and the National Measurement Institute play a role
Thanks are due, by the way – you may have wondered where we get this accurate, trusted time from. Isn’t that the job of the National Measurement Institute? Quite right it is!
When we started sourcing for our project, it turned out that a cryptographic timestamping service based on the NMI’s clocks already existed; this is provided by VANguard, the Australian Government’s program to deliver secure digital transaction services to the government sector. The people at VANguard were gracious enough to allow AARNet access to their timestamping platform, and this is how we deliver timestamping in CloudStor: by linking to the VANguard timestamping service.
Think this timestamping addition to CloudStor makes sense in your science workflow?
b) restriction of audience
This is a more commonplace use of cryptography, and one you’re probably already familiar with: we’re encrypting the file. Only those people who possess the password will be able to decrypt it; thus, much tighter release control is handed to the sender.
Now, given that CloudStor is a store-and-forward system, there are essentially two ways that we could have dealt with encryption. By far the easiest solution technically would be to let you upload the file in plaintext, then encrypt it while it’s on the server, and then let the recipient download the encrypted file.
While technically much, much easier to implement, this has one serious drawback: the server would know the password, so you would no longer be the only one able to decrypt the file (not to mention anyone listening in on your network connection, who would see the plaintext upload).
Recent international developments (Snowden et al.) have taught us that this is a decidedly suboptimal way of going about things. The alternative is to encrypt the file while it’s still on your machine (“client-side”), and only upload ciphertext blocks.
This way, the server would only ever see encrypted versions of your files; it wouldn’t have the password.
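To make the client-side approach concrete, here is a minimal sketch in Python using only the standard library. It is not the CloudStor code (which runs in the browser), and every name in it (`derive_keys`, `encrypt_on_client`, `decrypt_on_client`) is hypothetical. The cipher is a toy: an HMAC-SHA256 keystream XORed over the data in blocks, with a separate HMAC tag for integrity, standing in for a vetted cipher such as AES-GCM that a real product should use instead.

```python
import hashlib
import hmac
import os
from typing import Tuple

BLOCK_SIZE = 32  # bytes of keystream produced per HMAC-SHA256 call


def derive_keys(password: str, salt: bytes) -> Tuple[bytes, bytes]:
    """Stretch the password into an encryption key and a separate MAC key."""
    material = hashlib.pbkdf2_hmac(
        "sha256", password.encode(), salt, 200_000, dklen=64
    )
    return material[:32], material[32:]


def keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """XOR data block by block with an HMAC-based keystream (toy cipher)."""
    out = bytearray()
    for offset in range(0, len(data), BLOCK_SIZE):
        counter = (offset // BLOCK_SIZE).to_bytes(8, "big")
        pad = hmac.new(key, nonce + counter, hashlib.sha256).digest()
        chunk = data[offset:offset + BLOCK_SIZE]
        out.extend(b ^ p for b, p in zip(chunk, pad))
    return bytes(out)


def encrypt_on_client(password: str, plaintext: bytes) -> bytes:
    """Runs on the sender's machine; returns the blob that gets uploaded."""
    salt, nonce = os.urandom(16), os.urandom(16)
    enc_key, mac_key = derive_keys(password, salt)
    body = salt + nonce + keystream_xor(enc_key, nonce, plaintext)
    tag = hmac.new(mac_key, body, hashlib.sha256).digest()
    return body + tag


def decrypt_on_client(password: str, blob: bytes) -> bytes:
    """Runs on the recipient's machine after downloading the blob."""
    body, tag = blob[:-32], blob[-32:]
    salt, nonce, ciphertext = body[:16], body[16:32], body[32:]
    enc_key, mac_key = derive_keys(password, salt)
    expected = hmac.new(mac_key, body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("wrong password or corrupted upload")
    return keystream_xor(enc_key, nonce, ciphertext)


secret = b"commercial-in-confidence dataset"
uploaded = encrypt_on_client("correct horse battery staple", secret)
# 'uploaded' is all the server ever stores; the password never leaves the client.
print(decrypt_on_client("correct horse battery staple", uploaded) == secret)
```

The point of the design is in what is absent: no function here takes the password on the server side, so the sender has to pass it to the recipient through some other channel.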
Long story short, it’s been nontrivial to get it all to work, but we’re very, very close.
The code we have now works on the latest releases of Opera, Firefox and Chrome, and even on Internet Explorer 10 and up, including the IE Metro version. We believe it’s in good enough shape that we can harden and productionise it. Expect a public beta by Q3 2014.
If you’re absolutely dead keen and availability of an alpha-quality service would be of enough use to you that you’re willing to live with the attendant contingencies, let us know and we might be able to give you early access on a case-by-case basis.