Should we change our strategy for making PouchDB talk to the Thali Device Hub?

Coordinator
May 2, 2014 at 4:15 PM

Where are we?

So right now our strategy for having PouchDB talk to the Thali Device Hub (TDH) is very generic. We write an adapter for PouchDB that bridges to Java code that then speaks mutual SSL auth and in all ways walks and talks like a normal Thali client.

This is nice because it makes PouchDB look just like a Java Thali client or a .net Thali client.

And better yet, it really does work.

The problem(s)

But there is a price, and I'm deeply concerned that price is likely to spiral out of control.

The price is that every request and every response has to be turned into a string in order to be dragged over the Java bridge.

But there is an even bigger price that has me really worried. Eventually we will want to handle attachments. More to the point we'll want attachments that are pictures and movies that we will want to display in the browser.

The idea of dragging a movie over a string interface isn't my idea of a good time. And don't even ask me how we get it to the movie handler. We might be able to do something with the binary data type on modern enough browsers but now we are making things really painful.

I strongly suspect it's a requirement that someone can dump in, say, an img tag and have that properly work.

With our current setup the img tag wouldn't work because the WebView doesn't know what an httpkey URL is.

Yuck.

So then we start walking down the weird path. How do we make img tags work? Or video tags?

Stick with the strings

I think for the PPNet demo we will need image support. So we have this problem now. But images aren't videos and in the short term we could potentially just live with taking the attachments, turning them into base64 strings, moving them over the bridge and then making them appear as blobs from the PouchDB API. Note, btw, that our XMLHttpRequest adapter doesn't currently support blob types. But that shouldn't be brain surgery to fix.
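As a rough idea of that blob work, here's a sketch of how the adapter might turn a base64 string that came over the bridge into a Blob; the function and variable names are made up, not our actual adapter code.

```typescript
// Hypothetical sketch: the adapter receives the attachment from the Java
// bridge as a base64 string and hands it back to callers as a Blob.
function base64ToBlob(base64: string, contentType: string): Blob {
  const binary = atob(base64);                  // base64 -> binary string
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return new Blob([bytes], { type: contentType });
}

// e.g. showing a (small!) picture attachment in the page
function showPicture(base64FromBridge: string): void {
  const img = document.createElement("img");
  img.src = URL.createObjectURL(base64ToBlob(base64FromBridge, "image/jpeg"));
  document.body.appendChild(img);
}
```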

But we have to decide how much effort we want to put into making this work. For example, if there are big pictures then their full contents will have to be serialized into memory before being moved over. In theory we could fix this by introducing a streaming interface but man what a complete waste of time!

So if we are going to stick with strings for now we better stick to small pictures!

Setting up a second listener for getting attachments

Right now the TDH listens for mutual SSL auth connections typically on port 9898.

Imagine we set up a second listener on localhost on say port 8989 (or whatever). Because it's on localhost it can't accept requests from the Internet, only from local apps with access to the localhost interface.

But not all local apps are authorized to get everything in the TDH and the second localhost listener can't tell who sent an incoming request.

But there is a pretty well known pattern for solving this: a bearer token. We make the client get a bearer token and then use that bearer token in its requests. We need the token to be in the URL because the img and video tags aren't going to let us set authorization headers. Normally putting bearer tokens in a URL is a really bad idea. But in our case I believe it's OK because we are talking about standalone applications that don't share caches.

So the idea is that the app would get a list of documents and see it has an attachment. It would take the attachment ID and create a URL of the form:
http://127.0.0.1:8989/[bearer token string]/[database name]/[document id]/[attachment name]
This would go to our localhost Couchbase Lite listener, which would validate the bearer token and then let the request through.
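Here's a tiny sketch of what constructing that URL might look like on the client. The port, database name, document id and attachment name are just illustrative.

```typescript
// Illustrative only: port, names and token handling are assumptions.
function attachmentUrl(bearerToken: string, dbName: string,
                       docId: string, attachmentName: string): string {
  return `http://127.0.0.1:8989/${encodeURIComponent(bearerToken)}` +
         `/${encodeURIComponent(dbName)}/${encodeURIComponent(docId)}` +
         `/${encodeURIComponent(attachmentName)}`;
}

// The DOM does the rest: the WebView issues a plain HTTP GET that the
// localhost listener can validate.
declare const bearerToken: string; // obtained via the Java bridge
const img = document.createElement("img");
img.src = attachmentUrl(bearerToken, "thaliPictures", "doc-123", "photo.jpg");
document.body.appendChild(img);
```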

So the workflow would be:
  1. App gets its bearer token via some Java bridge voodoo
  2. App talks via PouchDB using our adapter and string bridge nonsense
  3. App gets URLs for attachments with images and video it wants to display
  4. App mangles attachment URLs into form described above
  5. App uses DOM to submit mangled URL
  6. WebView (in Java or Android) does an HTTP GET to the mangled URL
  7. The localhost listener receives the URL, validates the bearer token and completes the request
To make things even simpler we can almost certainly implement this proxy server by taking a Couchbase Listener and screwing around a little bit with its URL processing logic (basically enough to validate the bearer token and then pull it out). So this shouldn't be too hard to implement. The bearer token itself can be something trivial like an expiration date plus a cryptographically secure random number. Not brain surgery really.
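For illustration, the token format could be as trivial as this sketch. It's written in TypeScript here just to show the idea; the real code would live on the Java side.

```typescript
// Illustrative sketch of the token format described above: an expiration date
// plus a cryptographically secure random number.
function makeBearerToken(lifetimeMs: number): string {
  const expires = Date.now() + lifetimeMs;                   // expiration timestamp
  const random = crypto.getRandomValues(new Uint8Array(32)); // secure randomness
  const randomHex = Array.from(random, b => b.toString(16).padStart(2, "0")).join("");
  return `${expires}.${randomHex}`;
}

// Validation is just: check the expiry hasn't passed and that the random part
// matches a token the listener actually handed out.
```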

Double down on the localhost listener

But if we are going down this path then it raises the question - why not go further? Why not just get rid of the PouchDB adapter altogether? What if we just use the mangled URL syntax for PouchDB? In that case PouchDB just sees and plays with a perfectly normal HTTP URL (the database name is moved one path segment farther to the right but I believe PouchDB can handle that) and now we can use off the shelf PouchDB.

I think this works. I'm not 100% sure. I don't think CouchDB's API ever puts fully qualified URIs in the request or response. Instead URIs have to be constructed. So in theory as long as PouchDB knows the 'base' (e.g. including the bearer token) then things should 'just work' without the Couchbase core having any clue what we've done. I think.
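If that's right, using stock PouchDB could be as simple as this sketch. The port, database name and the assumption that the token can just sit as the first path segment are all illustrative.

```typescript
import PouchDB from "pouchdb";

declare const bearerToken: string; // obtained via the Java bridge

// The bearer token becomes the first path segment and the database name moves
// one segment to the right; PouchDB just treats the whole thing as its base URL.
const remote = new PouchDB(`http://127.0.0.1:8989/${bearerToken}/thaliDb`);
const local = new PouchDB("thaliDb");

// If CouchDB's API really never returns fully qualified URIs, replication and
// everything else should 'just work' against this base.
local.sync(remote, { live: true, retry: true });
```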

But wait, why didn't we do #2 from the start?

The reason was that I was afraid of having to write and maintain two different listeners with two different security architectures.

Honestly, I'm still afraid of that. As the only dev every feature I add takes us farther away from ever finishing this thing!

Intercepting URLs

TL;DR - This section describes really fragile tricks for intercepting new URL types in webviews that aren't going to work in the real world, so feel free to skip this section.

Another trick wouldn't require the proxy server at all. That trick is to have httpkey URLs stuck into the img and video tags. In theory this might, possibly, potentially, theoretically, work.

In the case of Android there is a method on the WebViewClient (which handles callbacks from the WebView) called shouldInterceptRequest that in theory should get called and give the app the chance to either let the request go or to put in its own response. In theory this should work for random URL types. So in theory we should be able to use it, detect httpkey, handle the request using a nice streaming library (built into Java) and make things work sensibly. Unless img or video or embed or whatever doesn't properly use shouldInterceptRequest. :(

In the case of Java things get uglier. The only mechanism I can see that might, theoretically, work here is the URL stream handler factory. You can register a factory and the system will check with it when resolving URLs. Now, this assumes that the WebEngine in JavaFX actually pays attention to this handler. Which it might not. But another fun fact is that per JVM instance only a single call to register a factory can be made and Couchbase is calling it!!!! Now in theory this might not matter. The reason is that the TDH shouldn't be running in the same process (and hence on the same JVM) as the client application. But that means (wait for it....) that we can't bundle our TDH management UX in HTML with the TDH itself because we would run into this issue.

There are two possible workarounds.
  1. We could wrap Couchbase's factory with our own. Heck, we have the code, we can do what we want.
  2. There is another approach that seems exactly like the kind of fragile scary thing one debugs forever.
So with one of the two choices above we could still use HTML for the TDH's own UX.

But to complicate things further we want to eventually use Chromium and I'm not even sure the intercept APIs there work in the same way, and if they do we would have to use JNI to expose them in Java since I don't think they are in the existing CEF. But I could be completely wrong there since I haven't used the Java CEFs yet.

So now what?

So I think we can write off intercepting URLs. It behaves differently in each environment (e.g. Android, JavaFX and Chromium) and it's not even clear how robustly it would work.

So this leaves us with either trying to move large binary objects over strings or setting up the localhost listener.

Now to complicate things further there might be real value in keeping around the PouchDB adapter. It means that an HTML client can talk to ANY TDH, not just its local one. That's kinda cool. So it might be worth at least doing the work to support the blob type for getAttachment in PouchDB, even if we do it in a horribly inefficient way.

Alternatively we can stop putting good money after bad and just switch to the second listener approach whole hog.

My guess is that we'll do the blob work because I'm too scared to mess with our demos for HTML 5 and London. But if we do get another dev resource then we probably should prioritize the second listener so we have a solid story for video and images.
Coordinator
May 13, 2014 at 5:28 PM
Since I wrote this things got more complex, something I touched on at length here, but the short version is - JavaFX 8.0's WebView looks like a mess so we abandoned it and I'm not super comfortable with the Java Chromium Embedded Framework. So this leaves us without a story on the PC! In that previous link I talked about a bunch of possibilities.

I'm going to add another one and I'm going to add it here because I'd like to keep the other thread for discussing milestone issues.

What I'm thinking is what if we leave the Listener alone? It only accepts mutual SSL auth connections and that's it. No localhost (again, see the link).

But what if we set up a stand alone Java HTTP Proxy that each Thali browser based application would run?

Scenario Set Up

A Thali developer writes an HTML app that they want to run on Android and desktop that can talk to Thali. Thali provides a Cordova-like wrapper for it (please Cordova, can you please support the desktop already!??!?!) that they can use to generate a desktop app. The app will have an installer (some day =), a Java program that actually launches the app and a folder with the HTML content that includes an index.html file.

Scenario Steps

  1. User clicks on application icon in their PC environment
  2. This triggers a Java program which then:
    2.1. starts up an HTTP proxy on localhost at any open port
    2.2. generates a .js file with a well-known name (the proxy will always do this on start) in the directory with the application's content; the file sets a variable with the proxy's port and both the client and proxy authentication secrets (I'll explain that below but the secrets themselves are just cryptographically secure random numbers)
    2.3. opens a file URL to the index.html file for the app
  3. The last action triggers the locally installed browser (whatever that is) which loads the index.html. The index.html will then:
    3.1. load the .js file with the well-known name and thus get access to the port the proxy is running on and both the client and proxy secrets
    3.2. configure all of its PouchDB calls to include a www-authorization header containing the client secret
    3.3. configure all of its PouchDB calls to validate that the response contains a server-authorization header containing the proxy secret
    3.4. use a magical method to talk to the proxy in order to get configuration information like the local TDH's full httpkey
    3.5. do its thing (a rough sketch of steps 2.2 and 3.1–3.3 follows this list)
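Purely as illustration of steps 2.2 and 3.1–3.3: the file name, variable name, and the use of PouchDB's custom fetch hook (available in newer PouchDB releases) are all assumptions about how the plumbing might be done; the header names are the ones described above.

```typescript
// Hypothetical contents of the generated file (the name thali-config.js and
// the variable name are assumptions):
//
//   window.thaliConfig = {
//     proxyPort: 52731,            // whatever open port the proxy grabbed
//     clientSecret: "2f9c...",     // goes out in the www-authorization header
//     proxySecret: "8ab1..."       // expected back in server-authorization
//   };

import PouchDB from "pouchdb";

declare const window: any; // index.html loads thali-config.js via a <script> tag

const { proxyPort, clientSecret, proxySecret } = window.thaliConfig;

// Route PouchDB through the localhost proxy, attach the client secret to every
// request, and verify the proxy secret on every response.
const db = new PouchDB(`http://127.0.0.1:${proxyPort}/thaliDb`, {
  fetch: async (url, opts) => {
    const headers = new Headers((opts && opts.headers) || undefined);
    headers.set("www-authorization", clientSecret);
    const response = await fetch(url, { ...opts, headers });
    if (response.headers.get("server-authorization") !== proxySecret) {
      throw new Error("Response did not carry the expected proxy secret");
    }
    return response;
  }
});
```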

Is that secure?

Sorta. Let's look at the attacks.

File attacks

If anyone can get to the js file with the secrets then they are in like Flynn. They can send requests to the proxy with the right client secret and so do anything the client can do.

But for this attack to work the attacker must have at least read access to the .js file. The permissions on that file will be set to the user's permissions. So this means that the attacker can run code as the user (or has hacked the file system). In either case, this is bad(TM). But I would argue that either attack probably means all security on the machine has been compromised.

Remember, to use the secrets one has to get a message to the proxy over localhost. So not only does one need read access to the file, one also needs the ability to execute code locally. Now we could imagine a situation where a PC has multiple users and the attacker gets code onto user B's account, somehow gets an escalation to read user A's .js file and then launches the rest of its attack from user B's account. But if it could hack user B why not user A too?

We could perhaps make ourselves feel better by having the Java program delete the .js file once it receives a request from the client. This is probably worth doing on a security hygiene basis but the actual security it contributes seems small since it depends on the attacker not being able to launch their attack within some time window and that usually doesn't work out in real life.

For Android we will almost certainly transfer the data not via the .js file but via the WebView Javascript bridge. Performance isn't an issue there.

But the bottom line is this - the PC security model assumes that any program running as the user can do anything the user can do. There is no real app sandbox model. So using the file for security is as secure as any other data (including the user's keys) on the machine.

Man in the middle (MITM) attacks

Let's state up front that the approach used here can't completely get rid of MITM. There are some successful attacks that are possible, at least in theory. Let's walk through it.

First, the proxy will only listen on localhost. This means that the attacker has to be on the machine. But depending on the attack that isn't such a big deal. For example, a web page in a local browser can open up a page to a localhost address (although it can't see the results).

This is why we have the client authentication secret. The request sent from the web page won't contain the secret and so will be rejected by the proxy.

So to really launch a MITM attack the attacker has to get executable code on the client that can do the following:
  1. Launch a port scan on localhost to find which port the proxy is running on
  2. Have an attack where a carefully crafted packet or packets will cause the proxy to fail in such a way that it releases its hold on the port
  3. Immediately take over the proxy's port
At that point the MITM attacker would be able to see a certain number of client requests but couldn't respond to them since the attacker doesn't know the proxy's secret and so its responses will be rejected by the client.

So a MITM attack could leak data but that's about it.

There are a number of assumptions in this analysis which need to be made explicit:
  1. That nobody on the machine who doesn't have kernel access can intercept in any way any content sent over localhost
  2. That there is no way to steal a port once it has been claimed by a running program
If either of these assumptions is false on any platform we want to support (e.g. Windows, Linux and OS/X) then the whole model falls apart.

So yes, in theory, if someone can get executable code on the machine and can crash the proxy then data can leak.

Are we stuck with this mess forever?

I honestly don't know. Clearly the best thing would be to use mutual SSL auth and call it a day. Someday I hope that the Web Crypto API will get mature enough that it will support a way purely via Javascript where someone can perform mutual SSL auth with provided keys. That would completely kill all the MITM attacks. But it's just not there yet.
Developer
May 13, 2014 at 11:53 PM
"Someday I hope that the Web Crypto API will get mature enough that it will support a way purely via Javascript where someone can perform mutual SSL auth with provided keys."

This is why you and Jeremie need to talk. They didn't wait for Web Crypto, they've got stuff working, we need a demo, I'll inquire.
Coordinator
May 14, 2014 at 12:23 AM
Alas it is not that simple. What they have done, which is cool, is invent an entire new network stack. It doesn't use TCP, which means it can't use SSL (though they have their own variant), it can't use TCP congestion control or packet re-assembly, it can't use TOR and it won't work with any existing server. They have a replacement for the key based naming provided by TOR hidden services but not the security nor the firewall penetrating aspects of TOR. UDP is blocked by some firewalls so you can't assume it will go through, and even if it does, it can't necessarily penetrate NATs. And for sync and such we would either have to put a completely new head on Couchbase Lite or we would have to create a TCP to UDP translator (oh joy, another bridge).

None of which means what they did isn't necessarily right. Only that it requires a lot of compromises and it's not clear to me that the benefits one gets for those compromises are worth the cost. That's what we need to talk to Jeremie about.
May 14, 2014 at 2:52 AM
So yes, we're building a full network stack :) It's not just a variant of SSL, it's a complete rebuild of the core principles of packetized communication with everything encrypted/verified, including all of the aspects of TCP. And it's not that scary; congestion control, sequencing, etc. are very well understood now, and the benefits of building them such that all metadata is also private from all networks (not just to a gateway/bridge) are by far worth it.

Although some of our implementations are still in development, I'm confident that we've got the best possible firewall and NAT punching architecture, and the JS/C/ObjC libs are already demonstrating that. While UDP is the first network path we built out and the most common/efficient, it's an overlay network, so it'll also use HTTP (even over hostile proxies), WebRTC, Bluetooth (LE), and whatever else is available on any platform (like multipeer on iOS). In fact, any/all network paths between two nodes are utilized, so that transitioning between cell and wifi is seamless to an app (and no security state between them is lost).

We're also working hard to provide the modern networking protocol primitives atop telehash like HTTP, making it easier to adopt and use existing design patterns (but obviously the "server" can be and function anywhere equally). You're right though, the most ideal way to do this is inside Lite, or with a local proxy (which can easily speak/map HTTP), but there's still a lot of work to be done and it'd be hard for even me to feel good about combining two not-battle-hardened projects (telehash and Lite) :/

Also on the privacy comparison to TOR, that's something that is high on the list to document/explain better, but the tl;dr is we're designing to provide significantly better privacy by default on all fronts except the current public network endpoint of a peer you're connected to, and that level of privacy will be a choice an app can make at the sacrifice of bandwidth/latency. We're trying to get the core performance solid before adding that overhead, hopefully that all makes sense!
Coordinator
May 14, 2014 at 5:39 PM
I read the core telehash spec and what's very clear is that we are all trying to get to the same place.

We all want to empower users to communicate directly without the permission or interference of 3rd parties. In many areas we have taken the same approach. We recognized that 'true' P2P messaging requires a naming system that doesn't have a central authority and so one way or another names end up being keys. We recognize that privacy means crypto and so we use those keys (directly or indirectly via cert chains) to encrypt communication.

Furthermore I can see how others have come to very similar conclusions to Telehash about how the world should work. For example, Telehash's way of organizing connections looks pretty close to what HTTP 2.0 is working on. Even Telehash's ability to use UDP is reflected in the wider community through things like DTLS and QUIC (and since HTTP 2.0 is based on SPDY, I would expect to see QUIC end up in there as well). Even the DHT based naming in Telehash ends up looking very much like Tor's hidden services.

For me the core issue then is this - we are kindred spirits trying to reach the same goal. Therefore I believe all of us have a moral obligation to figure out how to work together and leverage each other's abilities.

So once we get past the keynote next week let's all get together on a call (higher bandwidth - lower latency, a better way to start a discussion) and see what we can figure out.
Coordinator
May 20, 2014 at 7:07 PM
There is actually another way to implement the proxy that wouldn't have any security issues at all. It's more complex though.

The issue, just to remind the reader, is that using the setup above it's possible for an attacker (who has to be running on the same machine but under a different user account) to knock the proxy offline and see incoming requests from the user. No responses will be accepted since the attacker doesn't have the proxy secret. But this still lets the user's requests leak.

The way to get rid of this threat would be to perform a handshake whenever a localhost connection is established. The design is trivial and I've actually implemented it before. It goes like:
  1. The proxy starts running and generates a single secret we will call SECRET and puts it into the .js file for the browser code to pick up along with the proxy's port.
  2. The browser code opens a Raw Socket connection and sends down a request containing a cryptographically secure random number we will call ClientChallenge.
  3. The proxy responds with HMAC(SECRET, ClientChallenge) and sends down its own cryptographically secure random number we will call ProxyChallenge.
  4. The client then responds with HMAC(SECRET, ProxyChallenge)
We now have a 'secure' connection and future requests can be sent without any additional security. The key here is that both the client and the proxy have to make sure that they only mark connections as secure once the exchange has happened. So, for example, if a connection is lost then the handshake has to be done again on the new connection.
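Here's a sketch of the client's side of that exchange using the Web Crypto API. The transport is abstracted behind a `send` callback (Raw Sockets, WebSockets, whatever), and the message shapes and field names are placeholders, not a real wire format.

```typescript
const encoder = new TextEncoder();

const toHex = (bytes: Uint8Array): string =>
  Array.from(bytes, b => b.toString(16).padStart(2, "0")).join("");

// HMAC-SHA256(secret, challenge), hex encoded.
async function hmac(secret: string, challenge: string): Promise<string> {
  const key = await crypto.subtle.importKey(
    "raw", encoder.encode(secret), { name: "HMAC", hash: "SHA-256" }, false, ["sign"]);
  const sig = await crypto.subtle.sign("HMAC", key, encoder.encode(challenge));
  return toHex(new Uint8Array(sig));
}

// Client side: prove the proxy knows SECRET, then prove we know it too.
async function clientHandshake(
  secret: string,                          // SECRET, read from the generated .js file
  send: (msg: string) => Promise<string>   // one request/response over the connection
): Promise<void> {
  const clientChallenge = toHex(crypto.getRandomValues(new Uint8Array(32)));
  const reply = JSON.parse(await send(JSON.stringify({ clientChallenge })));

  // Step 3: proxy must answer with HMAC(SECRET, ClientChallenge) plus its own challenge.
  if (reply.clientChallengeHmac !== await hmac(secret, clientChallenge)) {
    throw new Error("Proxy could not prove it knows SECRET");
  }

  // Step 4: answer the proxy's challenge; only now is the connection 'secure'.
  await send(JSON.stringify({ proxyChallengeHmac: await hmac(secret, reply.proxyChallenge) }));
}
```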

Implementing all of this in the browser isn't exactly brain surgery. Our main interface is XMLHttpRequest (XHR), which we already have a polyfill for that will talk to whatever we want. So we could just put something like socket.io under it. This would be super easy because our existing code already serializes requests and responses as JSON.
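For a sense of what "serializes requests and responses as JSON" could look like on the wire, here's a purely hypothetical shape; the real adapter's format may well differ.

```typescript
// Hypothetical shape of a serialized request/response pair as it might travel
// over a socket.io-style channel between the XHR polyfill and the proxy.
interface SerializedRequest {
  id: number;               // correlates the response to the request
  method: "GET" | "PUT" | "POST" | "DELETE";
  path: string;             // e.g. "/thaliDb/doc-123"
  headers: Record<string, string>;
  body?: string;            // JSON body; attachments are the painful part
}

interface SerializedResponse {
  id: number;
  status: number;
  headers: Record<string, string>;
  body?: string;
}
```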

What sucks about this approach is handling attachments, which are often large binary blobs.

Another approach then would be to use something like BinaryJS.

The challenge is that both socket.io and BinaryJS have their own protocol formats and the proxy will be written in Java. I suppose we could just use Node.js but man do I not want to introduce yet another huge dependency into Thali. So we would end up having to implement something like https://github.com/binaryjs/js-binarypack/blob/master/lib/binarypack.js (which is the code BinaryJS uses to frame its messages) in Java. Not horrendous but sigh... more work.

So my guess is that we'll probably start off with the pseudo-secure approach and if it has legs then we can switch to this.

I also wonder how we hook up media elements for photos and videos to this infrastructure. I strongly suspect we'll have to implement some kind of 'one time use' URL to let clients request content over http://localhost so they can pass that URL to media and img elements in order to support streaming. This means creating a way for clients to ask for such a URL via the proxy.
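Something like this sketch, where the message shape, the URL the proxy hands back, and the way the client reaches the proxy are all assumptions.

```typescript
// Sketch of the 'one time use' URL idea. requestViaProxy stands in for
// whatever secure channel the client already has to the proxy (see the
// handshake above).
async function showVideoAttachment(
  requestViaProxy: (msg: object) => Promise<{ url: string }>,
  docId: string,
  attachmentName: string
): Promise<void> {
  // Ask the proxy to mint a throwaway URL for this attachment.
  const { url } = await requestViaProxy({ type: "oneTimeUrl", docId, attachmentName });
  // e.g. http://127.0.0.1:52731/once/<unguessable nonce>

  // Hand the URL to a media element; the proxy streams the attachment over
  // localhost and then invalidates the URL after first use.
  const video = document.createElement("video");
  video.src = url;
  video.controls = true;
  document.body.appendChild(video);
}
```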

Oh joy.