Sharding PPNet

Jun 9, 2014 at 4:11 PM
Edited Jun 9, 2014 at 8:41 PM
I put up a sketch here: http://thali.cloudapp.net/mediawiki/images/9/99/PPNet_sharding_proposal.pptx

Some of the questions raised as a result of this exercise:

For a shared space ("room") X, is Membership-in-X a separately sync-able db?

Can a room's gatekeeper control membership unilaterally, or is there an invite/accept protocol (or are both ways possible)?

How do members leave rooms?

Are joining and leaving protocols codified as platform behaviors that apps can leverage?
Jun 10, 2014 at 3:50 PM
So issue 60 should be fixed so hopefully you can upload whatever you want.

But I'm a bit confused by the slide deck. I thought we are just focusing on PPNet? As I understand it PPNet does a few things. Primarily it lets people post updates about themselves that can include text, pictures and location.

We can model these updates as separate databases, one per user. The database (or group of databases; we can talk to PPNet about their preference depending on things like index and query patterns) is essentially standalone. Each user is separate from every other user. A user can only write to their own database and nobody else's.

However there are two other features that make things just a tiny bit more complicated. These features are the ability to 'like' another person's status update and the ability to hold conversations about other people's status updates.

Ideally I'd like to keep the rule that users can only update their own databases. So this argues that if user A wants to comment on a post from user B then the comment would go into user A's database, referring to user B.

There are problems we have to solve for this to work:

How to index to quickly find all comments/likes for a post

Imagine that the user is looking at the post from user B and imagine they have a synch'd database from user A. How do they see user A's comment on user B's post? I suppose we could do something wacky like do a query across all the user databases for each and every post but man that sounds kinda expensive.

A simpler solution is probably to create a database that contains an index of all comments on a particular post (see below for how we can get the pointers right). The database can actually just contain pointers to posts or the posts themselves. We need to talk to PPNet about this.
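A rough sketch of the "index of comments per post" idea, under heavy assumptions: per-user databases are modeled as plain Python lists of dicts rather than real PouchDB/CouchDB databases, and the field names (`type`, `ref`, `owner`, `post_id`) are hypothetical, not PPNet's actual schema. The index holds pointers only, which is one of the two options mentioned above.

```python
from collections import defaultdict

def build_comment_index(user_dbs):
    """Map each (post_owner, post_id) to pointers at the comments/likes on it.

    user_dbs: dict of owner -> list of docs (each doc is a dict).
    Comment and like docs carry a hypothetical "ref" field naming the post.
    """
    index = defaultdict(list)
    for owner, docs in user_dbs.items():
        for doc in docs:
            if doc["type"] in ("comment", "like"):
                ref = doc["ref"]
                key = (ref["owner"], ref["post_id"])
                # Store a pointer (owner, doc id), not a copy of the doc.
                index[key].append((owner, doc["id"]))
    return dict(index)

# User A comments on user B's post; the comment lives in A's own database.
user_dbs = {
    "A": [{"id": "a1", "type": "comment", "text": "Nice!",
           "ref": {"owner": "B", "post_id": "b1"}}],
    "B": [{"id": "b1", "type": "post", "text": "Hello"}],
}
index = build_comment_index(user_dbs)
```

Looking up `index[("B", "b1")]` then avoids the query across every user database for every post. Whether the index should hold pointers or full copies is still the open question for the PPNet folks.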

How to see comments/likes from people you don't know

In this model user C can only see user A's comment on user B's post if user C has rights to synch user A's database. Now, maybe that's a feature. But it should be possible to do 'globally viewable' comments (think of blog posts). I think we can handle this feature later. But when we do it probably happens by having user B put user A's comment directly into their feed and then pointing back to user A's database entry for validation by those who actually know user A.

How to point at stuff

How do we point at user A? This is actually a hard problem because user A has many keys. Let's say they have a phone and a tablet. Each device will have a user key and a device key. So user A is actually at least 4 keys. We can't have a single global key because this would require either moving private keys around (no) or having a single point of failure (another no no). The general plan for this in Thali is that user A gets introduced to B and will get the identity key on the device user B used in the introduction. This will then lead to a well known 'identity database' on user B's device that will include a record specifying the other keys that are to be treated as user B.

To keep things simple for now I suggest we just put in httpkey URLs and that local users will then have to do a lookup on the httpkey URL to see what user it maps to. This is actually really important because it's how pet names work in practice. We need to discuss pet names with the PPNet folks because they're absolutely central to any truly decentralized system. I know there are people who claim to have solved this problem but, for various reasons that I need to write an article about, I'm not convinced that is true.
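To make the lookup concrete, here is a minimal sketch of resolving a key to a local pet name. Everything in it is an assumption for illustration: the identity database is a plain list of records, the `petname` and `keys` fields are made up, and the key strings are opaque placeholders rather than real httpkey URLs.

```python
def resolve_petname(identity_db, key_url):
    """Return the local pet name for a key, or None if the key is unknown."""
    for record in identity_db:
        if key_url in record["keys"]:
            return record["petname"]
    return None

# One person, two devices, a user key and a device key on each: four keys.
identity_db = [
    {
        "petname": "Bob",
        "keys": {
            "key:bob-phone-user",
            "key:bob-phone-device",
            "key:bob-tablet-user",
            "key:bob-tablet-device",
        },
    },
]

name = resolve_petname(identity_db, "key:bob-tablet-user")
stranger = resolve_petname(identity_db, "key:someone-else")
```

The pet name is purely local: another user could file the same four keys under a different name, which is why posts would carry the key and not the name.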
Jun 10, 2014 at 4:06 PM
"A user can only write to their own database and nobody else's."

Yes. In (what is currently) slide 5, for example, A and B are cross-syncing their posts databases. A speaks in A's, B speaks in B's. In the next slide, sync has occurred, so B gets A's post and A gets B's. Not shown but perhaps necessary for PPNet is the aggregation of A's and B's posts databases into a (purely local, i.e. PouchDB-only) master used for display.
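The local aggregation step might look roughly like the sketch below. It assumes each synced posts database is just a list of dicts with a hypothetical `timestamp` field, and that the display wants newest first; none of that is confirmed PPNet behavior.

```python
def merged_feed(posts_dbs):
    """Combine per-user posts databases into one display list, newest first.

    posts_dbs: dict of owner -> list of post docs. The result is a derived,
    local-only view; nothing is written back to any user's database.
    """
    feed = []
    for owner, docs in posts_dbs.items():
        for doc in docs:
            # Tag each post with the database it came from.
            feed.append({**doc, "owner": owner})
    feed.sort(key=lambda d: d["timestamp"], reverse=True)
    return feed

posts_dbs = {
    "A": [{"id": "a1", "text": "first", "timestamp": 1},
          {"id": "a2", "text": "third", "timestamp": 3}],
    "B": [{"id": "b1", "text": "second", "timestamp": 2}],
}
feed = merged_feed(posts_dbs)
```

Because the master is derived from the synced databases, it can be thrown away and rebuilt at any time, and the rule that each user writes only to their own database still holds.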

Do we agree that the number of Thali (not PouchDB) dbs involved, per node, is:

1 for the address book

1 per room for membership

1 per room per member for members
Jun 10, 2014 at 4:10 PM
I'm confused where the concept of a room comes in.

It seems to me that each user has:

1 DB for their address book

1 DB for their own posts

N DBs replicating the DBs of other users they know

Perhaps I'm simply arguing that there is just one global room?
Jun 10, 2014 at 4:18 PM
Edited Jun 10, 2014 at 4:19 PM
"the ability to 'like' another person's status update and the ability to hold conversations about other people's status updates."

PPNet is already nicely organized for that. A like is a separate doc bearing the ID of the user issuing the like, and the ID of the referenced post. Ditto for comments.
Jun 10, 2014 at 4:27 PM
And that all works primarily because they assume a single database instance. But in a Thali scenario there are many databases and there is no way to guarantee unique IDs. So that is why we end up with the 'two hop' to translate the URL (probably an httpkey) into the address book. It also brings up really fun security issues. I actually started to spend several hours yesterday trying to write this all out when I realized I was jumping into a very deep pit and couldn't do that right now.
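A small illustration of the ID problem, with assumed names throughout: two independent databases can both contain a doc with ID "1", so a like or comment has to name the owning database as well as the doc. The `key:A` / `key:B` strings stand in for whatever key URL ends up identifying a user.

```python
def find_post(user_dbs, ref):
    """Resolve a qualified reference (owner_key, post_id) to a post doc."""
    owner_key, post_id = ref
    for doc in user_dbs.get(owner_key, []):
        if doc["type"] == "post" and doc["id"] == post_id:
            return doc
    return None

# Both users independently created a post with the same local ID.
user_dbs = {
    "key:A": [{"id": "1", "type": "post", "text": "from A"}],
    "key:B": [{"id": "1", "type": "post", "text": "from B"}],
}

# A bare ID "1" would be ambiguous; the qualified pair is not.
like = {"type": "like", "ref": ("key:B", "1")}
target = find_post(user_dbs, like["ref"])
```

The second hop, from the owner key to a person in the address book, is a separate lookup and is where the security questions come in.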
Jun 10, 2014 at 4:35 PM
"just one global room?"

That's true now, with everybody talking to a public CouchDB instance. (But not necessarily, since there can be multiple of those.) Let's find out what the PPNet folks think.

From a Thali POV, though, I'm trying to figure out if the address book is a people/permissions db with references to a separate apps db, or if it combines people/apps/permissions. Are apps principals?
Jun 10, 2014 at 4:40 PM
We need to separate out Thali and PPNet running on Thali. They aren't the same thing.

Thali has an address book and yes, we want PPNet to leverage that address book. But let's face facts, our address book hasn't been written quite yet. So we may need to use PPNet's until ours is ready.

From PPNET's perspective (running on Thali) there is an app called PPNet. It provides a way to post your own status updates and see the status updates of people you are interested in (and who let you). Therefore each user inside of PPNet has a single context - the aggregated status updates/comments/likes of all the people they are tracking. The only little 'gotcha' is that it's possible for there to be likes or comments by people you don't know and don't follow. That will require some additional magic to let you figure out who they are. But I suspect we should add that in V2. :)
Jun 11, 2014 at 9:16 PM
Edited Jun 11, 2014 at 9:23 PM
I've refreshed the scenario with these changes:
  • Include PPNet membership within the directory.
  • Distinguish between general Thali issues and specific PPNet issues.
  • Propose that for cross-provisioning in PPNet, we Do The Simplest Thing Possible at first. So for example, pretend that Jon Udell and Yaron Goland are petnames for Thali identities. Analogous to PPNet's Simple Login, we add one another by those names. And forget multiple rooms and invitation/acceptance for now. If we add one another in a PPNet context, we see and can participate in all rooms that we've mutually created. There would likely be just one room to start, and we'd agree by out-of-band consensus on its name.
Jun 11, 2014 at 11:53 PM
What is a room?
Jun 12, 2014 at 1:57 PM
Room: A named shared space. In PPNet now, the name is the URL of a CouchDB server, hard-coded into a distribution of the app.
Jun 12, 2014 at 4:51 PM
I think the idea of a room introduces a concept we don't need. There is conceptually one 'social space' for everyone, everywhere. That's it. No rooms. No permissions based on rooms. There are your friends, you can see their feeds, they can see yours, that's it.
Jun 12, 2014 at 4:58 PM
Yep, that's my conclusion above as well: Do The Simplest Thing Possible. Since the PPNet folk won't necessarily want to commit to one global space, though, we might consider Room X to be a singleton for now but potentially one of many.
Jun 12, 2014 at 5:43 PM
At this point let's just talk to them.
Jun 13, 2014 at 3:38 PM
We thought about several scenarios and it boils down to this: if we want to keep the offline functionality of ppnet, we need to synchronize data that is not the user's own data to the local filesystem. So there is no way to keep only your own data locally in this scenario. If this is for some reason desirable, we need to switch to online only, which boils down to making regular queries to the CouchDB (be it over pouch or not).
Jun 13, 2014 at 3:53 PM
Dirkk0, I think there are a couple of assumptions buried in your comments that don't quite fit in Thali. I suggest it would be easiest since we are so early in the process to jump on a call and discuss this high bandwidth. Email isn't great for early parts of discussions.

But in case you aren't available - in the Thali model you absolutely synch other users' data (that you have permission to) to your local file store. This is just how PPNet works now with PouchDB. When I point PPNet at the CouchDB server I'm using I download to PouchDB everything about everyone I'm allowed to see.

So in Thali we just move the CouchDB server physically on the device instead of running it on the cloud (and then sync everyone's CouchDB servers with each other). But in terms of the data on the user's machine, the results are the same in both the centralized and distributed models.

Make sense?
Jun 13, 2014 at 5:11 PM
This absolutely makes sense.

I also agree that a call makes sense, and I'd like to introduce my colleagues to the discussion, too, so that we have all stakeholders at the table. As you stated, we want to have ppnet as simple as possible, but also as useful as possible. Additionally, I have the impression that Thali could be the perfect counterpart for the thing we lack most: security.

Let's take the appointment discussion to email.