@madpilot makes

Proposal: An open inter-conversation microblogging protocol

Spurred on by Gary’s discussion on the number of micro-blogging sites around, the “Is it Distributed?” question made me wonder if we are going about this wrong. Cameron Adams was right when he said there is only one social network, so why are we flicking between a large number of them? Why aren’t we running our own?

Beyond a number of small superficial differences they all do the same thing – you add friends, post what you’re doing (usually in an arbitrary 140 characters or less) and read what others are doing. There really is no reason why this can’t be truly distributed, i.e. I can run my own micro-blogging site, and all my friends can run their own micro-blogging sites – all that is needed is some glue (a communication protocol) to bring it all together. The great thing about this is that we already have systems to make this happen – get your buzz-word bingo cards out, people…

RESTful XML

The first part of this system is a RESTful API that allows friends to post information to your timeline and you to post to theirs. Every time you post to your microblog, it will iterate through your list of friends and forward the message on to them. The same thing happens if you delete a post – it notifies all your friends to remove the post from their local databases. To ensure that random people can’t spam our feeds, we can use OAuth to give “friends” permission to send us information.
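
As a rough sketch of that push step (Python, purely illustrative): the class name, the `/timeline/<id>` URL layout and the injected `send` function are all my own assumptions, not part of any agreed spec. A real version would make an OAuth-signed HTTP request where `send` is called.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical transport: (HTTP method, URL, body) -> None.
# Injected so the sketch runs without a network.
Transport = Callable[[str, str, dict], None]


@dataclass
class Microblog:
    owner: str
    friends: List[str]              # base URLs of friends' microblogs
    send: Transport
    posts: Dict[str, str] = field(default_factory=dict)
    _next_id: int = 0

    def publish(self, text: str) -> str:
        """Store the post locally, then forward it to every friend."""
        self._next_id += 1
        post_id = f"{self.owner}-{self._next_id}"
        self.posts[post_id] = text
        for friend in self.friends:
            self.send("POST", f"{friend}/timeline/{post_id}",
                      {"author": self.owner, "text": text})
        return post_id

    def delete(self, post_id: str) -> None:
        """Remove the post locally, then ask every friend to drop their copy."""
        self.posts.pop(post_id, None)
        for friend in self.friends:
            self.send("DELETE", f"{friend}/timeline/{post_id}",
                      {"author": self.owner})
```

Note that nothing here guarantees a friend actually honours the DELETE; it is only a request.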

Your own timeline

The reason that your microblog would need to be notified of other people’s posts is so you can cache these posts on your own microblog, which gives you a Twitter-style public timeline. The advantage of this is that there is basically no database load to display YOUR feed – the only information in your database is the posts that you want to read!
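
The receiving side could look something like the sketch below. It is only an illustration: `Timeline`, its method names and the plain `authorised` set (standing in for a proper OAuth check) are assumptions of mine.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Timeline:
    # Authors allowed to push to us (a stand-in for OAuth authorisation).
    authorised: Set[str] = field(default_factory=set)
    # post_id -> (published timestamp, author, text)
    cache: Dict[str, Tuple[float, str, str]] = field(default_factory=dict)

    def receive(self, author: str, post_id: str, text: str,
                published: float) -> bool:
        """Cache a pushed post; reject anything from an unknown author."""
        if author not in self.authorised:
            return False
        self.cache[post_id] = (published, author, text)
        return True

    def remove(self, author: str, post_id: str) -> bool:
        """Honour a delete, but only from the post's own author."""
        entry = self.cache.get(post_id)
        if entry is None or entry[1] != author:
            return False
        del self.cache[post_id]
        return True

    def latest(self, n: int = 20) -> List[Tuple[float, str, str]]:
        """Newest first; reads only the local cache, no remote calls."""
        return sorted(self.cache.values(), reverse=True)[:n]
```

Displaying the feed is then a single local read, which is the point of caching in the first place.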

Adding friends

So how can you add friends and allow others to follow you? This is actually pretty easy using OAuth – by adding your microblog to your friend’s microblog’s authorised list, they know that you need to be notified on an add or delete command. This gives us the side effect that we can manage not only who we follow but who follows us – if you want to stop someone from following you, you just de-authorise them. So what happens if a new friend adds your microblog to their timeline? A simple GET request could be made to receive all of the posts by the new friend, effectively syncing up the two databases – all future posts will obviously be pushed to the new friend (and vice versa), so there is no expensive polling.
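
Here is a minimal sketch of that handshake, under the assumption that the authorised list is just a set of follower URLs and that `get_posts` stands in for an authenticated `GET` of the friend's posts. None of these names come from a real spec.

```python
from typing import Dict, List, Set


class Publisher:
    """The friend being followed. All names here are hypothetical."""

    def __init__(self) -> None:
        self.posts: List[dict] = []          # oldest first
        self.authorised: Set[str] = set()    # follower URLs to notify

    def authorise(self, follower_url: str) -> None:
        self.authorised.add(follower_url)

    def deauthorise(self, follower_url: str) -> None:
        # Stops future pushes and blocks further reads by this follower.
        self.authorised.discard(follower_url)

    def get_posts(self, follower_url: str) -> List[dict]:
        # Stand-in for an authenticated GET of the full post history.
        if follower_url not in self.authorised:
            raise PermissionError(f"{follower_url} is not authorised")
        return list(self.posts)


def initial_sync(publisher: Publisher, follower_url: str,
                 cache: Dict[str, dict]) -> int:
    """One-off pull of existing posts; returns how many were new."""
    added = 0
    for post in publisher.get_posts(follower_url):
        if post["id"] not in cache:
            cache[post["id"]] = post
            added += 1
    return added
```

Because the sync skips posts it already holds, running it twice is harmless, and after it only pushes are needed.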

Other people’s timelines

If someone has a public timeline, this is a no-brainer. Each person’s microblog would just be available and others can just read it. But what about private timelines? Enter OpenID. If each of your friends provides an OpenID URL, they would be able to log in to your microblog to read your private feed – no password required, but it is still totally private.

Discovery services

Many Twitter users scour the public timeline waiting for people to post things that they are interested in. This is actually quite easy to implement on a distributed system – have a read-only super node that everyone posts to. Voila, instant public timeline. This also means that you can easily create “channels”. Instead of only having one public timeline, you can have many based on different topics.
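
To make the super node idea a bit more concrete, here is a small in-memory sketch. The `SuperNode` class, the default `"public"` channel name and the per-channel cap are all assumptions for illustration only.

```python
from collections import defaultdict, deque
from typing import List, Sequence, Tuple


class SuperNode:
    """Collects posts from everyone; readers can only read."""

    def __init__(self, max_per_channel: int = 1000) -> None:
        # Each channel keeps only its most recent posts.
        self.channels = defaultdict(lambda: deque(maxlen=max_per_channel))

    def submit(self, author: str, text: str,
               channels: Sequence[str] = ("public",)) -> None:
        """Called by each microblog when its owner posts."""
        for name in channels:
            self.channels[name].append((author, text))

    def read(self, channel: str = "public",
             n: int = 20) -> List[Tuple[str, str]]:
        """Newest first."""
        return list(self.channels[channel])[-n:][::-1]
```

A topic channel is then just another key, so one post can appear on the public timeline and on a topic timeline at the same time.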

Unlimited extensions

One of the value-adds for Pownce is the ability to share attachments and events. In reality, all it does is provide a link to a file on a remote service. If you wanted to add this function to your microblogging site, you can quite easily – as long as you post the link to others. This means you have complete control over what your microblog does, as long as it still talks the protocol.

Advantages

  1. The obvious one is you aren’t at the mercy of servers doing a Twitter (i.e. being up and down like a yo-yo). If your friend’s server goes down you miss out on their posts, but no one else’s.
  2. You have control over your data – you don’t have to worry about a service disappearing overnight and not being able to get at your data. It is all on your server.
  3. Distributed data – if your server dies and your hard drive explodes, your data can be rebuilt from the copies stored in your friends’ databases.
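
The third point could work roughly like this. It is a sketch only, assuming each cached post is a dict with `id`, `author` and `published` keys (my own invented shape, not a defined format).

```python
from typing import Dict, Iterable, List


def rebuild(owner: str, friend_caches: Iterable[List[dict]]) -> List[dict]:
    """Merge every copy of `owner`'s posts found in friends' caches."""
    recovered: Dict[str, dict] = {}
    for cache in friend_caches:
        for post in cache:
            if post["author"] == owner:
                # The same post held by several friends is kept once.
                recovered.setdefault(post["id"], post)
    return sorted(recovered.values(), key=lambda p: p["published"])
```

No single friend needs a complete copy; the union of their caches is what gets restored.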

Disadvantages

  1. If someone’s site is down they may miss some updates, so you would need a method for re-syncing all friends’ posts from a certain date – no biggie.
  2. It does make completely removing your account difficult, as you can’t really ensure your friends are going to respond to delete commands correctly.
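
The re-sync in the first point might be as simple as the sketch below. I am assuming a hypothetical "posts since" query and posts carrying a `published` timestamp; neither is defined anywhere yet, and the dates are made-up sample data.

```python
from datetime import datetime, timezone
from typing import Dict, List


def missed_posts(remote_posts: List[dict], last_seen: datetime) -> List[dict]:
    """What a hypothetical `GET /posts?since=<last_seen>` would return."""
    newer = [p for p in remote_posts if p["published"] > last_seen]
    return sorted(newer, key=lambda p: p["published"])


def resync(cache: Dict[str, dict], remote_posts: List[dict],
           last_seen: datetime) -> datetime:
    """Fill the local cache and return the new high-water mark."""
    for post in missed_posts(remote_posts, last_seen):
        cache[post["id"]] = post
        last_seen = post["published"]
    return last_seen


# Example: we were offline after 2 July and missed two posts.
utc = timezone.utc
remote = [
    {"id": "p1", "published": datetime(2008, 7, 1, tzinfo=utc)},
    {"id": "p2", "published": datetime(2008, 7, 3, tzinfo=utc)},
    {"id": "p3", "published": datetime(2008, 7, 5, tzinfo=utc)},
]
cache: Dict[str, dict] = {}
mark = resync(cache, remote, datetime(2008, 7, 2, tzinfo=utc))
```

Storing the returned mark per friend means the next outage only needs to fetch what came after it.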

So what do people who don’t have their own server to run this on do? This is the kicker – you can still have hosted versions of the system. This works for blogs (I host my own, but some of my friends use systems like Blogger.com) and for OpenID, which makes it much more accessible.

If there is some interest in this, I’m sure we can start drafting some specifications. I’d be interested in your thoughts.

8 comments

  1. [copy of email from JoeC to Myles]

    Thanks for the comment. I just read your blog post. Hope you don't mind an email response. I feel very constrained by the comment UI of most blog systems, a point which actually bears on this discussion.

    I was in the middle of writing a new post on a similar subject to yours when identi.ca came out the other day and distracted me. :) You've actually given me the push to go back and finish it. Its main point is that instead of joining a multitude of different communities that all have their own messaging, why can't we each speak from our own Personal Publishing Site? Why can't we decouple the grouping of people in communities from the message transport that they talk to each other with?



    I think we are on the same page with that notion. I continue to have concerns about scaling up an approach that depends on polling and http to distribute microblog posts. Just as an example, I have over 900 followers on Twitter, and I'm very typical. There are more popular users with tens of thousands of followers, and more to the point who follow tens of thousands. It is simply impractical to poll tens of thousands of sites to get short messages on a short time basis, say once a minute, which is the expected time. Polling just doesn't scale in this application. There has to be a notification mechanism for message-type traffic.

    I hear you when you talk about the difficulties of setting up Jabber servers, but the problem is not insoluble, and actually could lead to a whole sub-business in notification services. Your personal microblog could use multiple such services, including your own, and this is probably needed to solve the scale-up problem.



    One interesting question to ponder, if you have some cycles, is to imagine we have solved the personal publishing distribution problem. Now, how can we build thriving communities that let us individuals form cohesive groups of different kinds when we aren't forced anymore to communicate "on" that community's site, ala Facebook and Twitter?



    Let's keep up the discussion! I'll also copy-post this as a comment on your blog.



    JoeC

    Stonington, Connecticut, USA
  2. Hi Joe,



    Thanks for your comment.



    I think Jabber could be part of the solution - mainly for distributing posts to a large number of users (think @scobilizer), but I would like to think it wasn't required. Mind you, you could achieve the same thing with a simple queue system.



    With your question about building communities, I don't think this is actually a technical problem! Communities will naturally build when the tools are there. Of course the "super nodes" as I called them would encourage these communities to grow.



    I wonder if this might be worth a look: http://smob.sioc-project.org/
  3. Hi Myles,



    You might be interested to check out the Noserub protocol, which is based on OpenID and XFN. Their aim is to build a fully decentralised social network, but some of the ideas would also work for microblogging too.



    I like the idea of RESTful messaging.

    Twitter and friends tend to be used more for instant messaging, but you really need a push-based protocol to make this work efficiently. I wonder how many of Twitter's woes are due to their site being constantly hammered for status updates.



    You could easily generalise the idea so that each person has a number of "output" and "input" channels. Basically you want a distributed system for discovery and authentication that decides who can connect to which channel, and that handles the distribution of messages. It might be necessary to use a mixture of push- and pull-based protocols depending on the relationship between subscriber and producer. For example, I usually want timely updates from my friends but I have relatively few of these. For others I'm prepared to accept some latency, which could be used to batch notifications together, reducing the number of requests that need to be made.



    It's probably also useful to move away from a "publication-time" based view in the client, since this makes it hard to deal with latency and failure (e.g. you miss a delayed message because it appears in the past). I think the Inbox-type "reception-time" view is more useful. In any case, it's important that the transport is reliable and not lossy. There's nothing worse than a system you can't trust.
  4. I don't think REST is quite enough, and I think we need more than just OpenID.



    (This is almost a follow up to my comment on Tuna's blog).

    I am thinking (mostly just out loud) of a solution, and it is probably wise not to worry too much about technical considerations until we understand the what and why better.



    Some thoughts about what this spec would need to be:



    1) Something really redundant, so what I will call an "Individual User Comment" (IUC) (get it?) doesn't get stored in just one place. If a site or a blog disappears, that IUC can still be retrieved. Perhaps all over the place. Or perhaps on a few really big players, backhaul operators who want to offset the content against download interconnection. Perhaps marked in some way so that it might be cached in a proxy if read widely enough. Perhaps IUCs might be stored and gathered in an analogous way to Google. Perhaps even in XORed fragments like a RAID5 array. Perhaps even as a business like Amazon S3.



    2) Encryption or at least digital signing of each message. The message hash might be used as an identifier key. The author would sign it with a private key, and perhaps also quote the context, so the message will remain stable (meaning not subject to edits). Once published it can't be changed. Or perhaps there can be a hot period of 48 hours where it can be deleted (an unflame method). The digital signing doesn't need to be something that ties the message to a person, just to a handle. So there are issues about privacy, about spam, and for some nations personal security that would need to be considered here. I am not sure OpenID is the answer for that reason.



    3) A message protocol. A contributor writes a comment in their "comment editor" and submits this to their favorite IUC engine. The IUC engine generates the message hash, and perhaps automatically submits it to the blogger. The blogger reads the message and, if suitable, publishes the full unencrypted contents, but leaves the message hash in place so it can be verified, or aggregated or whatever as needed. Alternatively, if the 'cloud' is fast enough, the blogger publishes the hash and the user agent picks up the contents. The second idea solves the bigger problem of who owns and controls the content, while risking problems related to increased network traffic, and of course depends on external data sources being up to speed and always fully available. If this worked well, imagine how much better, say, Twitter would be.



    4) A content protocol. Perhaps even using HTML block text with an ID that is the hash or some shorthand for it. Something with very little overhead, so that just the hash will retrieve the message contents (where contents might be anything you could MIME), but further requests could also retrieve the original URL and perhaps other poster details if available.
  5. For those playing at home, it looks like http://openmicroblogging.org/ specifies most of what I have said already, and http://laconi.ca (which drives http://identi.ca) implements it. I've installed it, but I'm having some problems with remote subscriptions - I can subscribe and receive posts from identi.ca, but not the other way around. I suspect it's an issue with my server (updating to the latest version now). I think it will still be a fun exercise trying to implement this myself - especially since I'm giving a talk on this stuff for WDS and sample code is always a boon.



    Stewart: Thanks for that link. Regarding REST, there isn't any reason (with a bit of help from a queue) that REST can't be a successful push provider - it already is from the POV that if the sender has an endpoint for the receiver, it can send a message easily. Add a message queue for those times that the receiver is unavailable and you have a simple mechanism that uses proven tech. I still like the idea of publish timestamps, as they keep the conversation congruent.



    David: Interesting thoughts - maybe I'm looking at this as a specific problem, but I think you might be describing a general blog/comment system rather than a microblog (Twitter-style) system. I take your point though, as I guess it still applies.



    The beauty of such a distributed system is you don't need to rely on a big third party for data backups - every one of your "followers" would have a copy, giving many points of recovery.



    With your security concerns (i.e. signing of messages etc.) you have this baked in (to a degree) with OAuth, which is based on signatures and tokens. Of course the messages themselves aren't signed, but I wonder how important this is for day-to-day microblogging? This does bring up the interesting question of "comment fraud" or people impersonating others when commenting/posting - I haven't heard anything about it myself. Have you, David?
  6. Hey Myles. Re: comment fraud -- I haven't really heard of any except on the newsgroups, where all sorts of shenanigans go on, as I am sure you would know. The benefit of digital signing is that it would reduce most of those sorts of issues, including flaming and trolling, as reputation would be a more significant factor.



    And you are right, I was taking the problem and trying to look for a revolutionary rather than evolutionary solution, and not just dealing with microblogs. Something that addressed the issue of ownership of the IUC and allowed it to live on unconcerned with the lifetime of the blog or messaging tool. Of course most IUCs probably deserve to die, even when comments approach essays or when tweets become wonderful aphorisms. But I think as a tool for communications it would be nice to be able to write those in one place (and it doesn't have to be text - any content would do), and send them off to other places, knowing that others can follow up and see the bigger picture of what you are saying, and more importantly so you can see what others are saying. Facebook, Flickr, Twitter, blog posts, YouTube, whatever; the lot all readable and sendable and transferable from one enabling technology. Most people probably don't care. And those with blogs have some control over the process. The question is, are those in the middle (like me) going to become a larger group with something worthwhile to contribute?



    My concern with OpenID in this context is the ID part. To get the ID you must be known to a certain number of trusted people. Let's say you were a blogger/commenter in China or Iran or somewhere where it doesn't pay to stray from the state-approved line. It would be nice to know that as a reader these were really the comments of the person you had been following and whom you could assume felt free enough to have a real opinion as they couldn't be identified (at least directly) by their state's secret police. Though I have heard that one form of Chinese censorship is to protect the West from rabid anti-Western sentiments!



    I have to confess that I wasn't aware of http://openmicroblogging.org/ or http://identi.ca/. I had heard of REST as a general term, but didn't know much about it. Having now read the wikipedia.org article on it, I am not sure I do now either. On first parse it looks like the W3C version of Microsoft DNA or the OSI model for networking, as it seems to be trying to generalise what is already there in very specific ways!



    I think there is room for something very new here.
  7. David: Without getting OT, I'm using REST here to denote the HTTP GET, POST, PUT and DELETE commands, which are the basis of ad-hoc web services. It is useful in this instance as we would be using the language of the web for the communication. I don't think it is trying to generalise what is there already - it's just the formal specification of what is there already!



    I agree that there is something bigger in all of this - what has been described is but one small part of the puzzle. There are basically three problems with pushing Web 2.0 to the next stage that we can identify here - Notification, Authentication and Authorisation. Whilst perhaps not perfect yet, we have OpenID and OAuth (I'm more than happy with both, although I'm not particularly paranoid about such things) to cover the last two - it's the notification bit that we need to crack, which is where the POST, PUT and DELETE part of REST comes in.
  8. To work with HTTP it probably has to be REST in the general sense. But that really means just HTTP GET. But I think that is just the read-only end point. Getting the content into the cloud doesn't have to be as limited and I am not sure you could do Authentication with just PUT (since it has to be stateless).



    I think that if something evolves it will need to cover authentication, even if just to say what it wouldn't cover.



    And I also think something less than a revolutionary solution will just get lost in the noise.
