Monday, February 18, 2008

Bittorrent for file storage and replication

After a lengthy hallway discussion this morning, the idea of a Bittorrent-based file storage and replication system to replace SRB/GridFTP popped into my head. Here are my current musings on such a system.

Note: I'm using terms similar to Bittorrent's. When I refer to the actual Bittorrent concepts, I will prefix the terms with BT-.

The "tracker":

Each institution will set up a tracker. The tracker would have a list of all the other institutions' trackers that it replicates with. When a user starts uploading a new file to the tracker, the tracker notifies the other trackers of the new file and starts sending the file to them using the Bittorrent protocol. The idea is that every tracker performs the tasks of a BT-tracker as well as a BT-peer. Thus replication is an ongoing process and happens automatically, without anyone having to schedule replication times.
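
To make the notification step a bit more concrete, here is a rough sketch in Python. Everything in it (the Tracker class, link, upload, on_new_file, the institution names) is made up for illustration, and the actual Bittorrent transfer is replaced by a placeholder that simply marks the file as replicated.

```python
class Tracker:
    """Hypothetical institution tracker acting as both BT-tracker and BT-peer."""

    def __init__(self, name):
        self.name = name
        self.peers = []        # other institutions' trackers we replicate with
        self.known_files = {}  # file name -> set of tracker names holding a copy

    def link(self, other):
        """Register two trackers as replication partners, in both directions."""
        if other is not self and other not in self.peers:
            self.peers.append(other)
            other.peers.append(self)

    def upload(self, file_name):
        """A user uploads a file here; every partner tracker is notified."""
        self.known_files.setdefault(file_name, set()).add(self.name)
        for peer in self.peers:
            peer.on_new_file(file_name, self.name)

    def on_new_file(self, file_name, source_name):
        """Record the new file and (in a real system) start fetching it."""
        holders = self.known_files.setdefault(file_name, set())
        holders.add(source_name)
        # Placeholder for the Bittorrent transfer: assume it completes,
        # so this tracker now holds a replica as well.
        holders.add(self.name)


uni_a = Tracker("uni-a")
uni_b = Tracker("uni-b")
uni_a.link(uni_b)
uni_a.upload("results.dat")
print(sorted(uni_b.known_files["results.dat"]))
```

The sketch does not report a finished replica back to the originating tracker; a real design would need that so every tracker has an up-to-date list of who holds what.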

File access for users:

Because each institution is notified about a new file, all users can see the file almost immediately, regardless of which institution they're at. Users would download files the same way Bittorrent does. The tracker at the user's institution would provide the user with a list of available peers, which may contain other users at their institution as well as other institutions' trackers and users. The user's client would then download the file from all available sources in parallel to maximise throughput. Uploading a file may work in a similar way: the user may upload the file to all peers, which would include other users and other institutions' trackers. Of course, these things would have to be explored to find an optimal process.
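
As a simplified illustration of downloading from all available sources, the sketch below spreads the pieces of a file round-robin over the peers returned by the tracker. The function name assign_pieces and the peer names are hypothetical, and it assumes every peer already holds every piece, which real Bittorrent does not (it tracks which peer has which piece and prefers the rarest pieces first).

```python
def assign_pieces(num_pieces, sources):
    """Spread piece indices round-robin over the available sources.

    Assumes every source already has the complete file.
    """
    if not sources:
        raise ValueError("no sources available for this file")
    plan = {source: [] for source in sources}
    for piece in range(num_pieces):
        plan[sources[piece % len(sources)]].append(piece)
    return plan


# Peer list as the local tracker might return it (names are made up).
peers = ["tracker-uni-a", "tracker-uni-b", "user-alice"]
for source, pieces in assign_pieces(7, peers).items():
    print(source, pieces)
```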

AAA:

Certificate-based authentication, the same as GridFTP.

File sharing:

This will be a tricky section. I'm thinking the best idea might be groups. When a user uploads a file, they specify which groups the file(s) should be available to. A user can then see all the files shared with any group they belong to.
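
The group idea boils down to a set intersection: a file is visible if it is shared with at least one group the user belongs to. A minimal sketch, with the function name visible_files and all group and file names invented for the example:

```python
def visible_files(user_groups, file_groups):
    """Return the names of files shared with at least one of the user's groups.

    file_groups maps a file name to the set of groups it was shared with.
    """
    return sorted(
        name for name, groups in file_groups.items() if groups & user_groups
    )


# Hypothetical catalogue: file name -> groups chosen at upload time.
catalogue = {
    "beamline.dat": {"physics"},
    "assay.csv": {"chemistry"},
    "shared-notes.txt": {"physics", "chemistry"},
}
print(visible_files({"physics"}, catalogue))
```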

Metadata:

Another tricky section that needs to be sorted out.