Automatic Synchronization and Distribution of Biological Databases and Software over Low-Bandwidth Networks among Developing Countries

Objectives

To research, implement, and test a next-generation network, based on Peer-to-Peer (P2P) technology, for the automatic distribution and synchronization of biological software, courseware, and databases among developing countries in the Asia-Pacific region with low-bandwidth Internet links.

Descriptions

Many countries in the Asia-Pacific region are moving into the field of bioinformatics, which involves the collection, organization, and analysis of large amounts of biological data through computer networks. For many of them, however, progress is impeded by limited computational infrastructure and network bandwidth. This project addresses the problem of low bandwidth and unreliable connections through the introduction of 3rd generation P2P protocols, which use the computing power of the entire network and allow file transfers to continue after a disconnection. P2P thus promises to facilitate the distribution and synchronization of biological databases across the Asia-Pacific region.

Background

 

Bioinformatics and the need for network bandwidth

 

Bioinformatics involves the collection, organization, and analysis of large amounts of biological data, using networks of computers and databases. Bioinformatics centers around the world have to update their database repositories regularly with the latest releases. This is normally done by file transfer over FTP, but the large and growing sizes of these databases mean that a large network bandwidth is required to ensure that new database releases are downloaded quickly and without failure. To assist with this, a network of database mirror sites was established in several countries worldwide in 1997, under the Bio-Mirror project. Developing countries in the Asia-Pacific region are just moving into the field of bioinformatics, but the computational infrastructure and network bandwidth available in those countries are still at a primitive level compared with more developed countries. Network bandwidth within these countries is still very low, and the low reliability of connections means that breaks and aborts in downloads are common. So, in spite of the Bio-Mirror nodes being made available, many developing countries still face a major problem in regularly updating these databases. The problem will only get worse in the coming years, because the growth of the databases outstrips the growth of the bandwidth available to the end user.

 

A revolution in file sharing technology

 

In the late 1990s, the Internet community witnessed the start of a major revolution in the way people share files: Peer-to-Peer (P2P) file exchange was introduced with the wildly popular Napster in 1999. Internet users used it to share MP3 music and video files throughout the world. P2P technology exchanges files not just between a central server and the multiple clients that connect to it, but focuses on having clients exchange files amongst one another.


The technology continued to evolve and improve, with the second generation P2P FastTrack / Kazaa network appearing in 2001. The BitTorrent protocol was introduced in the same year. This third generation P2P technology was a major advance over previous P2P protocols. With BitTorrent, a large file to be distributed is broken up into smaller fragments, typically around a quarter of a megabyte each. These fragments are distributed to each peer, and amongst peers, in a random manner, and are reassembled at the requesting machine.
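The fragmentation idea can be illustrated with a minimal Python sketch. It is not the code of BitTorrent or Azureus; it only assumes the quarter-megabyte fragment size mentioned above and the SHA-1 fragment hashes that BitTorrent uses to verify each fragment. The function names are hypothetical.

```python
import hashlib
import random

PIECE_LENGTH = 256 * 1024  # a quarter of a megabyte, as in the text


def split_into_pieces(data: bytes, piece_length: int = PIECE_LENGTH):
    """Return a list of (index, piece_bytes, sha1_hex) tuples."""
    pieces = []
    for index, offset in enumerate(range(0, len(data), piece_length)):
        chunk = data[offset:offset + piece_length]
        pieces.append((index, chunk, hashlib.sha1(chunk).hexdigest()))
    return pieces


def reassemble(received):
    """Verify each piece against its hash, then join the pieces by index."""
    for index, chunk, digest in received:
        if hashlib.sha1(chunk).hexdigest() != digest:
            raise ValueError(f"piece {index} failed its hash check")
    ordered = sorted(received, key=lambda piece: piece[0])
    return b"".join(chunk for _, chunk, _ in ordered)


if __name__ == "__main__":
    original = bytes(range(256)) * 4096  # 1 MiB of sample data
    pieces = split_into_pieces(original)
    random.shuffle(pieces)  # pieces may arrive in any order
    assert reassemble(pieces) == original
    print(len(pieces), "pieces reassembled correctly")
```

Because every fragment carries its own index and hash, a download that is interrupted only needs to fetch the fragments it is still missing, not the whole file.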


This difference between traditional client/server distribution of files, and 3rd generation P2P distribution, is illustrated in Figures 1 and 2 below:

Image:Figure1_2.jpg


These figures illustrate the power of the concept introduced by 3rd generation P2P technology:

As the number of downloading clients in the traditional distribution architecture increases, demands for bandwidth placed on the server will only increase and lead to a bottleneck.

In the 3rd generation P2P architecture, however, the more peers there are, the more nodes are available to distribute fragments of the file. High demand actually leads to greater throughput, as more bandwidth from additional nodes becomes available to the group.
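The scaling argument can be made concrete with a simplified, textbook-style fluid model of distribution time. This is only a sketch: it ignores protocol overhead and assumes every peer uploads at the same rate, and all bandwidth figures below are hypothetical, not measurements from this project.

```python
def client_server_time(file_bits, n_clients, server_up, client_down):
    """Lower bound (seconds) when the server alone uploads every copy."""
    return max(n_clients * file_bits / server_up, file_bits / client_down)


def p2p_time(file_bits, n_peers, server_up, peer_up, peer_down):
    """Lower bound (seconds) when peers also upload fragments to each other."""
    total_upload = server_up + n_peers * peer_up
    return max(
        file_bits / server_up,             # the server must send one full copy
        file_bits / peer_down,             # each peer must receive one full copy
        n_peers * file_bits / total_upload,  # all copies share the total upload capacity
    )


if __name__ == "__main__":
    FILE_BITS = 8e9    # a 1 GB database release (hypothetical)
    SERVER_UP = 10e6   # 10 Mbit/s server uplink (hypothetical)
    PEER_UP = 1e6      # 1 Mbit/s peer uplink (hypothetical)
    PEER_DOWN = 4e6    # 4 Mbit/s peer downlink (hypothetical)
    for n in (10, 100, 1000):
        cs = client_server_time(FILE_BITS, n, SERVER_UP, PEER_DOWN)
        p2p = p2p_time(FILE_BITS, n, SERVER_UP, PEER_UP, PEER_DOWN)
        print(f"{n:5d} nodes: client/server {cs:10.0f} s, P2P {p2p:8.0f} s")
```

Under these assumptions the client/server time grows in proportion to the number of clients, while the P2P time levels off, because each new peer adds upload capacity as well as demand.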


Using P2P technology in distributing biological data


From the comparison above, it can be seen that 3rd generation P2P technology offers to solve simultaneously the two major problems plaguing the distribution of biological data to developing countries:

 • Low international bandwidth

 • Unreliable connections

P2P technology can be applied in three areas – the distribution of biological software, courseware, and databases.

Construction

The Azureus program was selected for this project because it is an open-source, well-known, and easy-to-use P2P program. The project construction consists of four parts: 1. installing Azureus, 2. configuring Azureus, 3. configuring the RSSFeed Scanner plugin, and 4. configuring the Advanced Statistics plugin.


1. Azureus installing

1) Install security software (a firewall, antivirus, and an intrusion detection system) to protect your server.

2) Install the Azureus program, which can be downloaded from http://azureus.sourceforge.net/download.php. If your operating system is Linux, read http://azureus.sourceforge.net/howto_linux.php . If your operating system is Windows, read http://azureus.sourceforge.net/howto_win.php


2. Azureus setting

Azureus setting has two parts: client setting and server setting.

1) Client setting for downloading data from the seeder

1.1) Go to the Tools menu and choose Options. In the list on the left, click Connection. Pick a number between 49152 and 65534 and enter it in the incoming TCP listen port and UDP listen port boxes, as shown in Figure 3. Then click Save to save this change. You should also open that port in your firewall so that downloads and uploads are not slowed down.

Image:Figure3.jpg

Figure 3. Incoming TCP listen and UDP listen ports
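Before typing a port number into the boxes shown in Figure 3, it can help to confirm that nothing else on the machine is already using it. The following Python sketch is not part of Azureus; the function names are hypothetical. It only checks local availability for TCP and UDP and cannot tell whether the firewall lets the port through.

```python
import random
import socket


def can_bind(port: int, sock_type: int) -> bool:
    """Return True if a socket of the given type can bind to the port locally."""
    with socket.socket(socket.AF_INET, sock_type) as sock:
        try:
            sock.bind(("", port))
        except OSError:
            return False
    return True


def pick_listen_port(low: int = 49152, high: int = 65534, attempts: int = 50) -> int:
    """Pick a random port in [low, high] that is free for both TCP and UDP."""
    for _ in range(attempts):
        port = random.randint(low, high)
        if can_bind(port, socket.SOCK_STREAM) and can_bind(port, socket.SOCK_DGRAM):
            return port
    raise RuntimeError("no free port found in the requested range")


if __name__ == "__main__":
    print("Suggested listen port:", pick_listen_port())
```

The same number is then entered in both the TCP and UDP boxes and opened in the firewall.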


1.2) To test downloading data from a seeder, download a torrent from a tracker such as the KOBIC tracker (http://ftp.kobic.re.kr:6969/), as shown in Figure 4. In the File menu, click Open and choose Add file to add the torrent you have downloaded, as shown in Figure 5. Figure 6 shows the PSU node downloading the file go_200608-assocdb.rdf-xml.gz from the KOBIC node using go_200608-assocdb.rdf-xml.gz.torrent, at a download speed of 12.4 kB/s.

Note: if the Health indicator of the torrent is red, you are not connected to any peer while downloading, possibly because the tracker server is down or there is no seeder.

Image:Figure4.jpg

Figure 4. KOBIC tracker


Image:Figure5.jpg

Figure 5. Torrent opening


Image:Figure6.jpg

Figure 6. PSU node is downloading data from KOBIC node.


2) Server setting for uploading data to peers

If you want to upload or distribute your data to other peers, you must create a torrent for that data and host the torrent on a tracker.

2.1) Go to the Tools menu and choose Options. In the list on the left, click Tracker, then click Server. Enter the external IP address or server name. Select the HTTP port check box and enter a port such as 6969, as shown in Figure 7. You must also open that port (6969 in this example) in your firewall.

2.2) In the list on the left, click Plugins, then click Tracker Web. Select Publish torrent, enter the title of your tracker web page as shown in Figure 8, and select all RSS feed options for automatic synchronization, as shown in Figure 9.

Image:Figure7.jpg

Figure 7. Tracker server setting


Image:Figure8.jpg

Figure 8. Tracker web setting


Image:Figure9.jpg

Figure 9. RSS feed setting in Tracker web


2.3) Create a torrent by clicking New Torrent in the File menu and selecting Embedded Tracker, as shown in Figure 10. Before finishing the creation of the torrent, select Open the torrent and Host the torrent, as shown in Figure 11.

Image:Figure10.jpg

Figure 10. Create a new torrent


Image:Figure11.jpg

Figure 11. Options to select before finishing the new torrent


2.4) The hosted torrent is then listed at the tracker URL, which will be something like http://yourexternalIPAddress:6969/, as shown in Figure 12.

Image:Figure12.jpg

Figure 12. PSU node tracker server URL


For further guidance, see the Azureus User Guide: http://azureus.sourceforge.net/doc/Azureus%20User%20Guide.htm
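For readers curious about what the New Torrent step in 2.3 produces, the sketch below builds a single-file torrent description in Python using the standard BitTorrent metainfo keys (announce, info, name, length, piece length, pieces) and bencoding. It is an illustration only, not the code Azureus runs; the tracker host tracker.example.org and the function names are hypothetical.

```python
import hashlib


def bencode(value) -> bytes:
    """Encode ints, strings/bytes, lists, and dicts in bencoding."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys must be byte strings, written in sorted order.
        items = [(k.encode("utf-8") if isinstance(k, str) else k, v)
                 for k, v in value.items()]
        items.sort(key=lambda pair: pair[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def build_metainfo(name: str, data: bytes, announce_url: str,
                   piece_length: int = 256 * 1024) -> dict:
    """Build the metainfo dictionary for a single-file torrent."""
    pieces = b"".join(
        hashlib.sha1(data[offset:offset + piece_length]).digest()
        for offset in range(0, len(data), piece_length)
    )
    return {
        "announce": announce_url,
        "info": {
            "name": name,
            "length": len(data),
            "piece length": piece_length,
            "pieces": pieces,  # concatenated 20-byte SHA-1 digests
        },
    }


if __name__ == "__main__":
    meta = build_metainfo(
        "sample.gz", b"\x00" * 600_000,
        "http://tracker.example.org:6969/announce",  # hypothetical tracker URL
    )
    torrent_bytes = bencode(meta)  # this is what a .torrent file contains
    info_hash = hashlib.sha1(bencode(meta["info"])).hexdigest()
    print(len(torrent_bytes), "bytes, info hash", info_hash)
```

The info hash is the identifier that the tracker and peers use to refer to the torrent, which is why the data file must not change after the torrent has been created.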


3. RSSFeed Scanner Plugin setting for automatic synchronization


1) Download the RSSFeed Scanner plugin from http://azureus.sourceforge.net/plugin_list.php, selecting version 1.3.1.

2) Install the plugin by unzipping rssfeed_1.3.1.zip and placing the result in the “plugins” folder of the Azureus program path. For example, if Azureus is installed in C:\Azureus, place the “rssfeed_1.3.1” folder in C:\Azureus\plugins. Then restart Azureus.

3) Open the RSSFeed Scanner plugin from the Plugins menu by selecting RSSFeed, then select the Options tab.

3.1) Create an RSSFeed URL for the KOBIC node by clicking the “+” label and setting the options as shown in Figure 13. If you enter 1800 in the Delay time box, the rss_KOBIC feed is refreshed, and any new torrent downloaded automatically, every 1,800 seconds.

Image:Figure13.jpg

Figure 13. RSS Feed URL setting


3.2) Create filters for the KOBIC node by clicking the “+” label and setting the options as shown in Figure 14.

Image:Figure14.jpg

Figure 14. Filters setting


3.3) When you click the Status tab, you can see that some torrents from the KOBIC node have been downloaded automatically, as shown in Figure 15 and Figure 16.

Image:Figure15.jpg

Figure 15. Status of RSSFeed Scanner


Image:Figure16.jpg

Figure 16. The torrent files in Azureus after running RSSFeed Scanner


3.4) You can read more details in the Help tab.
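The feed-and-filter mechanism of this section can be sketched in a few lines of Python. This is not the RSSFeed Scanner plugin's code. It assumes an RSS 2.0 feed whose items carry the torrent URL in a link or enclosure element, and it assumes a filter is a regular expression applied to the item title. The feed content, the URLs, and the function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical feed content; a real tracker feed may be laid out differently.
SAMPLE_FEED = """<rss version="2.0"><channel>
<title>Example tracker feed</title>
<item><title>go_200608-assocdb.rdf-xml.gz</title>
<link>http://tracker.example.org:6969/torrents/go_200608.torrent</link></item>
<item><title>unrelated_file.zip</title>
<enclosure url="http://tracker.example.org:6969/torrents/other.torrent"
 type="application/x-bittorrent" length="0"/></item>
</channel></rss>"""


def matching_torrents(feed_xml: str, pattern: str):
    """Return (title, url) for every feed item whose title matches the filter."""
    regex = re.compile(pattern)
    hits = []
    for item in ET.fromstring(feed_xml).iter("item"):
        title = item.findtext("title", default="")
        enclosure = item.find("enclosure")
        if enclosure is not None:
            url = enclosure.get("url", "")
        else:
            url = item.findtext("link", default="")
        if regex.search(title):
            hits.append((title, url))
    return hits


def poll_once(fetch, pattern: str, seen: set):
    """Fetch the feed once and return only the matches not seen before."""
    new = [(t, u) for t, u in matching_torrents(fetch(), pattern) if u not in seen]
    seen.update(url for _, url in new)
    return new


if __name__ == "__main__":
    seen_urls = set()
    # A real client would repeat this call every 1,800 seconds (the Delay time).
    for title, url in poll_once(lambda: SAMPLE_FEED, r"^go_", seen_urls):
        print("would download", title, "from", url)
```

Keeping a set of URLs already seen is what prevents the same torrent from being added again on every refresh.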


4. Advanced Statistics Plugin setting for performance testing

1) Download the Advanced Statistics plugin from http://azureus.sourceforge.net/plugin_list.php.

2) Install this plugin in the same way as the RSSFeed Scanner plugin.

3) Go to the Plugins menu and choose Advanced Statistics. In the Progress tab, you can see the percentage of data downloaded or uploaded in each time period, as shown in Figure 17. In the Activity tab, you can see the download and upload speeds, as shown in Figure 18.

Image:Figure17.jpg

Figure 17. Progress of each torrent


Image:Figure18.jpg

Figure 18. Activity

 

Progress

The KOBIC (Republic of Korea), PSU (Thailand), and NUS (Singapore) nodes have been set up. The KOBIC and PSU nodes have been linked together successfully, but the NUS node has not yet been linked with them, so more nodes are required.

The experimental result

The KOBIC node acts as the seeder and the PSU node as the peer. The Azureus and gFTP programs were tested with 2 torrents. Here is a sample result for the period from 4:30 PM to 9:30 AM (17 hours).

Data downloaded: Azureus = 4.4 GB, gFTP = 3.0 GB

Average download speed: Azureus = 82.0 kB/s, gFTP = 55.9 kB/s
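The relative gain can be read directly from these figures. The short Python calculation below uses only the four numbers reported above; the variable names are illustrative.

```python
# Figures reported in the sample result above.
azureus_gb, gftp_gb = 4.4, 3.0
azureus_kb_per_s, gftp_kb_per_s = 82.0, 55.9

volume_ratio = azureus_gb / gftp_gb
speed_ratio = azureus_kb_per_s / gftp_kb_per_s

print(f"Data volume ratio (Azureus / gFTP): {volume_ratio:.2f}")
print(f"Average speed ratio (Azureus / gFTP): {speed_ratio:.2f}")
```

Both ratios come to about 1.47, so in this sample Azureus moved roughly 47% more data than gFTP over the same period.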