Automatic Synchronization and
Distribution of Biological Databases and Software over Low-Bandwidth Networks
among Developing Countries
To research, implement, and test a
next-generation network for the automatic distribution and synchronization of
biological software, courseware, and databases, based on Peer-to-Peer (P2P)
technology, for developing countries in the Asia-Pacific region with
low-bandwidth Internet links.
Many countries in the Asia-Pacific
region are moving into the field of bioinformatics, which involves the
collection, organization and analysis of large amounts of biological data
through computer networks. For many of them, however, progress is impeded by
limited computational infrastructure and network bandwidth. This project addresses the
problem of low bandwidth and poor reliability by introducing 3rd
generation P2P protocols, which use the bandwidth of the entire network
and allow file transfers to resume after a disconnection. P2P thus
promises to facilitate the distribution and synchronization of biological
databases across the Asia-Pacific region.
Bioinformatics and the need for
network bandwidth
Bioinformatics involves the
collection, organization and analysis of large amounts of biological data, using
networks of computers and databases. Bioinformatics centers around the world
have to regularly update their database repositories with the latest releases.
This is normally done by file transfer over FTP, but the large and growing
sizes of these databases mean that a large network bandwidth is required to
ensure that new database releases are downloaded quickly and without failure. To
assist with this, a network of database mirror sites was established in several
countries worldwide in 1997, under the Bio-Mirror project. Developing countries
in the Asia-Pacific region are just moving into this new field of
bioinformatics, but the computational infrastructure
and network bandwidth available in those countries still lag well behind
those of more developed countries. Network bandwidth within
these countries is still very low, and the low reliability of connections
means that broken or aborted downloads are common. So, in spite of the Bio-Mirror
nodes being made available, many developing countries still face a
major problem in regularly updating these databases. With the large and
growing sizes of these databases, the problem will only get worse in the coming
years, because the growth of the databases outstrips the growth of the bandwidth
available to the end user.
A revolution in file sharing
technology
In the late 1990s, the Internet
community witnessed the start of a major revolution in the way people share
files: Peer-to-Peer (P2P) file exchange was introduced with the wildly popular
Napster in 1999. Internet users used it to share MP3 music files
throughout the world. P2P technology does not exchange files only between
a central server and the multiple clients that connect to it; rather, it focuses on
using the clients to exchange files amongst one another.
The technology continued to evolve and improve, with the second generation P2P FastTrack / Kazaa network in
2001. Also in 2001, the BitTorrent protocol was
introduced. This third generation P2P technology was a major advance over
previous P2P protocols. With BitTorrent, a large file
to be distributed is broken up into smaller fragments, typically around a
quarter of a megabyte each. These fragments are distributed to each peer, and
amongst peers, in a random manner, and are reassembled at the requesting
machine.
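The fragmenting and reassembly described above can be sketched in a few lines of Python. This is only an illustration, not the code of any BitTorrent client: the function names are hypothetical, the piece size is the quarter of a megabyte mentioned above, and the per-piece SHA-1 check follows the way the original BitTorrent protocol verifies fragments.

```python
import hashlib

PIECE_SIZE = 256 * 1024  # a quarter of a megabyte per fragment


def split_into_pieces(data, piece_size=PIECE_SIZE):
    """Cut the data into consecutive pieces; the last one may be shorter."""
    return [data[i:i + piece_size] for i in range(0, len(data), piece_size)]


def piece_hashes(pieces):
    """SHA-1 digest of every piece, used later to verify what was received."""
    return [hashlib.sha1(piece).digest() for piece in pieces]


def reassemble(received, hashes):
    """Rebuild the file from pieces that arrived in any order.

    `received` maps a piece index to its bytes. Each piece is checked
    against its expected digest before it is accepted.
    """
    parts = []
    for index, expected in enumerate(hashes):
        piece = received[index]
        if hashlib.sha1(piece).digest() != expected:
            raise ValueError("piece %d failed verification" % index)
        parts.append(piece)
    return b"".join(parts)
```

Because every piece is verified on its own, a broken connection costs at most the piece that was in transit, which is what makes this scheme attractive on unreliable links.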
This difference between traditional client/server distribution of files, and
3rd generation P2P distribution, is illustrated in Figures 1 and 2 below:
These figures illustrate the power of the concept introduced by 3rd generation
P2P technology:
As the number of downloading clients
in the traditional distribution architecture increases, the demand for bandwidth
placed on the server increases with it, and the server becomes a bottleneck.
In the 3rd
generation P2P architecture, by contrast, the more peers there are, the more nodes are available
to distribute fragments of the file. High demand actually leads to greater
throughput, as more bandwidth from additional nodes becomes available to the
group.
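A back-of-the-envelope model makes this comparison concrete. The sketch below is an idealized assumption, not a measurement from this project: it supposes that every node's upload capacity is fully used, and the capacities (1000 kB/s for the server, 50 kB/s per peer) are hypothetical.

```python
def client_server_rate(server_up, n_clients):
    """Per-client download rate when a single server feeds every client."""
    return server_up / n_clients


def swarm_rate(server_up, peer_up, n_peers):
    """Per-peer rate when every peer also uploads fragments to the others."""
    return (server_up + n_peers * peer_up) / n_peers


if __name__ == "__main__":
    # Hypothetical capacities in kB/s: server 1000, each peer 50.
    for n in (1, 10, 100):
        print(n, client_server_rate(1000, n), swarm_rate(1000, 50, n))
```

In this model the client/server rate keeps shrinking as clients are added, while the P2P rate never drops below the upload capacity of a single peer, and the total throughput of the group grows with every node that joins.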
Using P2P technology in distributing biological data
From the comparison above, it can be seen that 3rd generation P2P technology
can simultaneously address the two
major problems plaguing the distribution of biological data to developing
countries:
• Low international bandwidth
• Unreliable connections
P2P technology can be applied in
three areas – the distribution of biological software, courseware, and
databases.
The Azureus program was selected for this project because it is an open-source, well-known, and easy-to-use P2P program. Setting up the
project consists of four parts: 1. Azureus
installation, 2. Azureus settings, 3. RSSFeed Scanner Plugin settings,
and 4. Advanced Statistics Plugin settings.
1. Azureus installation
1) Install security software
(a firewall, antivirus, and an intrusion detection system) to protect your
server.
2) Install the Azureus
program, which can be downloaded from http://azureus.sourceforge.net/download.php.
If your operating system is Linux, read http://azureus.sourceforge.net/howto_linux.php
first. If your operating system is Windows, read http://azureus.sourceforge.net/howto_win.php
2. Azureus settings
The Azureus settings have 2 sections: client settings and server settings.
1) Client settings for downloading data from the seeder
1.1) Go to the Tools menu and choose
Options. In the list on the left, click Connection. Pick a number between 49152
and 65534, and enter it into the incoming TCP listen port and UDP listen port
boxes, as shown in Figure 3. Then click Save to save the change. You
should also open that port in your firewall so that downloads and uploads are not slowed down.
Figure 3. Incoming TCP and UDP listen ports
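If you would rather not guess a port number, a small script can suggest one. The sketch below is an assumption about how one might do this with the Python standard library and is not part of Azureus: it picks a random port in the range given above and checks that both TCP and UDP can be bound to it on the local machine. It does not check or change your firewall.

```python
import random
import socket

PORT_MIN, PORT_MAX = 49152, 65534  # the range recommended above


def port_is_free(port):
    """Return True if both TCP and UDP can be bound to this port locally."""
    for kind in (socket.SOCK_STREAM, socket.SOCK_DGRAM):
        with socket.socket(socket.AF_INET, kind) as probe:
            try:
                probe.bind(("", port))
            except OSError:
                return False
    return True


def pick_listen_port(attempts=50):
    """Try random ports in the recommended range until a free one is found."""
    for _ in range(attempts):
        port = random.randint(PORT_MIN, PORT_MAX)
        if port_is_free(port):
            return port
    raise RuntimeError("no free port found in the recommended range")
```

Calling pick_listen_port() returns a number that can be typed into both port boxes of the Connection options.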
1.2) To test downloading data from a seeder, download a
torrent from a tracker such as the KOBIC tracker (http://ftp.kobic.re.kr:6969/),
as shown in Figure 4. In the File menu, click Open and choose Add file to add
the torrent that you have downloaded, as shown in Figure 5. Figure 6 shows the PSU
node downloading the file go_200608-assocdb.rdf-xml.gz from the KOBIC node using
go_200608-assocdb.rdf-xml.gz.torrent, at a download speed of 12.4 kB/s.
Note: if the Health indicator of the torrent
is red, you are not connected to any peer while downloading, possibly
because the tracker server is down or there is no seeder.
Figure 4. KOBIC tracker
Figure 5. Torrent opening
Figure 6. PSU node is downloading data from KOBIC node.
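When the Health indicator is red, a quick way to rule out one of the causes is to test whether the tracker's port accepts connections at all. The following sketch is a hypothetical helper, not an Azureus feature; it only opens and closes a TCP connection to the host and port of a tracker URL, so a True result shows that the tracker server is up, not that a seeder is online.

```python
import socket
from urllib.parse import urlparse


def tracker_reachable(tracker_url, timeout=5.0):
    """Return True if a TCP connection to the tracker's host and port succeeds."""
    parts = urlparse(tracker_url)
    # Fall back to the default port of the scheme when the URL names none.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        with socket.create_connection((parts.hostname, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and failed name lookups.
        return False
```

For example, tracker_reachable("http://ftp.kobic.re.kr:6969/") would test the KOBIC tracker mentioned above.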
2) Server settings for uploading data to peers
If you want to upload or distribute
your data to other peers, you must create a torrent for that data and host the
torrent on a tracker.
2.1) Go to the Tools menu and choose
Options. In the list on the left, click Tracker, then click Server. Enter your
external IP address or server name. Select the HTTP port check box and enter a
port such as 6969, as shown in Figure 7. You must then
open port 6969 on your firewall.
2.2) In the
list on the left, click Plugins, and then click
Tracker Web. Select Publish torrent, enter the title of your tracker web page as
shown in Figure 8, and select all the RSS feed options for automatic
synchronization, as shown in Figure 9.
Figure 7. Tracker server setting
Figure 8. Tracker web setting
Figure 9. RSS feed setting in Tracker web
2.3) Create a torrent by clicking New Torrent in the File
menu and selecting Embedded Tracker, as shown in Figure 10. Before you finish creating
the torrent, select Open the torrent and Host the torrent, as shown in Figure
11.
Figure 10. Create a new torrent
Figure 11. Before finishing the creation of a new torrent
2.4) Your tracker can then be reached at its tracker URL, which will be something like http://yourexternalIPAddress:6969/,
as shown in Figure 12.
Figure 12. PSU node tracker server URL
For further guidance, see the Azureus User Guide: http://azureus.sourceforge.net/doc/Azureus%20User%20Guide.htm
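To give an idea of what the torrent created in step 2.3 contains, the sketch below builds a single-file torrent description by hand. It is a simplified illustration, not what Azureus does internally: the function names are hypothetical, the keys (announce, info, name, length, piece length, pieces) and the bencoding follow the original BitTorrent metainfo format, and the "/announce" path appended to the placeholder tracker URL is an assumption.

```python
import hashlib

PIECE_LENGTH = 256 * 1024  # a quarter of a megabyte per piece


def bencode(value):
    """Encode ints, strings, bytes, lists and dicts in BitTorrent's bencoding."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings and must appear in sorted order.
        items = [(k.encode("utf-8") if isinstance(k, str) else k, v)
                 for k, v in value.items()]
        items.sort(key=lambda pair: pair[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError("cannot bencode %r" % type(value))


def build_metainfo(name, data, announce_url, piece_length=PIECE_LENGTH):
    """Describe one file: its size, its piece length and the SHA-1 of each piece."""
    digests = [hashlib.sha1(data[i:i + piece_length]).digest()
               for i in range(0, len(data), piece_length)]
    info = {"name": name, "length": len(data),
            "piece length": piece_length, "pieces": b"".join(digests)}
    return {"announce": announce_url, "info": info}
```

Writing bencode(build_metainfo(...)) to a file with a .torrent extension gives a minimal torrent. In practice the New Torrent dialog fills in the tracker URL and hashes the file for you.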
3. RSSFeed Scanner Plugin
settings for automatic synchronization
1) Download the RSSFeed Scanner plugin, version 1.3.1,
from http://azureus.sourceforge.net/plugin_list.php
2) Install the plugin
by unzipping rssfeed_1.3.1.zip and placing it in the “plugins”
folder under the Azureus program path. For example, if you
installed Azureus in C:\Azureus, you must
place the “rssfeed_1.3.1” folder in C:\Azureus\plugins. After that, restart Azureus.
3) Open the RSSFeed
Scanner plugin by selecting RSSFeed in the Plugins
menu, then select the Options tab.
3.1) Create an RSSFeed URL for the KOBIC node by clicking the “+” label and setting the
options as shown in Figure 13. If
you enter 1800 in the Delay time box, the rss_KOBIC feed
is refreshed, and its torrents downloaded, automatically every 1,800 seconds.
Figure 13. RSS Feed URL setting
3.2) Create Filters for the KOBIC node by clicking the “+” label and setting the
options as shown in Figure 14.
Figure 14. Filters setting
3.3) When you click the Status tab, you can see that some
torrents from the KOBIC node have been downloaded automatically, as shown in Figure 15 and
Figure 16.
Figure 15. Status of RSSFeed Scanner
Figure 16. The torrent files in Azureus after running RSSFeed Scanner
3.4) You can read more details in the Help tab.
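The idea behind the feed URL and filter settings above can be sketched as follows. This is not the plugin's code: the function names, the sample feed and its tracker.example URLs are hypothetical, and the feed is assumed to be plain RSS 2.0 with a title and a link per item. Each scan keeps only the items whose title matches the filter and that have not been seen before; repeating the scan every 1,800 seconds gives the automatic synchronization described in step 3.1.

```python
import re
import xml.etree.ElementTree as ET


def torrent_links(rss_xml, pattern):
    """Return (title, link) for every feed item whose title matches the filter."""
    regex = re.compile(pattern, re.IGNORECASE)
    matches = []
    for item in ET.fromstring(rss_xml).iter("item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        if link and regex.search(title):
            matches.append((title, link))
    return matches


def scan_once(rss_xml, pattern, seen):
    """One refresh of the feed: report only matching items not seen before."""
    new_items = [(t, l) for t, l in torrent_links(rss_xml, pattern) if l not in seen]
    seen.update(link for _, link in new_items)
    return new_items


# Hypothetical feed content, used only to show the expected input shape.
SAMPLE_FEED = """<rss version="2.0"><channel>
<item><title>go_200608-assocdb.rdf-xml.gz</title>
<link>http://tracker.example:6969/torrents/go.torrent</link></item>
<item><title>readme.txt</title>
<link>http://tracker.example:6969/torrents/readme.torrent</link></item>
</channel></rss>"""
```

Calling scan_once(SAMPLE_FEED, r"^go_", seen) with an empty set returns the one matching item the first time and nothing on later calls, so the same torrent is not downloaded twice.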
4. Advanced Statistics Plugin settings for performance
testing
1) Download Advanced Statistics plugin from http://azureus.sourceforge.net/plugin_list.php.
2) Install this plugin in the same way as the RSSFeed
Scanner plugin.
3) Go to the Plugins
menu and choose Advanced Statistics. In the Progress tab, you can
see the percentage of data downloaded or uploaded in each time period, as
shown in Figure 17. In the Activity tab, you can see the download and upload speeds, as
shown in Figure 18.
Figure 17. Progress of each torrent
Figure 18. Activity
The experimental result
In this experiment, the KOBIC node is the seeder and the PSU node
is the peer. The Azureus and gFTP
programs were tested with 2 torrents. Here is a sample result for the period 4:30 PM - 9:30
AM (17 hours).
Data downloaded => Azureus = 4.4 GB, gFTP = 3.0 GB
Average download speed => Azureus
= 82.0 kB/s, gFTP = 55.9 kB/s
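To compare the two programs at a glance, the ratios between the figures above can be computed directly. The short sketch below uses only the numbers reported in this sample result; the variable names are illustrative.

```python
# Figures reported above for the 17-hour sample.
azureus = {"downloaded_gb": 4.4, "avg_kb_per_s": 82.0}
gftp = {"downloaded_gb": 3.0, "avg_kb_per_s": 55.9}

# How many times more data, and how many times faster, Azureus was than gFTP.
volume_ratio = azureus["downloaded_gb"] / gftp["downloaded_gb"]
speed_ratio = azureus["avg_kb_per_s"] / gftp["avg_kb_per_s"]

print("volume ratio: %.2f" % volume_ratio)
print("speed ratio: %.2f" % speed_ratio)
```

Both ratios come out at about 1.47, so in this sample Azureus transferred roughly 47% more data than gFTP over the same period.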