Sunday, 4 March 2018

Beowulf Cluster - The beginning of something big



A Beowulf cluster is essentially a system of at least two PCs working together on a given task. Each PC is known as a node, with one PC set up as the server node and the rest as client nodes. The server node controls the whole cluster and serves files to the client nodes. The client nodes are effectively “dumb nodes”: each is given an IP address by the server node and can be operated remotely through it. The server node is also the gateway to the outside world and is the node the user interfaces with [7]. The operating system used for the cluster will be Ubuntu, a Linux distribution.

There are two main software packages used to build a Beowulf cluster: the PVM (Parallel Virtual Machine) and MPI (Message Passing Interface) libraries. The main differences between the two are:
  • Process Control: PVM can start and stop tasks and find out which tasks are running, and possibly where they are running. MPI is much more static: it contains functions to start a group of tasks and to send a kill signal to a group of tasks.
  • Resource Control: PVM is dynamic and can utilize nodes as they are required, which allows load balancing, task migration and fault tolerance. MPI is static: every node is utilized simultaneously and all potential communication paths are known at start-up. The advantage is higher performance, at the cost of load balancing, task migration and fault tolerance.
  • Message Passing Operations: PVM provides very simple message passing capabilities. MPI has a much wider range; MPI-2 has 248 functions for message-passing operations.
  • Fault Tolerance: In PVM, if one node fails, the system adapts and the application continues without the failed node. In MPI, if one node fails, the entire application fails because of MPI's static nature [8].
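To make the "static" resource model described above a little more concrete, here is a minimal sketch in plain Python (no MPI library involved) of how work might be divided among a fixed set of nodes before a run starts. The function name `block_partition` and the task counts are hypothetical, chosen only for illustration; a real MPI program would obtain its rank and the number of processes from the MPI runtime instead.

```python
def block_partition(n_tasks, n_ranks):
    """Split task indices 0..n_tasks-1 into n_ranks contiguous blocks.

    The split is decided once, up front, and never changes during the run,
    which mirrors the static model: no load balancing or task migration.
    """
    base, extra = divmod(n_tasks, n_ranks)
    blocks = []
    start = 0
    for rank in range(n_ranks):
        # The first `extra` ranks take one additional task,
        # so block sizes differ by at most one.
        size = base + (1 if rank < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks


if __name__ == "__main__":
    # Hypothetical example: 20 tasks shared by 8 nodes.
    for rank, block in enumerate(block_partition(n_tasks=20, n_ranks=8)):
        print(f"rank {rank}: tasks {list(block)}")
```

Because every node knows its share from the start, no coordination is needed while the work runs; the trade-off is that a slow or failed node cannot hand its block to another one.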

For the application in this project, MPI is the better of the two options because of its higher performance and its superior message passing capabilities. The application will not take advantage of PVM's main benefits. Because the application is repetitive in nature, there is no need for task migration. Fault tolerance would be beneficial, but it is aimed more at high-end parallel processing systems with hundreds of nodes running week- or month-long simulations. The probability of one of the 8 nodes failing is low, and if one does fail, the simulation times for this application won't be nearly as long as on those high-end systems, so a failure would cause little disruption.
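The node-failure argument can be checked with a rough calculation. The sketch below assumes that nodes fail independently and that each has the same probability `p_node` of failing during one run; both the assumption and the value 0.01 are hypothetical and are not measurements from this cluster.

```python
def prob_any_failure(p_node, n_nodes):
    """Probability that at least one of n_nodes fails during a run.

    Assumes independent failures with the same per-node probability p_node:
    P(at least one) = 1 - (1 - p_node) ** n_nodes
    """
    return 1.0 - (1.0 - p_node) ** n_nodes


if __name__ == "__main__":
    p = 0.01  # hypothetical per-node failure probability for one run
    for n in (8, 500):
        print(f"{n} nodes: P(at least one failure) = {prob_any_failure(p, n):.3f}")
```

Under these assumptions the risk grows quickly with the node count, which is why fault tolerance matters far more for clusters with hundreds of nodes than for one with 8.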

All the components are now in, including the switch, the 8 PCs and all the peripherals, and I have just started wiping the PCs and installing Ubuntu.


Figure 1. Pile of Junk
