Operating Systems for Parallel and Distributed Architectures

Homework #1

Homework 1: Install a virtualized cluster containing a head node and at least two compute nodes in a virtualization environment such as Oracle VirtualBox. The virtual cluster should run ROCKS Cluster Distribution 6.2. The number of compute nodes will depend on the available memory of your physical host. Deadline: Week 10 (December 8, 2022).
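Since the number of compute nodes is bounded by the host's memory, a rough estimate can be sketched as below. This is only an illustration: the function name and all memory figures (host RAM, the amount reserved for the host OS, and the per-VM sizes) are hypothetical placeholders, not requirements stated by the assignment or by ROCKS; substitute the values that apply to your setup.

```python
def max_compute_nodes(host_mb, reserve_mb, head_mb, node_mb):
    """Estimate how many compute-node VMs fit in the host's memory.

    host_mb    -- total physical memory of the host, in MB
    reserve_mb -- memory kept free for the host OS and hypervisor (assumed)
    head_mb    -- memory assigned to the head-node VM (assumed)
    node_mb    -- memory assigned to each compute-node VM (assumed)
    """
    if min(host_mb, reserve_mb, head_mb) < 0 or node_mb <= 0:
        raise ValueError("memory sizes must be non-negative, node_mb positive")
    # Memory left for compute nodes after the host and head node are served.
    free_mb = host_mb - reserve_mb - head_mb
    # Whole VMs only; never report a negative count.
    return max(free_mb // node_mb, 0)


if __name__ == "__main__":
    # Placeholder figures for illustration only.
    nodes = max_compute_nodes(host_mb=16384, reserve_mb=4096,
                              head_mb=2048, node_mb=1024)
    print(f"Estimated compute nodes: {nodes}")
    if nodes < 2:
        print("Fewer than the two compute nodes the homework asks for.")
```

If the estimate falls below two, the per-VM memory or the host reserve would need to be reduced before the minimum cluster size in the assignment can be met.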


Information regarding the discipline

Name of the discipline: Operating Systems for Parallel and Distributed Architectures
Course coordinator: Lect. Dr. Darius Bufnea

Prerequisites

Curriculum: Operating Systems, Distributed Operating Systems, Computer Networks
Competencies: Average administration and programming skills

Objectives of the discipline

General objective of the discipline: Know the key concepts of parallel cluster architectures
Specific objective of the discipline: At the end of the course, students will know how to build, deploy, configure, maintain, monitor, and debug a Linux parallel cluster.

Content

  1. Introduction to Operating systems for parallel architectures
  2. Parallel Cluster architecture: Cluster Head Nodes, Compute Nodes, Clustering Middleware
  3. Parallel Cluster Paradigms: Single system image, Centralized system management, High processing capacity, Resource consolidation, Optimal use of resources, High-availability, Redundancy, Single points of failure, Failover protection and disaster recovery, Horizontal and vertical scalability, Load-balancing, Elasticity, Run jobs anytime, anywhere
  4. Design and configuration. Network prerequisites for a parallel cluster: LAN, bandwidth, latency, interface, security aspects. Automatic node configuration and deployment
  5. Virtualization of hardware, operating system, storage devices, computer network resources
  6. Beowulf cluster deployment and administration
  7. Linux Cluster Distributions: Mosix, ClusterKnoppix. Automated operating systems and software provisioning for a Linux Cluster: Open Source Cluster Application Resources (OSCAR)
  8. Cluster resources: distributed memory architecture and distributed shared memory, distributed file systems (examples: IBM General Parallel File System, Microsoft’s Cluster Shared Volumes, Oracle Cluster File System)
  9. Nodes and head node management, Cluster system management, Debugging and monitoring a parallel cluster, Node failure management
  10. Data sharing and communication, Message passing and communication, Parallel processing libraries: Parallel Virtual Machine toolkit and the Message Passing Interface library
  11. Software and development environment, Parallel application development and execution (Parallel Environment – PE), Job scheduling & management

Bibliography

  1. Gregory Pfister: In Search of Clusters, Prentice Hall; 2nd edition (December 22, 1997), ISBN-10: 0138997098, ISBN-13: 978-0138997090;
  2. George F. Coulouris, Jean Dollimore, Tim Kindberg: Distributed Systems: Concepts and Design, Addison-Wesley; 5th edition (May 7, 2011), ISBN-10: 0132143011, ISBN-13: 978-0132143011;
  3. Joseph D. Sloan: High Performance Linux Clusters with OSCAR, Rocks, OpenMosix, and MPI, O’Reilly Media (November 23, 2004), ISBN-10: 0596005709, ISBN-13: 978-0596005702;
  4. Daniel F. Savarese, Donald J. Becker, John Salmon, Thomas Sterling: How to Build a Beowulf: A Guide to the Implementation and Application of PC Clusters, The MIT Press (May 28, 1999), ISBN-10: 026269218X, ISBN-13: 978-0262692182;
  5. Gordon Bell, Thomas Sterling: Beowulf Cluster Computing with Linux, The MIT Press; 1st edition (October 1, 2001), ISBN-10: 0262692740, ISBN-13: 978-0262692748;
  6. Charles Bookman: Linux Clustering: Building and Maintaining Linux Clusters, Sams Publishing; 1st edition (June 29, 2002), ISBN-10: 1578702747, ISBN-13: 978-1578702749.

Evaluation (pandemic-specific requirements)

| Type of activity | Evaluation criteria | Evaluation methods | Share in the grade (%) |
| Course | Know the key theoretical concepts of parallel cluster architectures | Written exam (held online; students must keep their cameras switched on) | 30% |
| Seminar/lab activities | Know how to deploy, maintain, debug and monitor a parallel cluster | Homework assignments | 30% |
| | | Presentation on clustering-related topics | 30% |
| Default | | | 10% |