Test an MPI program of your choice and benchmark it on your cluster using different numbers of compute nodes. Run it both manually and through the job scheduler available on your Rocks cluster instance. Deadline: Week 12 (December 21st).
Homework 1: Install a virtualized cluster consisting of a head node and at least two compute nodes in a virtual environment such as Oracle VirtualBox. The virtual cluster should run Rocks Cluster Distribution 6.2. The number of compute nodes will depend on the available memory of your physical host. Deadline: Week 10 (December 7, 2023).
Presentations on a cluster-related topic
|Week||Date||Slots|
|Week 13||05/01/2024||6 slots|
|Week 14||12/01/2024||12 slots|
Presentation schedule: to reserve a slot for your presentation, write me an e-mail or leave me a message on Microsoft Teams. The deadline for choosing a cluster-related topic for your presentation and reserving a slot is December 7th.
Information regarding the discipline
Name of the discipline: Operating Systems for Parallel and Distributed Architectures
Course coordinator: Assoc. prof. Darius Bufnea, darius.bufnea at ubbcluj dot ro
Curriculum: Operating Systems, Distributed Operating Systems, Computer Networks
Competencies: Intermediate system administration and programming skills
Objectives of the discipline
General objective of the discipline: Know the key concepts of parallel cluster architectures
Specific objective of the discipline: At the end of the course, students will know how to build, deploy, configure, maintain, monitor, and debug a Linux parallel cluster.
- Introduction to operating systems for parallel architectures
- Parallel cluster architecture: Cluster Head Nodes, Compute Nodes, Clustering Middleware
- Parallel Cluster Paradigms: Single system image, Centralized system management, High processing capacity, Resource consolidation, Optimal use of resources, High-availability, Redundancy, Single points of failure, Failover protection and disaster recovery, Horizontal and vertical scalability, Load-balancing, Elasticity, Run jobs anytime, anywhere
- Design and configuration. Network prerequisites for a parallel cluster: LAN, bandwidth, latency, interfaces, security aspects. Automatic node configuration and deployment
- Virtualization of hardware, operating system, storage devices, computer network resources
- Beowulf cluster deployment and administration
- Linux Cluster Distributions: Mosix, ClusterKnoppix. Automated operating systems and software provisioning for a Linux Cluster: Open Source Cluster Application Resources (OSCAR)
- Cluster resources: distributed memory architecture and distributed shared memory, distributed file systems (examples: IBM General Parallel File System, Microsoft’s Cluster Shared Volumes, Oracle Cluster File System)
- Compute node and head node management, Cluster system management, Debugging and monitoring a parallel cluster, Node failure management
- Data sharing and communication, Message passing and communication, Parallel processing libraries: the Parallel Virtual Machine (PVM) toolkit and the Message Passing Interface (MPI)
- Software and development environment, Parallel application development and execution (Parallel Environment – PE), Job scheduling & management
- Gregory Pfister: In Search of Clusters, Prentice Hall; 2nd edition (December 22, 1997), ISBN-10: 0138997098, ISBN-13: 978-0138997090;
- George F. Coulouris, Jean Dollimore, Tim Kindberg: Distributed Systems: Concepts and Design, Addison-Wesley; 5th edition (May 7, 2011), ISBN-10: 0132143011, ISBN-13: 978-0132143011;
- Joseph D. Sloan: High Performance Linux Clusters with OSCAR, Rocks, OpenMosix, and MPI, O’Reilly Media (November 23, 2004), ISBN-10: 0596005709, ISBN-13: 978-0596005702;
- Daniel F. Savarese, Donald J. Becker, John Salmon, Thomas Sterling: How to Build a Beowulf: A Guide to the Implementation and Application of PC Clusters, The MIT Press (May 28, 1999), ISBN-10: 026269218X, ISBN-13: 978-0262692182;
- Gordon Bell, Thomas Sterling: Beowulf Cluster Computing with Linux, The MIT Press; 1st edition (October 1, 2001), ISBN-10: 0262692740, ISBN-13: 978-0262692748;
- Charles Bookman: Linux Clustering: Building and Maintaining Linux Clusters, Sams Publishing; 1st edition (June 29, 2002), ISBN-10: 1578702747, ISBN-13: 978-1578702749.
|Type of activity||Evaluation criteria||Evaluation methods||Share in the grade (%)|
|Course||Know the key theoretical concepts of parallel cluster architectures||Written exam (held online; students must have their cameras switched on)||30%|
|Seminar/lab activities||Know how to deploy, maintain, debug and monitor a parallel cluster||Homework assignments||30%|
|Presentation||Presentation on a cluster-related topic|| ||30%|