Information regarding the discipline
Name of the discipline:
Operating Systems for Parallel and Distributed Architectures (High Performance Computing and Big Data Analytics Master’s programme)
Operating Systems and Computer Architecture (Artificial Intelligence for Connected Industries Master’s programme)
Course coordinator: Assoc. prof. Darius Bufnea, darius.bufnea at ubbcluj dot ro
Prerequisites
Curriculum: Operating Systems, Distributed Operating Systems, Computer Networks
Competencies: Intermediate system administration and programming skills
Objectives of the discipline
General objective of the discipline: Know the key concepts of parallel cluster architectures
Specific objective of the discipline: At the end of the course, students will know how to build, deploy, configure, maintain, monitor, and debug a Linux parallel cluster.
Content
- Introduction to operating systems for parallel architectures
- Parallel cluster architecture: Cluster Head Nodes, Compute Nodes, Clustering Middleware
- Parallel Cluster Paradigms: Single system image, Centralized system management, High processing capacity, Resource consolidation, Optimal use of resources, High-availability, Redundancy, Single points of failure, Failover protection and disaster recovery, Horizontal and vertical scalability, Load-balancing, Elasticity, Run jobs anytime, anywhere
- Design and configuration. Network prerequisites for a parallel cluster: LAN, bandwidth, latency, network interfaces, security aspects. Automatic configuration and deployment of nodes
- Virtualization of hardware, operating systems, storage devices, and computer network resources
- Beowulf cluster deployment and administration
- Linux cluster distributions: MOSIX, ClusterKnoppix. Automated operating system and software provisioning for a Linux cluster: Open Source Cluster Application Resources (OSCAR)
- Cluster resources: distributed memory architecture and distributed shared memory, distributed file systems (examples: IBM General Parallel File System, Microsoft’s Cluster Shared Volumes, Oracle Cluster File System)
- Compute node and head node management, cluster system management, debugging and monitoring a parallel cluster, node failure management
- Data sharing and communication, Message passing and communication, Parallel processing libraries: Parallel Virtual Machine toolkit and the Message Passing Interface library
- Software and development environment, Parallel application development and execution (Parallel Environment – PE), Job scheduling & management
Bibliography
- Gregory Pfister: In Search of Clusters, Prentice Hall; 2nd edition (December 22, 1997), ISBN-10: 0138997098, ISBN-13: 978-0138997090;
- George F. Coulouris, Jean Dollimore, Tim Kindberg, Gordon Blair: Distributed Systems: Concepts and Design, Addison-Wesley; 5th edition (May 7, 2011), ISBN-10: 0132143011, ISBN-13: 978-0132143011;
- Joseph D. Sloan: High Performance Linux Clusters with OSCAR, Rocks, OpenMosix, and MPI, O’Reilly Media (November 23, 2004), ISBN-10: 0596005709, ISBN-13: 978-0596005702;
- Thomas Sterling, John Salmon, Donald J. Becker, Daniel F. Savarese: How to Build a Beowulf: A Guide to the Implementation and Application of PC Clusters, The MIT Press (May 28, 1999), ISBN-10: 026269218X, ISBN-13: 978-0262692182;
- Thomas Sterling (ed.), foreword by Gordon Bell: Beowulf Cluster Computing with Linux, The MIT Press; 1st edition (October 1, 2001), ISBN-10: 0262692740, ISBN-13: 978-0262692748;
- Charles Bookman: Linux Clustering: Building and Maintaining Linux Clusters, Sams Publishing; 1st edition (June 29, 2002), ISBN-10: 1578702747, ISBN-13: 978-1578702749.
Evaluation
Type of activity | Evaluation criteria | Evaluation methods | Share in the grade (%) |
---|---|---|---|
Course | Know the key theoretical concepts of parallel cluster architectures | Written exam | 30% |
Seminar/lab activities | Know how to deploy, maintain, debug and monitor a parallel cluster | Homework assignments | 30% |
 | | Presentation on clustering-related topics | 30% |
Default | | | 10% |