Bringing Deep Learning Workloads to JSC supercomputers course

November 19, 2024 · *NOT*

strube1$ module spider PyTorch
------------------------------------------------------------------------------------
  PyTorch:
------------------------------------------------------------------------------------
    Description:
      Tensors and Dynamic neural networks in Python with strong GPU acceleration. 
      PyTorch is a deep learning framework that puts Python first.

     Versions:
        PyTorch/1.7.0-Python-3.8.5
        PyTorch/1.8.1-Python-3.8.5
        PyTorch/1.11-CUDA-11.5
        PyTorch/1.12.0-CUDA-11.7
     Other possible modules matches:
        PyTorch-Geometric  PyTorch-Lightning
...

Time	Title
10:00 - 10:10	Welcome
10:10 - 10:40	Introduction
10:40 - 11:00	Jupyter-JSC
11:00 - 11:10	Coffee Break
11:10 - 11:30	SLURM
11:30 - 12:00	Setup Environment
12:00 - 12:10	Coffee Break
12:10 - 12:40	Distributed Data Parallel
12:40 - 13:00	Model Parallelism and Analysis

Communication:

Goals for this course:

Team:

Schedule

Note

Jülich Supercomputers

What is a supercomputer?

Anatomy of a supercomputer

JURECA DC Compute Nodes

How do I use a Supercomputer?

You don’t use the whole supercomputer

You submit jobs to a queue asking for resources

And get results back

You are just submitting jobs via the login node

Supercomputer Usage Model

Recap:

Connecting to JURECA DC

Getting compute time

Jupyter-JSC

Working with the supercomputer’s software

Launcher in Jupyter-JSC

Software

Connect to terminal

Tool for finding software: `module spider`

What do we have?

Module hierarchy

What do I need to load such software?

Example: PyTorch

Python Modules

Some Python packages are part of Python itself, or of other software modules. Use `module key` to search for them.

How to run it on the login node

Create a Python file

Run code on the login node

But that’s not what we want… 😒

So we send it to the queue!

HOW?🤔

SLURM 🤯

Slurm submission file

Slurm submission file example
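
A minimal sketch of what such a submission file might look like, written out with a here-document so it can be inspected before submitting. The account name `my_project`, the partition `dc-gpu`, the log file names, and the script `example.py` are placeholders assumed for illustration, not values taken from the course; check your own project and the partitions available on the system.

```shell
#!/bin/bash
# Write a hypothetical submission file; every value below is a placeholder.
cat > example_job.sbatch <<'EOF'
#!/bin/bash
# Compute project to charge (placeholder name)
#SBATCH --account=my_project
# Partition to run in (assumed name; verify with sinfo)
#SBATCH --partition=dc-gpu
# One node, one task on that node
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
# Wall-clock limit (hh:mm:ss)
#SBATCH --time=00:10:00
# Output and error logs; %j expands to the job ID
#SBATCH --output=example-out.%j
#SBATCH --error=example-err.%j

# Run the (hypothetical) Python script on the allocated node.
srun python example.py
EOF

# Syntax-check the generated script without executing it.
bash -n example_job.sbatch && echo "syntax ok"
```

On a login node, the file would then be submitted with `sbatch example_job.sbatch`.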

Submitting a job: SBATCH

Are we there yet?

ST is status:
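
The `ST` column of `squeue` shows a compact job state code. The sketch below spells out some of the common codes. The helper name `describe_state` is made up for illustration, and the list is not exhaustive (see `man squeue` for the full set).

```shell
#!/bin/bash
# describe_state: hypothetical helper that translates a compact squeue state code.
describe_state() {
    case "$1" in
        PD) echo "pending" ;;
        R)  echo "running" ;;
        CG) echo "completing" ;;
        CD) echo "completed" ;;
        CA) echo "cancelled" ;;
        F)  echo "failed" ;;
        TO) echo "timeout" ;;
        *)  echo "other (see man squeue)" ;;
    esac
}

# List your own jobs, but only when Slurm is available on this machine.
if command -v squeue >/dev/null 2>&1; then
    squeue -u "$USER"
fi

# Example: look up the meaning of the PD code.
describe_state PD
```

A job that sits in `PD` is still waiting in the queue; once it shows `R` it is running on the compute nodes.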

Reservations

Job is wrong, need to cancel

Check logs

By now you should have output and error log files in your directory. Check them!

Setup project path

Extra software, modules and kernels

You want that extra software from `pip`…

Example: Let’s install some software!

Example: Activating the virtual environment

Let’s train a 🐈 classifier!

Submission file for the classifier

Submit it

Submission time

Probably not much happening…

💥

What happened?

🤔…
