==Deep Learning Workflow (2025)==

''Your up-to-date guide for running deep learning assignments on the '''Faraday''' cluster.''

----

===Contents===

1. [[#Where to Store Your Data|Where to Store Your Data]]
2. [[#Where to Start (JupyterHub)|Where to Start (JupyterHub)]]
3. [[#What Python Version to Use|What Python Version to Use]]
4. [[#Using GPUs for Training|Using GPUs for Training]]
5. [[#Using Pretrained Models with Keras|Using Pretrained Models with Keras]]

----

==Where to Store Your Data==

For your project to be successful, you need to be able to iterate on the training process quickly, so make sure that reading your data is not costly.

Your home directory is a bad place to store training data for a simple reason: user directories are not stored on the local disk of each machine; they are mounted over the network via NFS. Transferring data over the network is much, much slower than reading it from a local disk (keep in mind that the data also has to be read from a disk first in order to be sent over the network).

To avoid this pitfall, simply put the data on the local disk. /mounts is the standard directory where we have been putting data. If you don't have access to this directory, email the sysadmins and we will help you out.
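As a quick sanity check, you can time a full read of your dataset from each location. A minimal sketch, assuming the same NumPy array is stored both in your home directory and under /mounts (both paths are hypothetical; substitute your own):

<pre>
import time
import numpy as np

# Hypothetical locations of the same dataset file.
paths = ["/home/yourname/dataset.npy",    # NFS-mounted home directory
         "/mounts/yourname/dataset.npy"]  # local disk

for path in paths:
    start = time.time()
    data = np.load(path)                  # full read from this location
    print(path, "loaded in", round(time.time() - start, 2), "s")
</pre>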
----

==Where to Start (JupyterHub)==

All machine learning assignments should now be run on '''Faraday''' using its dedicated JupyterHub instance:

https://jupyter.cluster.earlham.edu/hub/login

Notebooks are a very convenient way to run DL/ML projects: it's easier to visualize the data, you can make changes on the fly, and you can confirm that everything is set up correctly before you start running lengthy experiments. (This instance replaces the older JupyterHub on layout at https://lo0.cluster.earlham.edu/jupyterhub.)

Log in with your Earlham credentials. This environment:

- Comes with TensorFlow and PyTorch preloaded
- Gives direct access to Faraday's GPU nodes
- Supports running long experiments and interactive debugging

To confirm GPU access in a notebook:

<pre>
import tensorflow as tf

# Lists the GPUs TensorFlow can see; an empty list means no GPU access.
print(tf.config.list_physical_devices('GPU'))
</pre>
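Since PyTorch is preloaded as well, you can run the equivalent check for it. A minimal sketch (our addition, not from the original guide):

<pre>
import torch

# True if PyTorch can reach a CUDA-capable GPU on this node.
print(torch.cuda.is_available())
</pre>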
----

==What Python Version to Use==

Faraday's default deep learning environment uses '''Python 3.12''', which is fully compatible with the latest machine learning frameworks.

Supported with:

- '''TensorFlow'''
- '''PyTorch'''

We manage different software versions and environments through Modules. To list all available modules:

<pre>$ module avail</pre>

You will see several python, conda, and cuda modules. Different python versions have different tensorflow versions installed, so check the compatibility charts before picking one:

- TensorFlow: https://www.tensorflow.org/install/source#gpu
- PyTorch: https://pytorch.org/get-started/previous-versions/

For example, back when tensorflow 2.3.1 was the latest release, it was available on python/3.7 and compatible with cuda/10.1, loaded via:

<pre>$ module load python/3.7
$ module load cuda/10.1</pre>

----

==Using GPUs for Training==

Training on GPUs is essential for modern deep learning tasks. Faraday is equipped with high-performance NVIDIA GPUs that massively accelerate training.

Deep learning has been around for almost half a century, but only recently has it become a prevalent machine learning method. Its popularity has risen dramatically since the 2000s due to two main factors:

*Mass digitization provided computer scientists with the huge amounts of data needed to train models that generalize well.

*Computing power became cheaper and more accessible. Especially with the arrival of GPUs, training models became faster than ever.

You should leverage both factors to make sure your project is successful.

===GPU vs CPU===

The majority of the work done by neural networks is just matrix multiplication. This operation is highly parallelizable, and GPUs are designed for parallel processing of data. Let's conduct an experiment to demonstrate how much faster GPUs are when it comes to training neural networks: we will train a simple CNN on digit classification (MNIST).

<pre>
import os
# Uncomment the next line to hide the GPU and force the run onto the CPU.
# os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

import time

import tensorflow as tf
from tensorflow.keras import Sequential, layers
from tensorflow.keras.datasets import mnist

(train_X, train_y), (test_X, test_y) = mnist.load_data()

height = train_X.shape[1]
width = train_X.shape[2]
num_classes = 10

# Add the channel axis Conv2D expects: (N, 28, 28) -> (N, 28, 28, 1)
train_X = train_X.reshape(-1, height, width, 1)

model = Sequential([
  layers.Rescaling(1./255, input_shape=(height, width, 1)),
  layers.Conv2D(32, 3, padding='same', activation='relu'),
  layers.MaxPooling2D(),
  layers.Conv2D(64, 3, padding='same', activation='relu'),
  layers.MaxPooling2D(),
  layers.Conv2D(128, 3, padding='same', activation='relu'),
  layers.MaxPooling2D(),
  layers.Flatten(),
  layers.Dense(256, activation='relu'),
  layers.Dense(num_classes)
])

model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

epochs = 10
start = time.time()
history = model.fit(
  train_X,
  train_y,
  epochs=epochs,
  batch_size=64
)
end = time.time()
print("Time taken:", end - start)
</pre>

'''Results'''

CPU with 16 cores: 188.98 s

GPU: 54.0 s

As you can see, the GPU delivered a 3.5x speedup even though all 16 CPU cores were in use.
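You can reproduce this effect on a raw matrix multiplication, the operation that dominates neural network training. A rough sketch (our addition, assuming the node has at least one GPU; the first call on each device includes warm-up overhead, so we time a second call):

<pre>
import time
import tensorflow as tf

a = tf.random.normal((4096, 4096))
b = tf.random.normal((4096, 4096))

for device in ('/CPU:0', '/GPU:0'):
    with tf.device(device):
        tf.matmul(a, b)        # warm-up run
        start = time.time()
        c = tf.matmul(a, b)
        _ = c.numpy()          # wait for the (asynchronous) GPU op to finish
    print(device, round(time.time() - start, 4), "s")
</pre>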
To verify GPU access from a notebook, run the same check as above:

<pre>
import tensorflow as tf

print(tf.config.list_physical_devices('GPU'))
</pre>
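One practical note for shared nodes (our addition, not from the original guide): by default TensorFlow reserves all GPU memory at startup, which can block other users' jobs. You can ask it to allocate memory on demand instead:

<pre>
import tensorflow as tf

# Must run before any GPU operation has initialized the device.
for gpu in tf.config.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)
</pre>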
  
===Setting up environment for GPU===

One way to check the specs of the GPU on a node is to run:

:<pre> $ lshw -C display </pre>

On layout, an older GPU node, this printed:

<pre>
description: VGA compatible controller
product: GK110 [GeForce GTX 780]
vendor: NVIDIA Corporation
physical id: 0
bus info: pci@0000:03:00.0
version: a1
width: 64 bits
clock: 33MHz
capabilities: pm msi pciexpress vga_controller bus_master cap_list rom
configuration: driver=nvidia latency=0
resources: irq:181 memory:de000000-deffffff memory:d0000000-d7ffffff memory:d8000000-d9ffffff ioport:8000(size=128) memory:df000000-df07ffff
</pre>

Depending on the model and make of the GPU, other commands may also be available. For NVIDIA GPUs you can display the current state of the GPU with:

<pre> $ nvidia-smi</pre>

(Note: you can use this command to see whether or not the resources are busy.)

----

==Using Pretrained Models with Keras==

If you're using '''MobileNetV2''', '''InceptionV3''', '''VGG19''', or '''VGG16''', you can easily load them through Keras:

<pre>
from keras.applications import MobileNetV2, InceptionV3, VGG19, VGG16
</pre>

Each model can be initialized with pretrained weights (e.g., from ImageNet):

<pre>
base_model = MobileNetV2(
    weights='imagenet',        # ImageNet-pretrained weights
    include_top=False,         # drop the ImageNet classification head
    input_shape=(224, 224, 3)
)
</pre>
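As a quick usage example (our own sketch, not from the original guide), the truncated base model maps a batch of images to a feature tensor; with include_top=False and 224x224 inputs, MobileNetV2 produces a 7x7x1280 feature map:

<pre>
import numpy as np

# One dummy 224x224 RGB image; replace with your real, preprocessed data.
images = np.random.rand(1, 224, 224, 3).astype("float32")

features = base_model.predict(images)
print(features.shape)  # expected: (1, 7, 7, 1280) for MobileNetV2
</pre>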
  
These models run efficiently on Faraday's GPUs when used with `tensorflow` and `keras`, and they are ideal for transfer learning, feature extraction, or fine-tuning workflows.

To freeze the pretrained layers during transfer learning:

<pre>
# Freeze the base model so only the layers you add on top get trained.
for layer in base_model.layers:
    layer.trainable = False
</pre>

You can then add your own classifier on top using `tf.keras.Sequential` or the functional API.
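For example, a minimal classification head for a hypothetical 10-class problem might look like this (our sketch; the pooling layer and class count are assumptions, not from the original guide):

<pre>
import keras

model = keras.Sequential([
    base_model,                                   # frozen MobileNetV2 trunk
    keras.layers.GlobalAveragePooling2D(),        # (7, 7, 1280) -> (1280,)
    keras.layers.Dense(10, activation='softmax')  # hypothetical 10 classes
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
</pre>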
----

''Last updated: May 2025''