Automatic Mixed Precision: letting TensorFlow use FP16 where it helps and speed up training automatically

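NVIDIA RTX series GPUs include Tensor Cores, which accelerate FP16 computation. Computing in FP16 instead of FP32 lowers the cost of each operation and shortens the time deep learning takes.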

However, some layers are fine in FP16 while others are better left in FP32, and working out which is which by hand is tedious.
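
To get a feel for why some operations are better kept in FP32, here is a small NumPy sketch. NumPy is my own choice for illustration, and the helper name naive_sum is made up. FP16 carries only about 11 bits of precision, so a running sum kept in FP16 stops growing once it gets large enough, which is the kind of trouble a reduction can run into.

```python
import numpy as np

def naive_sum(values, dtype):
    """Add values one by one, keeping the running total in `dtype`."""
    total = dtype(0)
    for v in values:
        total = dtype(total + dtype(v))
    return total

ones = [1.0] * 4096  # the exact answer is 4096

sum_fp16 = naive_sum(ones, np.float16)
sum_fp32 = naive_sum(ones, np.float32)

# FP16 cannot represent 2049, so 2048 + 1 rounds back to 2048 and the sum stalls.
print(float(sum_fp16), float(sum_fp32))  # 2048.0 4096.0
```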

The Automatic Mixed Precision (AMP) feature chooses between FP16 and FP32 for you, so adding just a few lines of code is enough to get the speedup.

Using Automatic Mixed Precision for Major Deep Learning Frameworks
TensorFlow
Automatic Mixed Precision is available both in native TensorFlow and inside the TensorFlow container on NVIDIA NGC container registry. To enable AMP in NGC TensorFlow 19.07 or upstream TensorFlow 1.14 or later, wrap your tf.train or tf.keras.optimizers Optimizer as follows:
opt = tf.train.experimental.enable_mixed_precision_graph_rewrite(opt)
This change applies automatic loss scaling to your model and enables automatic casting to half precision.
https://developer.nvidia.com/automatic-mixed-precision
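
The "automatic loss scaling" mentioned in the quote can be illustrated with a few lines of NumPy. This is only a sketch of the idea, using a fixed scale factor of 65536 and a gradient value of 1e-8 that I picked for illustration; it is not how TensorFlow implements it. Very small gradient values underflow to zero in FP16, but they survive if they are multiplied by a large factor first and divided back in FP32 afterwards.

```python
import numpy as np

LOSS_SCALE = 65536.0  # hypothetical fixed scale factor (assumption for this sketch)
true_grad = 1e-8      # hypothetical gradient, smaller than FP16's smallest subnormal

# Without scaling, the value is lost when cast to FP16.
unscaled_fp16 = np.float16(true_grad)

# With scaling, the value lands in FP16's representable range.
scaled_fp16 = np.float16(true_grad * LOSS_SCALE)

# Unscale in FP32 before the weight update.
recovered = np.float32(scaled_fp16) / np.float32(LOSS_SCALE)

print(float(unscaled_fp16))  # 0.0
print(float(recovered))      # close to 1e-8
```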

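It seems you need to use the NVIDIA NGC container, although the quote above says upstream TensorFlow 1.14 or later also supports it.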

docker pull nvcr.io/nvidia/tensorflow:20.06-tf2-py3

Set the environment variable in the source code.

import os
os.environ['TF_ENABLE_AUTO_MIXED_PRECISION'] = '1'

Wrap the optimizer.

# Graph-based example:
opt = tf.train.AdamOptimizer()
opt = tf.train.experimental.enable_mixed_precision_graph_rewrite(opt)
train_op = opt.minimize(loss)

# Keras-based example:
opt = tf.keras.optimizers.Adam()
opt = tf.train.experimental.enable_mixed_precision_graph_rewrite(opt)
model.compile(loss=loss, optimizer=opt)
model.fit(...)

After starting the container with docker container run and kicking off training, I noticed log output like the following.

2020-07-14 06:26:30.309851: I tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:1035] Automatic Mixed Precision Grappler Pass Summary:

Total processable nodes: 24493
Recognized nodes available for conversion: 15070
Total nodes converted: 2320
Total FP16 Cast ops used (excluding Const and Variable casts): 476
Whitelisted nodes converted: 652
Blacklisted nodes blocking conversion: 292
Nodes blocked from conversion by blacklisted nodes: 272
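
The summary is easier to digest as ratios. Below is a small parsing sketch; the helper name parse_amp_summary is my own, and LOG is simply the summary above pasted in as a string.

```python
import re

LOG = """\
Total processable nodes: 24493
Recognized nodes available for conversion: 15070
Total nodes converted: 2320
Total FP16 Cast ops used (excluding Const and Variable casts): 476
Whitelisted nodes converted: 652
Blacklisted nodes blocking conversion: 292
Nodes blocked from conversion by blacklisted nodes: 272
"""

def parse_amp_summary(text):
    """Turn 'label: number' lines into a {label: int} dict."""
    stats = {}
    for line in text.splitlines():
        m = re.match(r"^(.+?):\s*(\d+)\s*$", line)
        if m:
            stats[m.group(1).strip()] = int(m.group(2))
    return stats

stats = parse_amp_summary(LOG)
converted = stats["Total nodes converted"]
recognized = stats["Recognized nodes available for conversion"]
total = stats["Total processable nodes"]

print(f"{converted / recognized:.1%} of recognized nodes converted")  # 15.4%
print(f"{converted / total:.1%} of all processable nodes converted")  # 9.5%
```

So in this run only about 15% of the nodes AMP recognized actually ended up in FP16.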

When I ran my own code on an image classification task, each epoch got roughly 30% faster, and I was able to double the batch size.
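
One likely reason the batch size could be doubled is that an FP16 tensor takes half the memory of an FP32 one. A rough back-of-the-envelope sketch follows. The activation shape is a made-up example, the helper name activation_bytes is mine, and treating activations as the main consumer of GPU memory is my assumption; the real saving depends on the model.

```python
import numpy as np

def activation_bytes(batch_size, dtype, feature_shape=(56, 56, 256)):
    """Bytes for one activation tensor of shape (batch_size, *feature_shape).

    feature_shape is a hypothetical example, not taken from any real model.
    """
    n_elements = batch_size * int(np.prod(feature_shape))
    return n_elements * np.dtype(dtype).itemsize

fp32_batch32 = activation_bytes(32, np.float32)
fp16_batch64 = activation_bytes(64, np.float16)

# Twice the batch in FP16 takes the same space as the original batch in FP32.
print(fp32_batch32 == fp16_batch64)  # True
```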

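On top of that, accuracy and AUC were on par with training without automatic mixed precision, so it turned out to be a handy feature that speeds things up with almost no downside.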

References

Automatic Mixed Precision for Deep Learning

TensorFlow | NVIDIA NGC

TensorFlow User Guide - NVIDIA Docs

Automatic Mixed Precision in TensorFlow for Faster AI Training on NVIDIA GPUs (a guest post by NVIDIA)