Training deep learning models for NLP tasks typically requires many hours or days to complete on a single GPU. In this post, we leverage Determined’s distributed training capability to reduce BERT for SQuAD model training from hours to minutes, without sacrificing model accuracy. In this 2-part blog series, we outline tips and tricks to accelerate NLP deep learning model training across multiple G