Microsoft and Cray announce partnership to speed up deep learning on supercomputers


At the 2016 Neural Information Processing Systems (NIPS) Conference in Barcelona, Cray Inc. released details of a deep learning report developed in partnership with Microsoft and the Swiss National Supercomputing Centre (CSCS), a leading supercomputing institution. The report describes deep learning algorithms running at scale on Cray supercomputers.

Microsoft and Cray found that there is a significant scientific advantage in running larger deep learning models, but that too much time is spent training them. Thus, Cray worked with Microsoft and CSCS to run the Microsoft Cognitive Toolkit (previously known as CNTK) at scale on "Piz Daint", a Cray XC50 supercomputer at CSCS.

Cray, Microsoft, and CSCS scaled Microsoft Cognitive Toolkit to more than 1,000 NVIDIA Tesla P100 GPU accelerators on “Piz Daint.” With the help of Microsoft Cognitive Toolkit, data scientists and researchers will be able to run bigger and more complicated deep learning projects at scale.

Dr. Thomas C. Schulthess, Director of the Swiss National Supercomputing Centre (CSCS), explained how Cray contributed to the collaboration:

“Cray’s proficiency in performance analysis and profiling, combined with the unique architecture of the XC systems, allowed us to bring deep learning problems to our Piz Daint system and scale them in a way that nobody else has. What is most exciting is that our researchers and scientists will now be able to use our existing Cray XC supercomputer to take on a new class of deep learning problems that were previously infeasible.”

Instead of waiting weeks or months for deep learning models to complete the training process, Microsoft Cognitive Toolkit helped data scientists get results within hours, or even minutes in some cases. By improving the speed of deep learning frameworks, there are more opportunities to tackle harder problems, such as moving from image recognition to video recognition, or from simple speech recognition to natural language processing with context.

In addition to Microsoft Cognitive Toolkit, Microsoft, Cray, and CSCS used the Cray XC Aries network and a high-performance MPI library, which let each individual deep learning model leverage more computing resources and thus reduced training time.
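The core idea behind using an MPI library here is data-parallel training: each worker computes gradients on its own shard of data, and a collective operation (an MPI allreduce) averages those gradients so every model replica applies the same update. The sketch below is purely illustrative, not code from the report; it simulates the averaging step with plain NumPy rather than a real MPI runtime, and the worker count and learning rate are arbitrary.

```python
import numpy as np

def allreduce_average(local_grads):
    """Average gradients across workers -- the collective that an
    MPI allreduce performs in data-parallel training."""
    return np.mean(local_grads, axis=0)

# Illustrative setup: 4 simulated workers, each computing a local
# gradient on its own shard of the training data.
rng = np.random.default_rng(0)
weights = np.zeros(3)
local_grads = [rng.normal(size=3) for _ in range(4)]

# Every worker applies the same averaged update, so all model
# replicas stay synchronized after each step.
avg_grad = allreduce_average(local_grads)
weights -= 0.1 * avg_grad
```

Because each step processes the combined minibatch of all workers, adding nodes shortens wall-clock training time, which is the effect the collaboration scaled up to more than 1,000 GPUs.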

Microsoft AI and Research’s renowned Dr. Xuedong Huang added his take on the collaboration and Microsoft Cognitive Toolkit’s role:

“Applying a supercomputing approach to optimize deep learning workloads represents a powerful breakthrough for training and evaluating deep learning algorithms at scale. Our collaboration with Cray and CSCS has demonstrated how the Microsoft Cognitive Toolkit can be used to push the boundaries of deep learning.”

Cray’s director of deep learning and machine learning, Dr. Mark S. Staveley, remarked on how Microsoft helped Cray:

“We are working to unlock possibilities around new approaches and model sizes, turning the dreams and theories of scientists into something real that they can explore. Our collaboration with Microsoft and CSCS is a game changer for what can be accomplished using deep learning.”