About the talk
DLRM (Deep Learning Recommendation Model) is a deep learning-based recommendation model introduced and open-sourced by Facebook. It is one of the state-of-the-art recommendation models and is part of the MLPerf training benchmark. The DLRM workload poses unique challenges for single-socket and multi-socket distributed training because it must balance a mixture of compute-bound, memory-bound, and I/O-bound operations. To tackle this, we implemented an efficient scale-out solution for DLRM training on Intel Xeon clusters that includes innovative data and model parallelization, new hybrid split-SGD and LAMB optimizers, efficient hyperparameter tuning for model convergence at a much larger global batch size, and novel data loader techniques to support scale-up and scale-out. According to the MLPerf v1.0 training results, we can train DLRM with 64 Xeon Cooper Lake 8376H processors in 15 minutes, a 3X improvement over our MLPerf v0.7 submission with 16 Xeon Cooper Lake 8380 processors. In this talk, Ke will discuss DLRM, its unique challenges, and the optimizations that drive this training performance acceleration.
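As a rough illustration of the split-SGD idea mentioned above, the sketch below emulates keeping a BF16 weight (the top 16 bits of an FP32 value) together with the discarded trailing 16 bits, so the plain SGD update can still be applied at full FP32 precision without storing a separate FP32 master copy. This is a minimal, assumption-laden sketch: the function name split_sgd_step, the use of NumPy uint16 arrays to stand in for BF16, and the plain SGD update (no LAMB term) are illustrative choices, not Intel's actual implementation.

```python
import numpy as np

def split_sgd_step(hi, lo, grad, lr):
    """Hypothetical split-SGD step on a weight stored in 'split' form.

    hi   : uint16 array, top 16 bits of each fp32 weight (the BF16 payload)
    lo   : uint16 array, discarded low 16 bits ("trailing bits")
    grad : float32 gradient
    lr   : learning rate
    """
    # Reassemble the full fp32 master weight from the two 16-bit halves.
    master = ((hi.astype(np.uint32) << 16) | lo).view(np.float32)

    # Plain SGD update carried out in full fp32 precision.
    master = master - lr * grad

    # Split the updated weight back into BF16 payload + trailing bits.
    bits = master.view(np.uint32)
    return (bits >> 16).astype(np.uint16), (bits & 0xFFFF).astype(np.uint16)

# Tiny usage example: split an fp32 weight, take one step, reassemble.
w = np.array([0.5, -1.25, 3.0], dtype=np.float32)
bits = w.view(np.uint32)
hi, lo = (bits >> 16).astype(np.uint16), (bits & 0xFFFF).astype(np.uint16)
grad = np.array([0.1, -0.2, 0.05], dtype=np.float32)
hi, lo = split_sgd_step(hi, lo, grad, lr=0.01)
print(((hi.astype(np.uint32) << 16) | lo).view(np.float32))
```

The design point being illustrated is memory: only 16 extra bits per parameter are kept alongside the BF16 weight, while the update itself never loses precision relative to an FP32 master copy.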
Ke has 16 years of experience in machine learning and platform software development at Intel. He is currently a Principal AI Engineer and Engineering Director in the Machine Learning Performance group under Intel's Software and Advanced Technology Group, responsible for applied machine learning end-to-end workload development, framework optimization, and new AI technology exploration for the latest Intel Xeon CPU and upcoming discrete GPU platforms. He has more than 20 granted patents in the domains of machine learning, multimedia, and context awareness.