MLlib: RDD-based API

This page documents sections of the MLlib guide for the RDD-based API (the spark.mllib package). Please see the MLlib Main Guide for the DataFrame-based API (the spark.ml package), which is now the primary API for MLlib.

Data types
Basic statistics
  summary statistics
  correlations
  stratified sampling
  hypothesis testing
  streaming significance testing
  random data generation
Class