Parallel Frequent Itemset Mining on the Intel MIC Accelerators

Authors

  • Mikhail L. Zymbler Author

Abstract

Association rule mining is one of the basic problems of data mining, which supposes finding strong correlations between itemsets in large transaction database. Association rules are generated from frequent itemsets (itemset is frequent if its items frequent occur together in transactions). The DIC (Dynamic Itemset Counting) algorithm is modification of the classical Apriori algorithm of finding frequent itemsets. DIC tries to reduce the number of passes made over the transaction database while keeping the number of itemsets counted in a pass relatively low. The paper addresses the task of accelerating DIC on the Intel MIC (Many Integrated Core) systems in the case when the transaction database fits into the main memory. The paper presents a parallel implementation of DIC based on OpenMP technology and thread-level parallelism. We exploit the bit-based internal layout for transactions and itemsets. This technique simplifies the support count via logical bitwise operation, and allows for vectorization of such a step. Experiments with large synthetic and real databases showed good performance and scalability of the proposed algorithm.

Author Biography

  • Mikhail L. Zymbler
    нач. отдела интеллектуального анализа данных и виртуализации ЛСМ ЮУрГУ

Published

2019-03-04

Issue

Section

Informatics, Computers and Control