K-mers are a fundamental building block for many NGS applications. However, k-mers are error-prone, which poses great challenges for downstream data analyses. We propose a statistical approach to effectively distinguish solid k-mers from weak k-mers. Specifically, we calculate a z-score for each k-mer and determine whether it is truly solid based jointly on its z-score and its frequency. Experiments show that our approach effectively pinpoints solid k-mers that have low frequency, achieving an average improvement of 11.25%.
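The idea of combining a z-score with a frequency test can be illustrated with a minimal sketch. The abstract does not specify how the z-score is computed, so the version below makes its own assumptions: the z-score of a k-mer is taken relative to the frequencies of the k-mer and its observed one-substitution neighbours, and the names `count_kmers`, `neighbors`, `z_score`, `is_solid`, together with the cutoffs `freq_cutoff`, `min_freq`, and `z_cutoff`, are hypothetical and chosen only for illustration.

```python
from collections import Counter
from statistics import mean, pstdev


def count_kmers(reads, k):
    """Count every k-mer occurring in a collection of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def neighbors(kmer, alphabet="ACGT"):
    """Yield all k-mers that differ from `kmer` by exactly one substitution."""
    for i, base in enumerate(kmer):
        for alt in alphabet:
            if alt != base:
                yield kmer[:i] + alt + kmer[i + 1:]


def z_score(kmer, counts):
    """Assumed definition: standardize the k-mer's frequency against the
    frequencies of itself and its observed one-substitution neighbours."""
    group = [counts[kmer]] + [counts[n] for n in neighbors(kmer) if n in counts]
    sigma = pstdev(group)
    if sigma == 0:
        return 0.0  # no spread (e.g. no observed neighbours): no evidence either way
    return (counts[kmer] - mean(group)) / sigma


def is_solid(kmer, counts, freq_cutoff=5, min_freq=2, z_cutoff=1.0):
    """Joint decision (hypothetical cutoffs): a k-mer is solid if it is frequent,
    or if it has low frequency but stands out from its neighbourhood."""
    freq = counts[kmer]
    if freq >= freq_cutoff:
        return True
    return freq >= min_freq and z_score(kmer, counts) >= z_cutoff


if __name__ == "__main__":
    reads = ["ACG", "ACG", "ACG", "ACT"]
    counts = count_kmers(reads, k=3)
    for kmer in sorted(counts):
        print(kmer, counts[kmer], z_score(kmer, counts), is_solid(kmer, counts))
```

In this toy input, `ACG` falls below the plain frequency cutoff but is still accepted because it clearly dominates its neighbour `ACT`. This is the kind of low-frequency solid k-mer that a frequency-only threshold would miss.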