
Application Name

K-means Fragments

Summary

  • Name: K-means Fragments
  • Contact Person: support-compss@bsc.es
  • Access Level: public
  • License Agreement: GPL
  • Platform: COMPSs
  • Repository: K-means Fragments

Description

K-means clustering is a method of cluster analysis that aims to partition n points into k clusters in which each point belongs to the cluster with the nearest mean. It follows an iterative refinement strategy to find the centers of natural clusters in the data.
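For reference, one refinement step of the standard sequential algorithm (Lloyd's algorithm) can be sketched in Java as below. This is a minimal illustration on plain double[][] arrays, not code from the kmeans_frag application. The names LloydSketch, nearest and step are hypothetical, and keeping the old center for an empty cluster is an assumed convention.

```java
/** Minimal sequential K-means step (Lloyd's algorithm); hypothetical sketch, not the kmeans_frag code. */
class LloydSketch {

    /** Index of the center closest to p, using squared Euclidean distance. */
    static int nearest(double[] p, double[][] centers) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int c = 0; c < centers.length; c++) {
            double d = 0;
            for (int j = 0; j < p.length; j++) {
                double diff = p[j] - centers[c][j];
                d += diff * diff;
            }
            if (d < bestDist) {
                bestDist = d;
                best = c;
            }
        }
        return best;
    }

    /** Assigns every point to its nearest center, then moves each center to the mean of its points. */
    static double[][] step(double[][] points, double[][] centers) {
        int k = centers.length, dim = centers[0].length;
        double[][] sums = new double[k][dim];
        int[] counts = new int[k];
        for (double[] p : points) {
            int c = nearest(p, centers);
            counts[c]++;
            for (int j = 0; j < dim; j++) {
                sums[c][j] += p[j];
            }
        }
        double[][] next = new double[k][dim];
        for (int c = 0; c < k; c++) {
            for (int j = 0; j < dim; j++) {
                // Assumed convention: an empty cluster keeps its previous center.
                next[c][j] = counts[c] == 0 ? centers[c][j] : sums[c][j] / counts[c];
            }
        }
        return next;
    }
}
```

For example, with points (0,0), (0,1), (10,10), (10,11) and initial centers (0,0) and (10,10), one call to step moves the centers to (0,0.5) and (10,10.5).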

When executed with COMPSs, K-means first generates the input points by means of initialization tasks. For parallelism purposes, the points are split into a number of fragments given as a parameter; each fragment is created by an initialization task and filled with random points.
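The initialization phase might look roughly like the following sketch. It assumes the n points are divided as evenly as possible (the first n % F fragments receive one extra point), that coordinates are uniform in [0, 1), and that each fragment gets its own seed; the page does not specify any of these details. FragmentInit, fragmentSize, generateFragment and initFragments are hypothetical names. In the COMPSs version, each generateFragment call would correspond to one initialization task.

```java
import java.util.Random;

/** Hypothetical sketch of fragment initialization; not the kmeans_frag code. */
class FragmentInit {

    /** Points in fragment i when n points are split over f fragments (first n % f fragments get one extra). */
    static int fragmentSize(int n, int f, int i) {
        return n / f + (i < n % f ? 1 : 0);
    }

    /** One fragment of random points in [0, 1)^dim; in COMPSs this would be one initialization task. */
    static double[][] generateFragment(int numPoints, int dim, long seed) {
        Random rnd = new Random(seed);
        double[][] frag = new double[numPoints][dim];
        for (double[] p : frag) {
            for (int j = 0; j < dim; j++) {
                p[j] = rnd.nextDouble();
            }
        }
        return frag;
    }

    /** Creates all f fragments; the calls are independent, so a runtime could execute them in parallel. */
    static double[][][] initFragments(int n, int dim, int f, long seed) {
        double[][][] fragments = new double[f][][];
        for (int i = 0; i < f; i++) {
            fragments[i] = generateFragment(fragmentSize(n, f, i), dim, seed + i);
        }
        return fragments;
    }
}
```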

After the initialization, the algorithm goes through a set of iterations. In every iteration, a computation task is created for each fragment; then, in a reduction phase, the results of the computations are accumulated two at a time by merge tasks; finally, at the end of the iteration, the main program post-processes the merged result, generating the clusters that will be used in the next iteration. Consequently, if F is the total number of fragments, K-means generates F computation tasks and F-1 merge tasks per iteration.
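The iteration structure described above can be sketched as follows. Each compute call plays the role of a computation task and returns per-cluster coordinate sums and counts, merge plays the role of a merge task, and the final division by counts is the main program's post-processing. This is a sequential, hypothetical illustration: Partial, compute, merge, iterate and the mergesPerformed counter are not taken from kmeans_frag, and representing partial results as sums plus counts is an assumption about what the tasks exchange. Because every merge consumes two pending results and produces one, reducing F results always takes F-1 merges.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch of one K-means iteration over F fragments; not the kmeans_frag API. */
class IterationSketch {

    /** Partial result of one computation task: per-cluster coordinate sums and point counts. */
    static final class Partial {
        final double[][] sums;
        final int[] counts;

        Partial(int k, int dim) {
            sums = new double[k][dim];
            counts = new int[k];
        }
    }

    /** Counts merge calls so the F-1 figure can be checked (illustration only). */
    static int mergesPerformed = 0;

    /** Computation task (one per fragment): assign each point to its nearest center and accumulate. */
    static Partial compute(double[][] fragment, double[][] centers) {
        int k = centers.length, dim = centers[0].length;
        Partial r = new Partial(k, dim);
        for (double[] p : fragment) {
            int best = 0;
            double bestDist = Double.MAX_VALUE;
            for (int c = 0; c < k; c++) {
                double d = 0;
                for (int j = 0; j < dim; j++) {
                    double diff = p[j] - centers[c][j];
                    d += diff * diff;
                }
                if (d < bestDist) {
                    bestDist = d;
                    best = c;
                }
            }
            r.counts[best]++;
            for (int j = 0; j < dim; j++) {
                r.sums[best][j] += p[j];
            }
        }
        return r;
    }

    /** Merge task: combines two partial results into a new one. */
    static Partial merge(Partial a, Partial b) {
        mergesPerformed++;
        int k = a.counts.length, dim = a.sums[0].length;
        Partial r = new Partial(k, dim);
        for (int c = 0; c < k; c++) {
            r.counts[c] = a.counts[c] + b.counts[c];
            for (int j = 0; j < dim; j++) {
                r.sums[c][j] = a.sums[c][j] + b.sums[c][j];
            }
        }
        return r;
    }

    /** One iteration: F compute calls, then pairwise merges until a single result remains. */
    static double[][] iterate(double[][][] fragments, double[][] centers) {
        Deque<Partial> pending = new ArrayDeque<>();
        for (double[][] fragment : fragments) {
            pending.add(compute(fragment, centers));
        }
        while (pending.size() > 1) {
            Partial a = pending.poll();
            Partial b = pending.poll();
            pending.add(merge(a, b)); // each merge shrinks the queue by one: F-1 merges in total
        }
        Partial total = pending.poll();
        // Post-processing in the main program: new centers are the per-cluster means.
        int k = centers.length, dim = centers[0].length;
        double[][] next = new double[k][dim];
        for (int c = 0; c < k; c++) {
            for (int j = 0; j < dim; j++) {
                next[c][j] = total.counts[c] == 0 ? centers[c][j] : total.sums[c][j] / total.counts[c];
            }
        }
        return next;
    }
}
```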

Versions

Version 1: Binary Serialization

The parameters are serialized using binary serialization. All the code for this version is packaged under the kmeans_frag/binarySerialization/ folder.
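A minimal sketch of binary serialization of a parameter object in Java, assuming this version relies on the standard java.io.Serializable mechanism (the page does not say which serializer is used). BinaryFragment, toBytes and fromBytes are hypothetical names; the in-memory round trip through ObjectOutputStream and ObjectInputStream only illustrates what happens when a parameter is turned into bytes and rebuilt elsewhere.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/** Hypothetical fragment type; implementing Serializable enables Java's built-in binary serialization. */
class BinaryFragment implements Serializable {
    private static final long serialVersionUID = 1L;

    final double[][] points;

    BinaryFragment(double[][] points) {
        this.points = points;
    }

    /** Encodes the object graph into bytes. */
    static byte[] toBytes(BinaryFragment f) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(f);
        }
        return bytes.toByteArray();
    }

    /** Rebuilds an equivalent object from the bytes. */
    static BinaryFragment fromBytes(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (BinaryFragment) in.readObject();
        }
    }
}
```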

Version 2: XML Serialization

The parameters are serialized using XML serialization (the parameter classes provide getters and setters). All the code for this version is packaged under the kmeans_frag/XMLSerialization/ folder.

Version 3: Sequential Merge

The reduce (merge) step is not declared as a task, so the merges run one after another in the main program, which serializes this part of the application. All the code for this version is packaged under the kmeans_frag/sequentialMerge/ folder.
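The cost of running the merges sequentially shows up in the length of the merge phase's dependency chain rather than in the amount of work, since F-1 merges are performed either way. The sketch below compares the two, assuming that when merges are tasks they form a balanced binary tree whose merges at the same level can run concurrently; MergeDepth and its methods are hypothetical helpers.

```java
/** Hypothetical comparison of merge-phase critical path length, measured in merge steps. */
class MergeDepth {

    /** Merges performed one after another: each one depends on the previous result. */
    static int sequentialDepth(int f) {
        return f - 1;
    }

    /** Merges arranged as a balanced binary tree: one step per level, about ceil(log2 F) levels. */
    static int treeDepth(int f) {
        int depth = 0;
        for (int remaining = f; remaining > 1; remaining = (remaining + 1) / 2) {
            depth++;
        }
        return depth;
    }
}
```

With F = 512, the value used in the execution example, this gives a chain of 511 dependent merges versus 9 tree levels.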

Version 4: Params OUT

Task parameters are declared as OUT parameters instead of return values. All the code for this version is packaged under the kmeans_frag/paramsOUT/ folder.
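The difference between the two signature styles can be illustrated with a hypothetical merge over per-cluster count arrays. In the return-value style the task allocates and returns its result; in the OUT style the caller passes in the object and the task only writes into it. How the OUT direction is declared to COMPSs is not shown here, and OutParamStyles, mergeReturn and mergeOut are illustrative names, not the kmeans_frag code.

```java
/** Two hypothetical signatures for the same merge step; neither is taken from kmeans_frag. */
class OutParamStyles {

    /** Return-value style: the task allocates and returns a new result. */
    static int[] mergeReturn(int[] a, int[] b) {
        int[] r = new int[a.length];
        for (int i = 0; i < a.length; i++) {
            r[i] = a[i] + b[i];
        }
        return r;
    }

    /** OUT-parameter style: the caller supplies the result object and the task only writes into it. */
    static void mergeOut(int[] a, int[] b, int[] result) {
        for (int i = 0; i < a.length; i++) {
            result[i] = a[i] + b[i];
        }
    }
}
```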

Execution instructions

Usage:

runcompss kmeans_frag.binarySerialization.KMeans_frag -c <numClusters> -i <numIterations> -n <numPoints> -d <numDimensions> -f <numFragments> -p <pathToDataset>

runcompss kmeans_frag.XMLSerialization.KMeans_frag -c <numClusters> -i <numIterations> -n <numPoints> -d <numDimensions> -f <numFragments> -p <pathToDataset>

runcompss kmeans_frag.sequentialMerge.KMeans_frag -c <numClusters> -i <numIterations> -n <numPoints> -d <numDimensions> -f <numFragments> -p <pathToDataset>

runcompss kmeans_frag.paramsOUT.KMeans_frag -c <numClusters> -i <numIterations> -n <numPoints> -d <numDimensions> -f <numFragments> -p <pathToDataset>
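All four entry points take the same six options. A hypothetical parser for them might look like this, assuming every option is given as a flag followed by its value; the real application may parse its arguments differently, so treat KMeansArgs and its field names as assumptions.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical holder for the -c -i -n -d -f -p options; not the application's actual parser. */
class KMeansArgs {
    int numClusters;
    int numIterations;
    int numPoints;
    int numDimensions;
    int numFragments;
    String datasetPath;

    /** Parses "-flag value" pairs; fails if an option is missing or has no value. */
    static KMeansArgs parse(String[] args) {
        Map<String, String> opts = new HashMap<>();
        for (int i = 0; i < args.length; i += 2) {
            if (i + 1 >= args.length) {
                throw new IllegalArgumentException("Missing value for " + args[i]);
            }
            opts.put(args[i], args[i + 1]);
        }
        KMeansArgs a = new KMeansArgs();
        a.numClusters = Integer.parseInt(required(opts, "-c"));
        a.numIterations = Integer.parseInt(required(opts, "-i"));
        a.numPoints = Integer.parseInt(required(opts, "-n"));
        a.numDimensions = Integer.parseInt(required(opts, "-d"));
        a.numFragments = Integer.parseInt(required(opts, "-f"));
        a.datasetPath = required(opts, "-p");
        return a;
    }

    private static String required(Map<String, String> opts, String flag) {
        String value = opts.get(flag);
        if (value == null) {
            throw new IllegalArgumentException("Missing option " + flag);
        }
        return value;
    }
}
```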

Execution Example

runcompss kmeans_frag.binarySerialization.KMeans -c 100 -i 10 -n 9984000 -d 1000 -f 512 -p /gpfs/projects/bsc19/COMPSs_APPS/kmeans/data/fragments/dataset_10M_100C_1000D_512F_plain
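Using the task counts from the Description, the example's parameters work out as follows: 9984000 points over 512 fragments gives 19500 points per fragment, and each iteration creates 512 computation tasks plus 511 merge tasks, 1023 in total, or 10230 over the 10 iterations (initialization tasks not counted). A small helper that reproduces these figures (ExampleFigures is a hypothetical name):

```java
/** Reproduces the fragment and task arithmetic for the execution example. */
class ExampleFigures {

    /** Points per fragment when n divides evenly by f (true here: 9984000 = 512 * 19500). */
    static int pointsPerFragment(int n, int f) {
        return n / f;
    }

    /** F computation tasks plus F-1 merge tasks per iteration. */
    static int tasksPerIteration(int f) {
        return f + (f - 1);
    }

    public static void main(String[] args) {
        int n = 9984000, f = 512, iterations = 10;
        System.out.println(pointsPerFragment(n, f));           // 19500
        System.out.println(tasksPerIteration(f));              // 1023
        System.out.println(iterations * tasksPerIteration(f)); // 10230
    }
}
```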

Build

Option 1: Native Java

cd ~/workspace_java/kmeans_frag/
javac src/main/java/kmeans_frag/*/*.java
cd src/main/java/
jar cf kmeans_frag.jar kmeans_frag/
cd ../../../
mv src/main/java/kmeans_frag.jar jar/

Option 2: Maven

cd ~/workspace_java/kmeans_frag/
mvn clean package

Last modified on 10/26/15 09:33:41
