[[PageOutline]]

= Application Name =

Reverse Index

= Summary =

* '''Name''': {{{Reverse Index}}}
* '''Contact Person''': {{{support-compss@bsc.es}}}
* '''Access Level''': {{{public}}}
* '''License Agreement''': {{{GPL}}}
* '''Platform''': {{{COMPSs}}}
* '''Repository''': [[https://compss.bsc.es/svn/bar/apps/python/reverseIndex|Reverse Index]]


== Description ==
Given a directory, this application parses all the files in it and writes all the links it finds to a result file.
The files are distributed into a given number of chunks, and the chunks are processed in parallel.
Once processed, the chunks are merged into the final result file; the merge tasks also run in parallel.
In the result file, each link is followed by the names of the files that contain it.
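The processing described above (split into chunks, index each chunk, merge the partial indexes) can be sketched sequentially in plain Java. This is a hypothetical illustration, not the application's source: it does not use the COMPSs API, it extracts links with a simple {{{href}}} regular expression instead of the htmlparser library, it works on in-memory (filename, content) pairs instead of reading a directory, and the output line format (a link followed by space-separated filenames) is an assumption. The pairwise {{{mergeAll}}} reduction shows why the merges can run in parallel: the merges within one level do not depend on each other.

{{{#!java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sequential sketch of the reverse-index algorithm (not the application's code).
// Assumption: links are href attributes matched by a regex; the real app uses htmlparser.jar.
public class ReverseIndexSketch {

    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);

    // Distribute the files into 'chunks' groups, round-robin.
    static List<List<String>> split(List<String> files, int chunks) {
        List<List<String>> result = new ArrayList<>();
        for (int i = 0; i < chunks; i++) {
            result.add(new ArrayList<>());
        }
        for (int i = 0; i < files.size(); i++) {
            result.get(i % chunks).add(files.get(i));
        }
        return result;
    }

    // Index one chunk: map every link to the set of files containing it.
    // In the application, each chunk would be handled by an independent task.
    static Map<String, SortedSet<String>> indexChunk(List<String> chunk, Map<String, String> contents) {
        Map<String, SortedSet<String>> index = new TreeMap<>();
        for (String file : chunk) {
            Matcher m = HREF.matcher(contents.get(file));
            while (m.find()) {
                index.computeIfAbsent(m.group(1), k -> new TreeSet<>()).add(file);
            }
        }
        return index;
    }

    // Merge two partial indexes into a new one; the inputs are left unchanged.
    static Map<String, SortedSet<String>> merge(Map<String, SortedSet<String>> a,
                                                Map<String, SortedSet<String>> b) {
        Map<String, SortedSet<String>> out = new TreeMap<>();
        for (Map<String, SortedSet<String>> part : List.of(a, b)) {
            for (Map.Entry<String, SortedSet<String>> e : part.entrySet()) {
                out.computeIfAbsent(e.getKey(), k -> new TreeSet<>()).addAll(e.getValue());
            }
        }
        return out;
    }

    // Pairwise (tree) reduction: merges within one level are independent of each other.
    static Map<String, SortedSet<String>> mergeAll(List<Map<String, SortedSet<String>>> parts) {
        if (parts.isEmpty()) {
            return new TreeMap<>();
        }
        List<Map<String, SortedSet<String>>> level = new ArrayList<>(parts);
        while (level.size() > 1) {
            List<Map<String, SortedSet<String>>> next = new ArrayList<>();
            for (int i = 0; i + 1 < level.size(); i += 2) {
                next.add(merge(level.get(i), level.get(i + 1)));
            }
            if (level.size() % 2 == 1) {
                next.add(level.get(level.size() - 1)); // odd one out moves up unchanged
            }
            level = next;
        }
        return level.get(0);
    }

    public static void main(String[] args) {
        Map<String, String> contents = new TreeMap<>();
        contents.put("a.html", "<a href=\"http://x.org\">x</a> <a href='http://y.org'>y</a>");
        contents.put("b.html", "<a HREF=\"http://x.org\">x again</a>");
        contents.put("c.html", "<p>no links here</p>");

        List<Map<String, SortedSet<String>>> partials = new ArrayList<>();
        for (List<String> chunk : split(new ArrayList<>(contents.keySet()), 2)) {
            partials.add(indexChunk(chunk, contents));
        }
        // Print each link followed by the files that contain it.
        mergeAll(partials).forEach((link, files) ->
                System.out.println(link + " " + String.join(" ", files)));
    }
}
}}}

With the sample data, {{{main}}} prints {{{http://x.org a.html b.html}}} followed by {{{http://y.org a.html}}}.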

Arguments:
 1. Debug: if true, prints debug information
 2. Website path: path to the directory containing the files to parse
 3. Chunks: number of chunks into which the files are divided for processing
 4. Output filename: name of the result file into which the application merges all the links found
 5. Temp directory: directory where the application writes its temporary files (*.part)
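A minimal sketch of how the five positional arguments above could be read in Java, for example from the {{{--cline_args}}} value shown in the execution instructions. The class and field names are hypothetical and the application's actual parsing and validation may differ; the sketch only illustrates the expected order and types.

{{{#!java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical holder for the five positional arguments (not the application's code).
public class ReverseArgs {
    final boolean debug;
    final Path websitePath;
    final int chunks;
    final Path outputFile;
    final Path tempDir;

    ReverseArgs(String[] args) {
        if (args.length != 5) {
            throw new IllegalArgumentException(
                    "Usage: <debug> <websitePath> <chunks> <outputFile> <tempDir>");
        }
        this.debug = Boolean.parseBoolean(args[0]); // "true" (any case) enables debug output
        this.websitePath = Paths.get(args[1]);      // directory with the files to parse
        this.chunks = Integer.parseInt(args[2]);    // number of chunks of files
        if (this.chunks < 1) {
            throw new IllegalArgumentException("chunks must be at least 1");
        }
        this.outputFile = Paths.get(args[3]);       // merged result file
        this.tempDir = Paths.get(args[4]);          // where the *.part files are written
    }

    public static void main(String[] args) {
        ReverseArgs a = new ReverseArgs(args);
        System.out.println("debug=" + a.debug + " chunks=" + a.chunks + " input=" + a.websitePath);
    }
}
}}}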


== Execution instructions ==
The test directory of this project contains three HTML pages that can be parsed as an example.
{{{
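# Add the application jar and the HTML parser library to the classpath
export CLASSPATH=$CLASSPATH:/YOUR_PATH_TO/reverseindex.jar
export CLASSPATH=$CLASSPATH:/YOUR_PATH_TO/htmlparser.jar
# Arguments: debug (true), website path, number of chunks (3), output file, temp directory
runcompssext --app=reverse.Reverse --project=/YOUR_PATH_TO/project.xml --resources=/YOUR_PATH_TO/resources.xml --cline_args="true /YOUR_PATH_TO/test 3 /YOUR_PATH_TO/results.txt /YOUR_PATH_TO/tmp"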
}}}


== Dependencies ==

The following jars, found in the lib directory of this project, may be needed for compilation and/or execution:

* activation.jar
* commons-compress-1.4.1.jar
* filterbuilder.jar
* htmllexer.jar
* htmlparser.jar
* sitecapturer.jar
* tar.jar
* thumbelina.jar