[[PageOutline]] = Application Name = Reverse Index = Summary = * '''Name''': {{{Reverse Index}}} * '''Contact Person''': {{{support-compss@bsc.es}}} * '''Access Level''': {{{public}}} * '''License Agreement''': {{{GPL}}} * '''Platform''': {{{COMPSs}}} * '''Repository''': [[https://compss.bsc.es/svn/bar/apps/python/reverseIndex|Reverse Index]] == Description == Given a directory, this application parses all the files in it and writes all the links found in a result output file. Files are distributed in a given number of chunks. Chunks of files are processed in parallel. Later, once processed, chunks are merge to a final result file. Merging tasks are done also in parallel. In the result file, after each link appears the filename of the files that contains that link. Arguments: 1. Debug: if true, prints debug information 2. Website path: path to the directory where to read the files from 3. Chunks: number of chunks when processing files 4. Output filename: filename for the result file where the application merges all the links found 5. Temp directory: directory where the application writes the (*.part) temporary files == Execution instructions == The test directory under this project contains 3 html pages to be parsed as example. {{{ export CLASSPATH=$CLASSPATH:/YOUR_PATH_TO/reverseindex.jar export CLASSPATH=$CLASSPATH:/YOUR_PATH_TO/htmlparser.jar runcompssext --app=reverse.Reverse --project=/YOUR_PATH_TO/project.xml --resources=/YOUR_PATH_TO/resources.xml --cline_args="true /YOUR_PATH_TO/test 3 /YOUR_PATH_TO/results.txt /YOUR_PATH_TO/tmp" }}} == Dependencies == For compilation and/or execution there are some jars found in the lib directory of this project that could be needed: * activation.jar * commons-compress-1.4.1.jar * filterbuilder.jar * htmllexer.jar * htmlparser.jar * sitecapturer.jar * tar.jar * thumbelina.jar