[[PageOutline]]

= Application Name =

Reverse Index

= Summary =

* '''Name''': {{{Reverse Index}}}
* '''Contact Person''': {{{support-compss@bsc.es}}}
* '''Access Level''': {{{public}}}
* '''License Agreement''': {{{GPL}}}
* '''Platform''': {{{COMPSs}}}
* '''Repository''': [[https://compss.bsc.es/svn/bar/apps/python/reverseIndex|Reverse Index]]


== Description ==
Given a directory, this application parses every file in it and writes all the links it finds to a result file.
The files are distributed across a given number of chunks, and the chunks are processed in parallel.
Once the chunks have been processed, their partial results are merged into the final result file. The merge tasks also run in parallel.
In the result file, each link is followed by the names of the files that contain it.

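The pipeline described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the application's code: a thread pool stands in for COMPSs tasks, the standard library `html.parser` stands in for the HTML parser jar, partial results stay in memory instead of being written to temporary files, and the one-line-per-link output layout is a guess at the result format. All function names are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser
from pathlib import Path


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in one document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def split_into_chunks(paths, n_chunks):
    """Distribute paths round-robin over at most n_chunks non-empty lists."""
    chunks = [paths[i::n_chunks] for i in range(n_chunks)]
    return [chunk for chunk in chunks if chunk]


def index_chunk(paths):
    """Build a partial reverse index (link -> set of file names) for one chunk."""
    index = defaultdict(set)
    for path in paths:
        collector = LinkCollector()
        collector.feed(Path(path).read_text(encoding="utf-8", errors="replace"))
        for link in collector.links:
            index[link].add(Path(path).name)
    return index


def merge_group(group):
    """Merge one or two partial indexes into a new index."""
    merged = defaultdict(set)
    for partial in group:
        for link, files in partial.items():
            merged[link] |= files
    return merged


def build_reverse_index(directory, n_chunks):
    """Index the chunks in parallel, then merge the partial indexes pairwise."""
    paths = sorted(str(p) for p in Path(directory).iterdir() if p.is_file())
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(index_chunk, split_into_chunks(paths, n_chunks)))
        # Each round halves the number of partial indexes; the merges
        # within one round are independent and can run in parallel.
        while len(partials) > 1:
            groups = [partials[i:i + 2] for i in range(0, len(partials), 2)]
            partials = list(pool.map(merge_group, groups))
    return partials[0] if partials else defaultdict(set)


def write_result(index, output_path):
    """Write one line per link: the link, then the files that contain it (assumed layout)."""
    with open(output_path, "w", encoding="utf-8") as out:
        for link in sorted(index):
            out.write(f"{link} {' '.join(sorted(index[link]))}\n")
```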
Arguments:
1. Debug: if true, the application prints debug information
2. Website path: path to the directory from which the files are read
3. Chunks: number of chunks into which the files are split for processing
4. Output filename: name of the result file into which the application merges all the links found
5. Temp directory: directory where the application writes its temporary (*.part) files

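As an illustration of how the five positional arguments listed above map to typed values, here is a minimal Python sketch. The `ReverseIndexArgs` class and `parse_args` function are hypothetical names, and the validation rules (exactly five arguments, at least one chunk, case-insensitive `true`) are assumptions rather than documented behaviour of the application.

```python
from dataclasses import dataclass


@dataclass
class ReverseIndexArgs:
    debug: bool
    website_path: str
    chunks: int
    output_filename: str
    temp_directory: str


def parse_args(argv):
    """Convert the five positional arguments, in the documented order."""
    if len(argv) != 5:
        raise ValueError(f"expected 5 arguments, got {len(argv)}")
    debug, website_path, chunks, output_filename, temp_directory = argv
    n_chunks = int(chunks)  # raises ValueError if not an integer
    if n_chunks < 1:
        raise ValueError("the number of chunks must be at least 1")
    return ReverseIndexArgs(
        debug=debug.lower() == "true",  # assumption: anything else means False
        website_path=website_path,
        chunks=n_chunks,
        output_filename=output_filename,
        temp_directory=temp_directory,
    )
```

Calling `parse_args(["true", "/YOUR_PATH_TO/test", "3", "/YOUR_PATH_TO/results.txt", "/YOUR_PATH_TO/tmp"])` mirrors the argument string used in the execution example.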

== Execution instructions ==
The test directory of this project contains three HTML pages that can be parsed as an example.
{{{
# Add the application jar and the HTML parser jar to the classpath
export CLASSPATH=$CLASSPATH:/YOUR_PATH_TO/reverseindex.jar
export CLASSPATH=$CLASSPATH:/YOUR_PATH_TO/htmlparser.jar
# Application arguments, in order: debug, website path, chunks, output filename, temp directory
runcompssext --app=reverse.Reverse --project=/YOUR_PATH_TO/project.xml --resources=/YOUR_PATH_TO/resources.xml --cline_args="true /YOUR_PATH_TO/test 3 /YOUR_PATH_TO/results.txt /YOUR_PATH_TO/tmp"
}}}

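When the run has to be scripted, the same command can be assembled programmatically. The following Python sketch only rebuilds the command line and the CLASSPATH shown above; `build_environment` and `build_command` are hypothetical helpers, the flags are copied from the example, and the use of `false` to disable debug output is an assumption. Because the application arguments are passed as one space-separated string, paths containing spaces would need extra care.

```python
def build_environment(base_env, jar_paths):
    """Return a copy of base_env with the jars appended to CLASSPATH."""
    env = dict(base_env)
    entries = [env["CLASSPATH"]] if env.get("CLASSPATH") else []
    env["CLASSPATH"] = ":".join(entries + list(jar_paths))
    return env


def build_command(project_xml, resources_xml, debug, website_path,
                  chunks, output_filename, temp_directory):
    """Build the runcompssext argument list used in the example."""
    app_args = " ".join([
        "true" if debug else "false",  # assumption: "false" disables debug
        website_path,
        str(chunks),
        output_filename,
        temp_directory,
    ])
    return [
        "runcompssext",
        "--app=reverse.Reverse",
        f"--project={project_xml}",
        f"--resources={resources_xml}",
        f"--cline_args={app_args}",
    ]
```

The resulting list and environment can be handed to `subprocess.run(command, env=environment)`, which passes the whole `--cline_args=...` value as a single argument without shell quoting.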

== Dependencies ==

Compiling or running the application may require some of the following jars, which are found in the lib directory of this project:

* activation.jar
* commons-compress-1.4.1.jar
* filterbuilder.jar
* htmllexer.jar
* htmlparser.jar
* sitecapturer.jar
* tar.jar
* thumbelina.jar