Changes between Initial Version and Version 1 of application_pages/apps/python/reverseIndex


Timestamp:
10/26/15 09:33:51
Author:
trac
  • application_pages/apps/python/reverseIndex

[[PageOutline]]

= Application Name =

Reverse Index

= Summary =

* '''Name''': {{{Reverse Index}}}
* '''Contact Person''': {{{support-compss@bsc.es}}}
* '''Access Level''': {{{public}}}
* '''License Agreement''': {{{GPL}}}
* '''Platform''': {{{COMPSs}}}
* '''Repository''': [[https://compss.bsc.es/svn/bar/apps/python/reverseIndex|Reverse Index]]

== Description ==
Given a directory, this application parses every file in it and writes all the links it finds to a single result file.
The files are distributed into a given number of chunks, and the chunks are processed in parallel.
Once the chunks have been processed, their partial results are merged into the final result file. The merge tasks also run in parallel.
In the result file, each link is followed by the names of the files that contain it.

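The chunk-then-merge flow described above can be sketched in plain Python. This is an illustration only, not the application's code: it runs sequentially, uses the standard-library `html.parser` rather than the project's parser, and every name in it (`LinkCollector`, `split_into_chunks`, `process_chunk`, `merge`, `reverse_index`) is hypothetical. Under COMPSs, each `process_chunk` and `merge` call would presumably be a task, so calls that do not depend on each other could run in parallel.

```python
from collections import defaultdict
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def split_into_chunks(items, n_chunks):
    """Distribute items round-robin and drop any chunk that stays empty."""
    chunks = [items[i::n_chunks] for i in range(n_chunks)]
    return [chunk for chunk in chunks if chunk]


def process_chunk(pages):
    """Build a partial index {link: set of filenames} for one chunk.

    `pages` is a list of (filename, html_text) pairs.
    """
    partial = defaultdict(set)
    for filename, html_text in pages:
        collector = LinkCollector()
        collector.feed(html_text)
        for link in collector.links:
            partial[link].add(filename)
    return partial


def merge(left, right):
    """Combine two partial indexes into a new one."""
    merged = defaultdict(set)
    for part in (left, right):
        for link, files in part.items():
            merged[link] |= files
    return merged


def reverse_index(pages, n_chunks):
    """Process each chunk, then merge the partial results pairwise."""
    partials = [process_chunk(c) for c in split_into_chunks(pages, n_chunks)]
    # Pairwise (tree) reduction: merges on the same level are independent.
    while len(partials) > 1:
        partials = [
            merge(partials[i], partials[i + 1]) if i + 1 < len(partials) else partials[i]
            for i in range(0, len(partials), 2)
        ]
    return partials[0] if partials else {}


# Made-up in-memory pages standing in for files read from a directory.
pages = [
    ("a.html", '<a href="http://x.org">x</a>'),
    ("b.html", '<a href="http://x.org">x</a> <a href="http://y.org">y</a>'),
    ("c.html", "<p>no links here</p>"),
]
index = reverse_index(pages, 3)
for link in sorted(index):
    print(link, " ".join(sorted(index[link])))
```

The pairwise merge is one possible way to let merge steps overlap; the original application may organize its merges differently.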
Arguments:
        1. Debug: if true, debug information is printed
        2. Website path: path to the directory to read the files from
        3. Chunks: number of chunks into which the files are split for processing
        4. Output filename: name of the result file into which the application merges all the links found
        5. Temp directory: directory where the application writes its temporary (*.part) files

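As a rough illustration of the argument order listed above, the following sketch reads the five positional values into a typed record. The names `Args` and `parse_args` are hypothetical, the paths in the usage lines are placeholders, and the validation rules (exactly five values, at least one chunk, `true` compared case-insensitively) are assumptions rather than documented behavior of the application.

```python
from typing import NamedTuple, Sequence


class Args(NamedTuple):
    debug: bool
    website_path: str
    chunks: int
    output_filename: str
    temp_dir: str


def parse_args(argv: Sequence[str]) -> Args:
    """Parse the five positional arguments in the documented order."""
    if len(argv) != 5:
        raise ValueError(
            "expected 5 arguments: debug website_path chunks output_filename temp_dir"
        )
    debug, website_path, chunks, output_filename, temp_dir = argv
    n_chunks = int(chunks)  # raises ValueError if chunks is not an integer
    if n_chunks < 1:
        raise ValueError("chunks must be at least 1")
    # Assumption: any value other than "true" (case-insensitive) disables debug.
    return Args(debug.lower() == "true", website_path, n_chunks, output_filename, temp_dir)


# Placeholder values mirroring the argument order above.
args = parse_args(["true", "/data/test", "3", "/data/results.txt", "/data/tmp"])
print(args)
```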
== Execution instructions ==
The test directory of this project contains three HTML pages that can be parsed as an example.
{{{
export CLASSPATH=$CLASSPATH:/YOUR_PATH_TO/reverseindex.jar
export CLASSPATH=$CLASSPATH:/YOUR_PATH_TO/htmlparser.jar
runcompssext --app=reverse.Reverse --project=/YOUR_PATH_TO/project.xml --resources=/YOUR_PATH_TO/resources.xml --cline_args="true /YOUR_PATH_TO/test 3 /YOUR_PATH_TO/results.txt /YOUR_PATH_TO/tmp"
}}}

== Dependencies ==

The lib directory of this project contains some JAR files that may be needed for compilation and/or execution:

* activation.jar
* commons-compress-1.4.1.jar
* filterbuilder.jar
* htmllexer.jar
* htmlparser.jar
* sitecapturer.jar
* tar.jar
* thumbelina.jar