Application Name
Reverse Index
Summary
- Name: Reverse Index
- Contact Person: support-compss@bsc.es
- Access Level: public
- License Agreement: GPL
- Platform: COMPSs
- Repository: Reverse Index
Description
Given a directory, this application parses all the files in it and writes all the links found in a result output file. Files are distributed in a given number of chunks. Chunks of files are processed in parallel. Later, once processed, chunks are merge to a final result file. Merging tasks are done also in parallel. In the result file, after each link appears the filename of the files that contains that link.
Arguments:
- Debug: if true, prints debug information
- Website path: path to the directory where to read the files from
- Chunks: number of chunks when processing files
- Output filename: filename for the result file where the application merges all the links found
- Temp directory: directory where the application writes the (*.part) temporary files
Execution instructions
The test directory under this project contains 3 html pages to be parsed as example.
export CLASSPATH=$CLASSPATH:/YOUR_PATH_TO/reverseindex.jar export CLASSPATH=$CLASSPATH:/YOUR_PATH_TO/htmlparser.jar runcompssext --app=reverse.Reverse --project=/YOUR_PATH_TO/project.xml --resources=/YOUR_PATH_TO/resources.xml --cline_args="true /YOUR_PATH_TO/test 3 /YOUR_PATH_TO/results.txt /YOUR_PATH_TO/tmp"
Dependencies
For compilation and/or execution there are some jars found in the lib directory of this project that could be needed:
- activation.jar
- commons-compress-1.4.1.jar
- filterbuilder.jar
- htmllexer.jar
- htmlparser.jar
- sitecapturer.jar
- tar.jar
- thumbelina.jar
Attachments (1)
- reverseIndex.tar.gz (622.5 KB) - added by trac 9 years ago.
Download all attachments as: .zip