Optimizing Labs 2 Palabras Architecture for Effective Data Processing


"Learn the step-by-step process to set up and run the Palabras Architecture for efficient data processing. From login to running locally and preparing the master and slaves, follow the guidelines provided to maximize performance. With detailed instructions and visual aids, this guide ensures a smooth implementation of the Palabras system for your data needs."

nashley

Uploaded on Jun 16, 2025


Presentation Transcript

Labs 2: Palabras

Palabras Architecture
[Diagram labels: Directory; Master1, Master2, Master3, ..., MasterN; Slave1, Slave2, Slave3, ..., SlaveN; Jobs1, Jobs2, Jobs3, ..., JobsM; Slave1, Slave2, Slave3, ..., SlaveM]

Step 1: Get Started
Login: Username: nombre\cc5212, password on the board
Lab code: http://aidanhogan.com/teaching/cc5212-1/mdp-lab2.zip
Eclipse: C:/Program Files (x86)/eclipse/ (in Spanish)
File > Import >
Lab data: http://aidanhogan.com/teaching/cc5212-1/mdp-lab2-data/

Step 2: Run Locally
~600,000 abstracts
~52,340,000 non-unique words
~320 MB uncompressed
How long will it take? Will it even run?
org.mdp.cli.RunWordCountLocally
Right Click > Run As > Run Configurations > Arguments
-i <path>/abstracts-es.txt.gz -igz -k 500
-Xmx256M (VM argument)
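To make the local run concrete, here is a minimal sketch of a single-machine word count with a top-k cut, in the spirit of the -i, -igz and -k arguments above. It is not the lab's RunWordCountLocally: the class name LocalWordCountSketch, the tokenization rule (lowercase, split on anything that is not a letter or digit) and the tie-breaking order are all assumptions made for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.GZIPInputStream;

// Hypothetical sketch; the lab's own class may tokenize and rank differently.
final class LocalWordCountSketch {

    // Counts every word in the input, holding all distinct words in memory.
    static Map<String, Integer> countWords(BufferedReader in) throws IOException {
        Map<String, Integer> counts = new HashMap<>();
        String line;
        while ((line = in.readLine()) != null) {
            // Assumed tokenization: lowercase, split on non-letter/non-digit runs.
            for (String word : line.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
                if (!word.isEmpty()) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    // Returns the k most frequent words; ties are broken alphabetically.
    static List<Map.Entry<String, Integer>> topK(Map<String, Integer> counts, int k) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(Map.Entry.<String, Integer>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        return entries.subList(0, Math.min(k, entries.size()));
    }

    // Usage: java -Xmx256M LocalWordCountSketch.java <file.txt.gz> <k>
    public static void main(String[] args) throws IOException {
        String path = args[0];
        int k = Integer.parseInt(args[1]);
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(Files.newInputStream(Paths.get(path))),
                StandardCharsets.UTF_8))) {
            for (Map.Entry<String, Integer> e : topK(countWords(in), k)) {
                System.out.println(e.getKey() + "\t" + e.getValue());
            }
        }
    }
}
```

Because the whole map of distinct words lives on the heap, a small -Xmx setting is what makes "Will it even run?" a real question for the full file.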

Step 3: Start the Directory
I start the directory!
vm116.dcc.uchile.cl (172.17.69.190)
Port 1985
Remind me to set heap-space

Step 4: Prepare Slave
org.mdp.cli.StartWordCountSlave
1. Implement openDirectoryStub()
2. Add the slave's name to the directory
3. Review the other code
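As a rough idea of what opening a directory stub could involve, the sketch below assumes the directory is published through a Java RMI registry. The slides do not say this explicitly (names such as openDirectoryStub() and StartRegistryAndServer only suggest it), so treat it as an assumption. The class name DirectoryStubSketch and the binding name "directory" are hypothetical; the lab code defines the real interface and name.

```java
import java.rmi.NotBoundException;
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;

// Hypothetical sketch, assuming the directory is reachable over Java RMI.
final class DirectoryStubSketch {

    // Assumed binding name; replace with whatever the lab code uses.
    static final String DIRECTORY_NAME = "directory";

    // Builds a local reference to the registry. No connection is made yet,
    // so this succeeds even if nothing is listening at host:port.
    static Registry openRegistry(String host, int port) throws RemoteException {
        return LocateRegistry.getRegistry(host, port);
    }

    // Contacts the registry and fetches the stub bound under DIRECTORY_NAME.
    // In real code the result would be cast to the lab's directory interface.
    static Remote openDirectoryStub(String host, int port)
            throws RemoteException, NotBoundException {
        return openRegistry(host, port).lookup(DIRECTORY_NAME);
    }

    public static void main(String[] args) throws Exception {
        // Hostname and port as given for the lab's directory.
        Remote stub = openDirectoryStub("vm116.dcc.uchile.cl", 1985);
        System.out.println("Got directory stub: " + stub);
    }
}
```

The lookup call is the first point at which the network is actually used, so connection problems surface there rather than in getRegistry.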

Step 5: Run Slave
Build the .jar using build.xml (dist)
Open cmd and go to the directory
java -jar -Xmx256M mdp-2.jar StartWordCountSlave -dn vm116.dcc.uchile.cl -dp 1985 -sn <username>

Step 6: Prepare Master
org.mdp.cli.StartWordCountMaster
1. Connect to the directory
2. Get the list of slaves from the directory
3. Clear words from the slave for you
4. Choose a slave for each word
5. Send the add-words job to each slave
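One common way to choose a slave for each word is to hash the word and take the result modulo the number of slaves, so that every occurrence of a word lands on the same machine. The sketch below shows that idea only; the slides do not say which scheme the lab expects, and the class name WordPartitionSketch and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of hash partitioning; not the lab's actual master code.
final class WordPartitionSketch {

    // Maps a word to a slave index in [0, numSlaves).
    // floorMod keeps the result non-negative even when hashCode() is negative.
    static int slaveIndexFor(String word, int numSlaves) {
        if (numSlaves <= 0) {
            throw new IllegalArgumentException("need at least one slave");
        }
        return Math.floorMod(word.hashCode(), numSlaves);
    }

    // Groups words into one batch per slave name, preserving slave order.
    // Each batch could then be sent to its slave as a single add-words job.
    static Map<String, List<String>> partition(Iterable<String> words,
                                               List<String> slaveNames) {
        Map<String, List<String>> batches = new LinkedHashMap<>();
        for (String name : slaveNames) {
            batches.put(name, new ArrayList<>());
        }
        for (String word : words) {
            String target = slaveNames.get(slaveIndexFor(word, slaveNames.size()));
            batches.get(target).add(word);
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> slaves = List.of("slave1", "slave2", "slave3");
        List<String> words = List.of("la", "casa", "la", "mesa", "casa");
        partition(words, slaves).forEach(
                (slave, batch) -> System.out.println(slave + " <- " + batch));
    }
}
```

Since a given word always maps to the same slave, each slave holds the complete count for its words and no cross-slave merging of counts is needed.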

Step 7: Run Master
For small dataset!
org.mdp.cli.StartWordCountMaster
Right Click > Run As > Run Configurations > Arguments
-i <path>\es-abstracts-10k.txt.gz -igz -dp 1985 -dn vm116.dcc.uchile.cl -mn <username> -k 500

Step 8: Run Big Master
For big dataset!
org.mdp.cli.StartWordCountMaster
Right Click > Run As > Run Configurations > Arguments
-i <path>\es-abstracts.txt.gz -igz -dp 1985 -dn vm116.dcc.uchile.cl -mn <username> -k 500

Step 9: Run Distribution Locally
1. Start a directory server
   Build and use the jar
   java -jar mdp-2.jar StartRegistryAndServer -n localhost -p 1985 -r -s 1 -sp
2. Start 4 slaves (give different names) in four different CMD windows
   Use the jar
   java -jar mdp-2.jar StartSlave -dn localhost -dp 1985 -wn <usernameN>
3. Start a master
   Can use Eclipse or jar (as preferred)
   Point it to the local directory
   Use small file (large file if successful)
   -Xmx256M