- Virtualization, hypervisor
- VM management example
- Functional Debugging in distributed systems
References for the workshop
Workshop and assignment
Design and implement the MapReduce master in Azure. Develop the base code for the Master implementation. Create the handlers, interfaces, and scoreboard required for the Master.
Intention: familiarize the student with the IaaS services provided by Azure, set up the environment for the coding sections of the project, and introduce the library and the way processes are started remotely in distributed systems.
Using Azure Linux Virtual Machines, you are going to implement the Master node of the MapReduce runtime.
First, you need to create a pool of resources, using either the resource manager or the CLI. One of the virtual machines is going to run the master code, and the others are going to run the worker code (explained afterward).
Second, create a Virtual Network that connects all the Virtual Machines in your system. Then install the required libraries on the virtual machines in the available pool of resources.
Implement the required data structures for the Master, as described in the MapReduce paper:
> The master keeps several data structures. For each map task and reduce task, it stores the state (idle, in-progress, or completed), and the identity of the worker machine (for non-idle tasks). The master is the conduit through which the location of intermediate file regions is propagated from map tasks to reduce tasks. Therefore, for each completed map task, the master stores the locations and sizes of the R intermediate file regions produced by the map task. Updates to this location and size information are received as map tasks are completed. The information is pushed incrementally to workers that have in-progress reduce tasks.
>
> Jeffrey Dean and Sanjay Ghemawat, "MapReduce: Simplified Data Processing on Large Clusters" (2004)
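The master state described above can be sketched in Python as follows. This is a minimal illustration, not the required implementation: the class and field names (`Scoreboard`, `MapTask`, `intermediate`) are hypothetical, chosen to mirror the paper's description and the assignment's "scoreboard" terminology.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional, Tuple

class TaskState(Enum):
    IDLE = "idle"
    IN_PROGRESS = "in-progress"
    COMPLETED = "completed"

@dataclass
class MapTask:
    task_id: int
    state: TaskState = TaskState.IDLE
    worker: Optional[str] = None  # identity of the worker machine (non-idle tasks)
    # For a completed map task: location and size of each of the R
    # intermediate file regions it produced, keyed by reduce partition.
    intermediate: Dict[int, Tuple[str, int]] = field(default_factory=dict)

@dataclass
class ReduceTask:
    task_id: int
    state: TaskState = TaskState.IDLE
    worker: Optional[str] = None

class Scoreboard:
    """Tracks the state of every map and reduce task on the master."""

    def __init__(self, n_map: int, n_reduce: int):
        self.map_tasks = [MapTask(i) for i in range(n_map)]
        self.reduce_tasks = [ReduceTask(i) for i in range(n_reduce)]

    def complete_map(self, task_id: int, regions: Dict[int, Tuple[str, int]]) -> None:
        """Record a finished map task and its intermediate region info.

        Location/size updates arrive as map tasks complete; the master later
        pushes them incrementally to workers running in-progress reduce tasks.
        """
        task = self.map_tasks[task_id]
        task.state = TaskState.COMPLETED
        task.intermediate.update(regions)
```

The dataclasses hold exactly the per-task fields the paper names (state, worker identity, intermediate region locations and sizes), so the handlers you write later only need to mutate this one structure.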
Once the data structures and handlers are implemented, exercise an empty handler in a worker that writes “Hello gatech” into a log file and responds back to the master, using RPC (remote procedure call). For this workshop, this stub is the entire worker implementation.
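As a sketch of that worker stub, the example below uses Python's standard-library `xmlrpc` as a stand-in for whatever RPC library the course provides; the function name `hello`, the port, and the log file name are all assumptions for illustration.

```python
import logging
from xmlrpc.server import SimpleXMLRPCServer

# Worker-side log file (hypothetical name).
logging.basicConfig(filename="worker.log", level=logging.INFO)

def hello(master_id: str) -> str:
    """Empty handler: log the greeting and acknowledge back to the master."""
    logging.info("Hello gatech")
    return f"ack from worker to {master_id}"

def serve(host: str = "0.0.0.0", port: int = 8000) -> None:
    """Run the worker's RPC server so the master can invoke its handlers."""
    server = SimpleXMLRPCServer((host, port), allow_none=True)
    server.register_function(hello, "hello")
    server.serve_forever()
```

On the master's VM, the handler would be invoked over the virtual network with something like `xmlrpc.client.ServerProxy("http://<worker-ip>:8000").hello("master-0")`; the returned acknowledgment confirms the remote process was started and reached.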
- Each key is stored in its own blob, with one value per key
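One way to realize the one-blob-per-key layout is to derive a unique blob name from the job and the key; the naming scheme below is a hypothetical convention, and the commented SDK call assumes the `azure-storage-blob` package and an existing container.

```python
def blob_name(job_id: str, key: str) -> str:
    """Map each key to its own blob; the blob's content is the key's value."""
    return f"{job_id}/{key}"

# Uploading with the azure-storage-blob SDK would look roughly like
# (connection string and container name are assumptions):
#
#   from azure.storage.blob import BlobClient
#   client = BlobClient.from_connection_string(conn_str, container, blob_name(job, key))
#   client.upload_blob(value)
```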