Week 6: Programming frameworks and software

Topics

  • MapReduce
  • Dryad
  • Hadoop
  • Pig Latin

References

Workshop: MapReduce

Description

Implement well-known applications on top of MapReduce (sorting, counting, etc.).

Specification

Using Azure HDInsight, you will implement the following applications with the Python MapReduce Azure tool.

The applications to be implemented are listed below.

Word Counter

Implement the mapper and reducer to obtain the histogram of the words in a file.
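As a starting point, the word histogram can be sketched in plain Python. This is a local simulation under stated assumptions, not the Azure tool's actual interface: the names wc_mapper, wc_reducer and wc_run_local are hypothetical, words are assumed to be case-insensitive runs of letters, digits and apostrophes, and the shuffle phase is imitated by sorting the mapper output.

```python
import re
from itertools import groupby
from operator import itemgetter


def wc_mapper(lines):
    """Emit a (word, 1) pair for every word found in the input lines."""
    for line in lines:
        # Assumption: a word is a run of letters, digits or apostrophes.
        for word in re.findall(r"[a-z0-9']+", line.lower()):
            yield word, 1


def wc_reducer(sorted_pairs):
    """Sum the counts of each word; expects pairs sorted by word."""
    for word, group in groupby(sorted_pairs, key=itemgetter(0)):
        yield word, sum(count for _, count in group)


def wc_run_local(lines):
    """Simulate map, shuffle (a sort by key) and reduce on one machine."""
    shuffled = sorted(wc_mapper(lines))
    return dict(wc_reducer(shuffled))
```

Calling wc_run_local on a list of lines returns a dictionary that maps each word to its number of occurrences. On a cluster, the framework performs the sort between the two phases.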

Grep

Implement the mapper and reducer to obtain the number of times the words “gatech” and “burdell” appear in a file. https://www.gatechdining.com/images/Spring%20Break%202016_tcm251-103643.pdf
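One possible shape for this job is sketched below. It assumes whole-word, case-insensitive matching and plain-text input lines (a PDF would need to be converted to text first); the specification does not say either way. The names grep_mapper, grep_reducer and TARGETS are hypothetical.

```python
import re

# The two words named in the specification.
TARGETS = ("gatech", "burdell")


def grep_mapper(lines, targets=TARGETS):
    """Emit (word, 1) each time one of the target words appears."""
    wanted = set(targets)
    for line in lines:
        # Assumption: matching is case-insensitive and on whole words.
        for token in re.findall(r"[a-z0-9]+", line.lower()):
            if token in wanted:
                yield token, 1


def grep_reducer(pairs, targets=TARGETS):
    """Total the matches per target word, reporting 0 for absent words."""
    totals = {target: 0 for target in targets}
    for word, count in pairs:
        totals[word] += count
    return totals
```

For example, grep_reducer(grep_mapper(lines)) returns a dictionary with one count per target word.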

Sort

Implement the mapper and reducer to sort a list of numbers in a file. Each line of the file has a different number.
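A common approach is to let the shuffle phase do the sorting: the mapper emits each number as a key and the reducer writes the keys back in the order it receives them. The sketch below simulates that locally. It assumes the numbers are integers, and the names sort_mapper, sort_reducer and sort_run_local are hypothetical.

```python
def sort_mapper(lines):
    """Emit (number, None) for each non-empty line; the number is the key."""
    for line in lines:
        text = line.strip()
        if text:
            # Assumption: every line holds one integer.
            yield int(text), None


def sort_reducer(sorted_pairs):
    """Pass the keys through in the order delivered by the shuffle."""
    for number, _ in sorted_pairs:
        yield number


def sort_run_local(lines):
    """Simulate the shuffle with a numeric sort on the key."""
    shuffled = sorted(sort_mapper(lines), key=lambda pair: pair[0])
    return list(sort_reducer(shuffled))
```

On a real cluster, keys may be compared as text rather than as numbers by default, so it is worth checking how the comparator is configured before relying on this ordering.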

Reverse Web-Link Graph

Given a list of webpages, the system should produce, for each webpage, the list of webpages that point to it. The final output is <target, list(sources)>, where target is the URL to which a hyperlink points and sources is the list of pages where that hyperlink was found. Implement the mapper and reducer required to execute this task.
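The <target, list(sources)> output can be illustrated with a small local sketch. It assumes the links have already been extracted, so each input record is a (source, list of targets) pair; the real input format is not given in the specification. The names link_mapper and link_reducer are hypothetical.

```python
from collections import defaultdict


def link_mapper(pages):
    """Invert each link: emit (target, source) for every outgoing link."""
    for source, targets in pages:
        for target in targets:
            yield target, source


def link_reducer(pairs):
    """Collect all sources that point to each target."""
    grouped = defaultdict(set)
    for target, source in pairs:
        grouped[target].add(source)
    # Sorted lists keep the output deterministic.
    return {target: sorted(sources) for target, sources in grouped.items()}
```

For example, link_reducer(link_mapper(pages)) maps every target URL to the sorted list of pages that link to it.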

Additionally, for the previous application, first test it using the default configuration, then design a better sharding algorithm based on the application. You can test iteratively until performance improves.
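One candidate to try is the host-based partitioning mentioned in the MapReduce paper, hash(Hostname(urlkey)) mod R, which sends all URLs of the same host to the same reducer. The sketch below compares it with a plain hash of the whole key. The function names are hypothetical, CRC32 is only a stand-in for whatever hash the framework uses, and whether this improves performance depends on the data.

```python
import zlib
from urllib.parse import urlsplit


def default_partition(key, num_reducers):
    """Baseline: hash the whole key and take it modulo the reducer count."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def host_partition(url, num_reducers):
    """Hash only the hostname so URLs of one host share a reducer."""
    # Fall back to the full string when no hostname can be parsed,
    # for example when the URL has no scheme.
    host = urlsplit(url).hostname or url
    return zlib.crc32(host.encode("utf-8")) % num_reducers
```

With host_partition, two target URLs such as http://example.com/a and http://example.com/b always land on the same reducer, whereas default_partition may separate them.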

Homework

Questions to help students become familiar with the MapReduce paper in preparation for the project. Examples of possible questions:

  • What are the main functions that the master has to support in order to distribute the work among the workers?
  • What happens when there are slow machines in the system?
  • More to come when the system implementation is defined.

Intention: clarify the first-week questions that are likely to arise when programming the assignment.
