March 16, 2016 | April 5, 2016 | cloudcomputinggatech

Week 6: Programming frameworks and software

Topics

MapReduce
Dryad
Hadoop
Pig Latin

References

Workshop: MapReduce

Description

Implement well-known applications on top of MapReduce (sorting, counting, etc.).

Specification

Using Azure HDInsight, you are going to implement the following applications using the Python MapReduce Azure tool.

The applications to be implemented are listed below.

Word Counter

Implement the mapper and reducer to obtain the histogram of the words in a file.
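As a starting point, the word counter can be sketched locally in plain Python before moving it to HDInsight. This is a minimal sketch, not the course's reference solution: the function names `map_words` and `reduce_counts` and the tokenization rule (lowercase letters, digits, and apostrophes) are assumptions made for illustration.

```python
import re
from collections import defaultdict


def map_words(lines):
    """Mapper: emit (word, 1) for every word found in the input lines."""
    for line in lines:
        # Assumed tokenization: lowercase runs of letters, digits, apostrophes.
        for word in re.findall(r"[a-z0-9']+", line.lower()):
            yield word, 1


def reduce_counts(pairs):
    """Reducer: sum the counts emitted for each word."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)
```

Calling `reduce_counts(map_words(open_file))` on any iterable of lines returns the histogram as a dictionary; on the cluster the framework would handle grouping by key between the two phases.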

Grep

Implement the mapper and reducer to obtain the number of times the words “gatech” and “burdell” appear in a file. https://www.gatechdining.com/images/Spring%20Break%202016_tcm251-103643.pdf
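A possible local sketch of the grep task follows. It assumes whole-word, case-insensitive matching, which the description does not specify, and the names `TARGETS`, `map_matches`, and `reduce_matches` are hypothetical.

```python
import re

TARGETS = ("gatech", "burdell")


def map_matches(lines, targets=TARGETS):
    """Mapper: emit (target, 1) for each whole-word, case-insensitive match."""
    wanted = set(targets)
    for line in lines:
        for word in re.findall(r"[a-z0-9]+", line.lower()):
            if word in wanted:
                yield word, 1


def reduce_matches(pairs, targets=TARGETS):
    """Reducer: total the matches per target, reporting 0 for absent targets."""
    counts = {target: 0 for target in targets}
    for word, n in pairs:
        counts[word] += n
    return counts
```

If substring matches (for example inside a URL) should also count, the mapper would need a different matching rule.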

Sort

Implement the mapper and reducer to sort a list of numbers in a file. Each line of the file has a different number.
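The usual MapReduce approach to sorting relies on the framework ordering records by key between the map and reduce phases. The sketch below simulates that locally; it assumes integer input, and `shuffle` is only a stand-in for the framework's own shuffle phase, not a real API.

```python
def map_numbers(lines):
    """Mapper: emit each number as the key, with an empty value."""
    for line in lines:
        text = line.strip()
        if text:
            yield int(text), None


def shuffle(pairs):
    """Stand-in for the framework's shuffle, which orders records by key."""
    return sorted(pairs, key=lambda pair: pair[0])


def reduce_identity(sorted_pairs):
    """Reducer: pass the keys through in the order received."""
    for key, _ in sorted_pairs:
        yield key
```

On a real cluster two details need care: keys may be compared as text unless a numeric comparison is configured, and with more than one reducer the output is only globally sorted if keys are partitioned by range.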

Reverse Web-Link Graph

For a given list of web pages, the system should obtain, for each page, the list of web pages that point to it; that is, the final output is <target, list(sources)>, where target is the URL to which a hyperlink points and sources lists the pages where that hyperlink was found. Implement the mapper and reducer required to execute this task.
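The <target, list(sources)> output can be illustrated with a small local sketch. It assumes the links have already been extracted, so each input record is a (source, list of targets) pair; that input format and the function names are assumptions, since the description does not fix them.

```python
from collections import defaultdict


def map_links(records):
    """Mapper: for each (source, targets) record, emit (target, source)."""
    for source, targets in records:
        for target in targets:
            yield target, source


def reduce_sources(pairs):
    """Reducer: collect the distinct sources that point to each target."""
    grouped = defaultdict(set)
    for target, source in pairs:
        grouped[target].add(source)
    return {target: sorted(sources) for target, sources in grouped.items()}
```

The mapper simply inverts each edge; all the grouping work happens on the reduce side.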

Additionally, for the previous application, first test it using the default configuration, then create a better sharding algorithm based on the application. You can test iteratively until performance improves.
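One way to think about application-aware sharding is to compare a generic hash of the whole key with a hash of only the part that matters to the application. The sketch below is an assumption-laden illustration: `default_shard` and `host_shard` are hypothetical helpers, and grouping by host is just one candidate strategy that may or may not improve performance on a given data set (a few very popular hosts could overload one shard).

```python
import hashlib
from urllib.parse import urlsplit


def default_shard(key, num_shards):
    """Baseline: hash the whole key (stable across runs, unlike built-in hash())."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def host_shard(url, num_shards):
    """Application-aware variant: hash only the host part of the URL,
    so every target URL of one site lands on the same shard."""
    host = urlsplit(url).netloc.lower() or url
    return default_shard(host, num_shards)
```

Measuring the number of keys per shard under both functions is a simple first check before timing full runs.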

Homework

Some questions to help students become familiar with the MapReduce paper in preparation for the project. Examples of possible questions:

What are the main functionalities that the master has to support in order to distribute the work on the workers?
What happens when there are slow machines in the system?
More to come when the system implementation is defined.

Intention: clarify the questions that are likely to arise during the first week of programming the assignment.

March 16, 2016 | March 30, 2016 | cloudcomputinggatech

Week 5

Help students with issues in the assignment. Students integrate all the parts implemented in the previous weeks.

The workshop is going to be mostly driven by the students.

Team Project: Network Virtualization

Description

One of the main benefits of Mininet is that it emulates a network, which means running unmodified code interactively on virtual hardware on a regular PC, providing convenience and realism at low cost [cite]. This section of the project will therefore give you a realistic sense of how to design a system from top to bottom for a given application.

In this module of the project, you will design, implement, and thoroughly test a software-defined network (SDN), both its topology and its rules.

Specification

The SDN will support a distributed server that is going to serve a variety of services, from regular HTTP web pages to real-time applications such as sensor streaming and video players. You need to design a topology and the required rules to fulfill the performance and budget requirements under the given test scenarios.

Budget

You will have a max budget of [TBD].

The cost of the components is the same as the second module of the current project.

Performance

The average over ten different 10-minute runs should meet the following requirements:

The utilization of any router should be less than [TBD].
At most [TBD] packets may be dropped on any router.
For the video streams, the QoS agreement can only be violated [TBD] times per user.
Queue Size

At any moment, the maximum values must stay within the following thresholds:

Utilization < [TBD] %
Packets dropped < [TBD]
Violations of the QoS agreement < [TBD] violations/min
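Because the limits are still [TBD], any check has to take them as parameters. The following is a minimal sketch of how the per-run averages and the "at any moment" thresholds could be verified; the metric names, the sample layout, and both function names are hypothetical, and every threshold value used with it is a placeholder.

```python
from statistics import mean


def average_over_runs(run_values):
    """Average one metric over the runs (the text asks for ten 10-minute runs)."""
    return mean(run_values)


def peaks_within(samples, thresholds):
    """Return {metric: bool}; True when the peak sample is strictly below
    the threshold, mirroring the '<' comparisons in the list above."""
    return {
        metric: max(sample[metric] for sample in samples) < limit
        for metric, limit in thresholds.items()
    }
```

Here `samples` is a list of dictionaries, one per measurement instant, and `thresholds` maps each metric name to its upper bound.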

Once all the tests have been performed, write a 1-2 page summary describing the topology and the rules implemented. Include charts that show the performance and the results obtained from the tests. Describe the maximum amount of traffic the network can support and what improvements could be made, e.g., by showing the bottlenecks.

March 16, 2016 | March 30, 2016 | cloudcomputinggatech

Week 4

Topics to be covered:

Performance evaluations (simulation, modelling, implementation).
Metrics for measuring performance (Network performance, latency, bandwidth, scalability, utilization).
Inter-data center networking.

References

Week 4: Simulations and measurements using real hardware

Due Date: fifth week (the same as workshop 3)

Description

Using the topologies defined in the previous section, you are going to map them to Microsoft Azure nodes and rerun the same tests. Additional tests will also be run.

Expected outcome

Students learn how Mininet code can be easily mapped onto real hardware, and how measurements can be taken and retrieved from multiple nodes.

Specification

[TBD]

To be submitted next class

[TBD]

March 16, 2016 | March 29, 2016 | cloudcomputinggatech

Week 3

Guest Lecture on Azure Network Virtualization

Workshop 3: Rules defined with OpenFlow

Description

In this module of the project, you will design and deploy different types of rules using OpenFlow on the given Mininet topology.

Expected outcome

Students learn about SDN, OpenFlow, how rules are created and deployed, and the Ryu controller.

Workshop

During the class workshop, students set up the Ryu controller, analyze how the setup phase works and how the measurements are taken, and try the default rules.

Take Home work

Students analyze the results of applying the default rules and iteratively improve the flow until adaptive rules yield no further gain in the load of the system.

Specification

Using the provided topology, implement two types of rules, static and dynamic, on the switches using the provided Python framework. Using the same tools as in the previous section, benchmark the utilization of the network, the number of packets dropped, and the violations of the given QoS.
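The difference between the two rule types can be illustrated without any controller code. The sketch below is plain Python and deliberately does not use the Ryu or OpenFlow APIs or the provided framework; `static_rule`, `dynamic_rule`, and the load dictionary are hypothetical names chosen for illustration.

```python
def static_rule(dst_ip, table):
    """Static rule: a fixed destination -> output-port mapping that never changes."""
    return table[dst_ip]


def dynamic_rule(candidate_ports, port_load):
    """Dynamic rule: pick the candidate port with the lowest measured load.
    Ties are broken by the lower port number so the choice is deterministic."""
    return min(candidate_ports, key=lambda port: (port_load.get(port, 0.0), port))
```

In a real deployment the static mapping would be installed once, while the dynamic choice would be re-evaluated as new utilization measurements arrive.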

The scenarios to be covered are:

Multiple users issuing HTTP GET requests for web pages over a TCP/IP connection; the input connection should be randomly chosen.
Multiple users performing video streaming over a UDP connection; the input connection should be randomly chosen.
Multiple users requesting both types of content; the input connection should be randomly chosen. Half of the users use video streaming and half use regular web pages.
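The three scenarios above could be generated with a small helper like the one below. It is only a sketch: `assign_users`, the scenario labels, and the dictionary layout are assumptions, and the provided test framework may already do this differently.

```python
import random


def assign_users(num_users, ingress_ports, scenario, seed=None):
    """Assign each user a traffic type, a protocol, and a random input connection.
    scenario is one of 'http', 'video', or 'mixed' (half video, half HTTP)."""
    if scenario not in ("http", "video", "mixed"):
        raise ValueError("unknown scenario: " + scenario)
    rng = random.Random(seed)  # seeded so that a run can be reproduced
    users = []
    for i in range(num_users):
        if scenario == "mixed":
            traffic = "video" if i % 2 == 0 else "http"
        else:
            traffic = scenario
        users.append({
            "user": i,
            "traffic": traffic,
            "protocol": "UDP" if traffic == "video" else "TCP",
            "ingress": rng.choice(ingress_ports),
        })
    return users
```

Passing a fixed `seed` makes the ten repeated runs comparable, while varying it gives different random input connections.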

The average over ten different 10-minute runs with [TBD] users should meet the following requirements:

The utilization of any router should be less than [TBD].
At most [TBD] packets may be dropped on any router.
For the video streams, the QoS agreement can only be violated [TBD] times per user.

At any moment, the maximum values must stay within the following thresholds:

Utilization < [TBD] %
Packets dropped < [TBD]
Violations of the QoS agreement < [TBD] violations/min

Once all the tests have been performed, write a one-page summary describing the two rules implemented, with charts that show the performance and the results obtained from the tests (yours and the ones that were given). An example of an interesting chart is a box-and-whisker plot with the statistics calculated every ten seconds.
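The statistics behind such a plot can be computed per ten-second window with the standard library. This is a sketch under assumptions: samples are taken to be (timestamp in seconds, value) pairs, and `window_summaries` is a hypothetical helper, not part of the provided tools.

```python
from statistics import quantiles


def window_summaries(samples, window=10.0):
    """Group (timestamp, value) samples into fixed windows and return the
    five-number summary (min, q1, median, q3, max) of each window."""
    buckets = {}
    for timestamp, value in samples:
        start = int(timestamp // window) * window
        buckets.setdefault(start, []).append(value)
    summaries = {}
    for start in sorted(buckets):
        values = sorted(buckets[start])
        if len(values) >= 2:
            q1, median, q3 = quantiles(values, n=4, method="inclusive")
        else:
            # A single sample: all quartiles collapse to that value.
            q1 = median = q3 = values[0]
        summaries[start] = {"min": values[0], "q1": q1, "median": median,
                            "q3": q3, "max": values[-1]}
    return summaries
```

Each summary maps directly onto one box in the chart: the box spans q1 to q3, the line marks the median, and the whiskers reach min and max.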

To be submitted next class

Show a demo in the class of the static rules.

The summary and the running tests are due on the date given at the beginning of this section, for both the static and dynamic rules.

March 16, 2016 | April 1, 2016 | cloudcomputinggatech

Week 2

Topics to be covered:

Security
Physical data center network topology and redundancy
Network virtualization
Software-defined networks
Congestion control and traffic engineering

References

Others:

Toward Software-Defined Middlebox Networking

Workshop: Topology Implementation

Learn how to include the costs into the virtualization.

Description

In this module of the project, you will deploy an interconnection topology using Mininet and analyze its performance by deploying a predefined set of rules using OpenFlow.

Expected outcome

Students learn about network topologies, the basics of Mininet, and how to take measurements on networks.

Specification

Using the framework given, implement a three-level fat tree topology with Mininet that does not cost more than the given budget. To test the system, deploy the included rules and characterize the system using the provided tests.
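Before writing the Mininet code, it can help to enumerate the switches and links of the topology in plain Python. The sketch below assumes the common k-ary fat tree construction, with (k/2)^2 core switches, k pods of k/2 aggregation and k/2 edge switches each, and k/2 hosts per edge switch; the assignment may intend a different variant, and the node names are made up for illustration. No Mininet API is used here.

```python
def fat_tree(k):
    """Enumerate the nodes and links of a k-ary, three-level fat tree (k even)."""
    if k < 2 or k % 2:
        raise ValueError("k must be an even integer >= 2")
    half = k // 2
    core = [f"c{i}" for i in range(half * half)]
    aggs, edges, hosts, links = [], [], [], []
    for pod in range(k):
        pod_aggs = [f"a{pod}_{i}" for i in range(half)]
        pod_edges = [f"e{pod}_{i}" for i in range(half)]
        aggs += pod_aggs
        edges += pod_edges
        for i, agg in enumerate(pod_aggs):
            # Aggregation switch i of every pod uplinks to core group i.
            for j in range(half):
                links.append((core[i * half + j], agg))
            # Full mesh between aggregation and edge switches inside a pod.
            for edge in pod_edges:
                links.append((agg, edge))
        for i, edge in enumerate(pod_edges):
            for h in range(half):
                host = f"h{pod}_{i}_{h}"
                hosts.append(host)
                links.append((edge, host))
    return {"core": core, "aggregation": aggs, "edge": edges,
            "hosts": hosts, "links": links}
```

The resulting counts (switches and links) feed directly into the budget check, and the link list can be replayed when adding links in the actual topology class.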

To test and characterize the system, you are going to use the following tools: wireshark, iperf, dpctl, and cbench. Test your topology and verify that the following aspects are met:

The system complies with the given requirements and budget.
Find the bottlenecks that may appear in the system by creating tests that force a worst-case scenario; at least 3 different scenarios need to be covered.

Once all the tests have been performed, write a half-page summary with charts that show the topology and the results obtained from the tests (yours and the ones given), with a description of the corner cases covered.

Budget

You will have a max budget of [TBD]

The cost of the systems is as follows:

Each switch costs [TBD]
Cables have a variable cost depending on the maximum bandwidth they can support:
- 1 Gbps fiber optic cable costs [TBD]*
- 100 Mbps fiber optic cable costs [TBD]
- 10 Mbps copper cable costs [TBD]
All the previous costs include maintenance and deployment.

*We only use connections of up to 1 Gbps due to limitations in Mininet.
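Since the prices are still [TBD], a budget check has to take them as inputs. The sketch below shows one way to total the cost of a topology; the function names, the cable-type labels, and any numbers passed in are placeholders, not the course's actual prices.

```python
def topology_cost(num_switches, links_by_type, switch_cost, cable_cost):
    """Total cost: switches plus cables, with a per-type cable price.
    links_by_type maps a cable type to the number of links using it."""
    total = num_switches * switch_cost
    for cable, count in links_by_type.items():
        total += count * cable_cost[cable]
    return total


def within_budget(total, budget):
    """The topology must not cost more than the budget."""
    return total <= budget
```

For example, a design with 20 switches, 16 links of one cable type, and 32 of another is evaluated by passing those counts together with whatever prices are eventually announced.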

To be submitted next class

Show a demo in the class of the working topology.

The summary and the running tests are due on the date given at the beginning of this section.

March 16, 2016 | April 1, 2016 | cloudcomputinggatech

Week 1: Introduction to Cloud Computing

Topics: Overview

What is cloud computing?
General Benefits and Architecture
Precursors
Business Drivers
Cloud Service Models (PaaS, IaaS, SaaS)
Main players in the Field
Overview of Security Issues and other limitations that need to be known.

References:

Workshop 1: Installing a web server.

Description

From the list given in the following section, you are going to choose an application and install the software required to run it in a virtual machine in the cloud (Azure Virtual Machine).

Expected outcome

Students become familiar with both Azure and open-source server implementations.

Specification

Choose one application from the list and deploy it in an Azure Virtual Machine.

Video chat using WebRTC: link (nodejs, javascript)
Content Manager System: link (python)
Gif chat room: example and code (nodejs)
CrowdPokemon: example and
[More to be added]

To be presented in the next class

A Demo of the working server.

Additional Homework

To be presented next week

Prepare a two-page comparison that succinctly describes the different interconnection network topologies and explains the benefits and drawbacks of using them in data centers and cloud environments.

In a one-page summary, explain how OpenFlow works. The explanation should include at least the following areas:

Control Path and Data Path
How are the rules installed?
Most important flow table entries.
Forwarding logic.
Centralized vs. distributed control
Flow Routing vs. Aggregation
Reactive vs. Proactive rule creation
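As a study aid for the areas listed above, the following plain-Python sketch models a flow table lookup and the difference between proactive and reactive rule creation. It is a simplified model rather than the OpenFlow protocol itself: the entry layout, `lookup`, `handle_packet`, and the `controller_policy` callback are all hypothetical, and real switches handle a table miss according to how the table is configured.

```python
def lookup(flow_table, packet):
    """Return the action of the highest-priority entry whose match fields
    all equal the corresponding packet fields, or None on a table miss."""
    best = None
    for entry in flow_table:
        if all(packet.get(field) == value for field, value in entry["match"].items()):
            if best is None or entry["priority"] > best["priority"]:
                best = entry
    return best["action"] if best else None


def handle_packet(flow_table, packet, controller_policy):
    """Data path first; on a miss, ask the controller (reactive creation).
    The controller's entry is installed so later packets hit the table."""
    action = lookup(flow_table, packet)
    if action is None:
        entry = controller_policy(packet)
        flow_table.append(entry)
        action = entry["action"]
    return action
```

Entries placed in `flow_table` before any traffic arrives correspond to proactive rules; entries appended inside `handle_packet` correspond to reactive ones.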

February 5, 2016 | February 5, 2016 | cloudcomputinggatech

Cloud Computing Books

Cloud Computing: From Beginning to End
- Definition
- Cloud Technology: virtualization, containers, load balancers.
- Cloud Security: availability, data protection, isolation, trust.
- Migration Methodology (how to migrate apps to the cloud).
Cloud Computing: Concepts, Technology & Architecture <- Interesting
- Fundamentals, mechanisms, architecture and metrics.
Distributed and Cloud Computing: From Parallel Processing to the Internet of Things
- Models and Enabling Technologies
- Clusters
- Virtualization
- Cloud and DataCenter architectures
- SOA for distributed computing.
- Cloud programming
- Grid, P2P and IoT
- Bad reviews!
Cloud Computing: A Hands-On Approach <- Interesting
- Characteristics, models and services examples
- Chapter 2 seems useful about cloud concepts and technologies (virtualization, load balancing, scalability, deployment replication, SDN, etc).
- Type of services and platforms.
- Multiple examples
- Benchmarking
- Security
Building Cloud Apps with Microsoft Azure
Cloud Computing: Automating the Virtualized Data Center
- Book from Cisco; seems interesting, but not sure how biased it would be.
Cloud Computing Design Patterns <- Interesting
- Sharing Scaling and Elasticity
- Reliability, resiliency, and recovery
- Data Management
- Virtualized environment and hypervisor
- Monitoring, provisioning, and administrative patterns.
Cloud Computing: Theory and Practice <- May be interesting
- Cloud Resource virtualization
- Management and Scheduling
- Network
- Storage
- Security
- Self-Organization
- Parallel and distributed algorithms.
Mastering Cloud Computing: Foundations and Applications Programming
- Foundations (Parallel and Distributed Computing, virtualization, Architecture)
- Framework for building cloud computing applications -> Aneka (.NET based)
- Concurrent, High-throughput, data intensive computing.
- Advanced topics -> Federated clouds, market-based management, energy efficiency.
Cloud Computing Networking: Theory, Practice, and Development <- Interesting for networks
Cloud Architecture Patterns: Using Microsoft Azure
- Talks about patterns for programming distributed systems in the cloud.
Software Defined Networks: A Comprehensive Approach
- Useful description of SDN
- Many topics are covered.

February 5, 2016 | August 22, 2016 | cloudcomputinggatech

Project 1

Software Defined Networks (SDN)

Project Description

This project has an individual component and a team component (teams of 2-3 persons).

For this project, you will design, implement, and thoroughly test a software-defined network (SDN) for a distributed server that is going to serve a variety of services, from regular HTTP web pages to real-time applications such as sensor streaming and video players. The network has to support protocols such as TCP/IP and UDP simultaneously.

Part of the project will be done in the class workshops and part outside of class.

Grading

Your project grade will be based on the quality of your report, on the usefulness of the system you’ve built, on the extent to which your design is a good fit for the problem you’re solving, and the quality of the code submitted.

Specification

This project mainly consists of five modules:

Research – Individual
Implementation of a topology using mininet. – Individual
Implementation of static and dynamic rules using OpenFlow. – Individual
Simulation and measurements of the previous two steps using real hardware. – Group
Design, implement and test a network for a data center that fulfills the requirements given. – Group

Tools that may be handy

iperf, wireshark, tc, cbench, and ovs-ofctl

Week by Week

Expected Timeline

Additional notes

Your code should comply with the given headers and code structure. A significant part of your grade will come from automatic grading; if your code does not meet the requirements, a zero will be given for the corresponding section. Tests will be provided for you to verify that your code has the required structure.

What to Hand In (Final submission)

A report with all the contents described in the previous sections, following the ACM SIG template.
The code for the topology implementation.
The code for the OpenFlow implementation.
The code for your final system.
Any test cases used to test your system with the corresponding Makefile.

All the files have to be submitted using the directory structure obtained from T-Square and compressed in tar.gz format.
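The tar.gz archive can be produced with any tool; as one option, the sketch below uses Python's standard `tarfile` module. The helper name `make_submission` and the paths passed to it are hypothetical, and the directory layout itself must still follow the structure obtained from T-Square.

```python
import tarfile
from pathlib import Path


def make_submission(source_dir, archive_path):
    """Compress source_dir (recursively) into a gzip-compressed tar archive.
    The top-level folder name is preserved inside the archive."""
    source = Path(source_dir)
    with tarfile.open(archive_path, "w:gz") as archive:
        archive.add(str(source), arcname=source.name)
    return archive_path
```

Listing the archive afterwards (for example with `tarfile.open(path).getnames()`) is a quick way to confirm that the expected directory structure was kept.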

February 3, 2016 | February 5, 2016 | cloudcomputinggatech

Topics that can be covered

Overview
- General Benefits and Architecture
- Precursors
- Business Drivers
- Cloud Service Models (PaaS, IaaS, SaaS)
- Main players in the Field
- Overview of Security Issues
Microsoft Azure
DataCenter Architecture
Resource Management
- Automated provisioning
- Balancing.
- Scheduling
Virtualization basics
- Virtualization, hypervisor
- VM management example
Network
- Physical data center network topology and fault-tolerance
- Network virtualization
- Software defined networking
- Congestion control and traffic engineering
- Inter-data center networks
Filesystems and Data Storage
- Distributed FileSystems
- (Dynamo, Haystack, BigTable)
- NoSQL
Security
Programming frameworks and software
- MapReduce
- Dryad
- Hadoop
- Pig Latin
- OpenStack
Sensor Networks and Stream Processing
Energy
Scalability, performance characterization and benchmarking

Additional interesting topics that may or may not be covered

Operating Systems (Mesos, Akaros, Arrakis)
Distributed Algorithms
- Paxos
- Overlay Networks
- P2P
- Consensus (Quorum)
- Coordination (ZooKeeper)
Multicast
RealTime cloud -> Reference
PubSub
Faults and failures
Iterative processing (Pregel, Spark, etc.)
Web Development and Cloud applications
- RESTful APIs
Trending areas
Disaster and failure recovery.
Mobile Cloud computing.

February 3, 2016 | February 3, 2016 | cloudcomputinggatech

Cloud Computing Classes from other Universities (and other entities)

Other references:

Paper about the implementation of the cloud computing class at CMU

Books:

Distributed and Cloud Computing, From Parallel Processing to the Internet of Things

Good references for other universities here

Good introductory reference by Berkeley
