In this course, you should have learned a great deal about big data management. This assignment tests
your acquired knowledge, your critical thinking, and your ability to discover knowledge independently in
big data management. Some questions ask for a summary with critical reflection; others are exploratory
and require self-directed research. The marking criteria are depth of acquired knowledge (30%), depth of
reflective analysis (30%), discovery ability in knowledge exploration (30%), and organization of writing
(10%). You must answer all questions.
1) Developing Big Data Applications [20 marks]
a. Summarize and reflect on what you have learned from this course about conceptual modeling in
big data management. Illustrate, with diagrams and explanations, the use of conceptual
modeling in your group project (do not share your answer with your groupmates); (at least 600 words)
b. Search external sources for more detailed information about ECL, which is used in the HPCC
environment. Describe ECL’s declarative programming constructs, especially how its AI features
can solve BDA problems of a task-parallel nature; (at least 600 words)
2) Data & Task Parallelism [25 marks]
a. Summarize and reflect on what you have learned from this course about data parallelism and task
parallelism; (at least 300 words)
b. Design a full example that illustrates your understanding of data parallelism. The example must
include a case scenario requiring a “composite-key; multiple-value” key-value pair, e.g.
{city + day, max temp + min temp}; 30 sets of raw data split across 2 data nodes; and a 5-stage
MapReduce solution. The solution format should be similar to that of class exercise 3 or 4, but the
scenario and raw data must be entirely different;
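To make the “composite-key; multiple-value” pair concrete, the following is a minimal Python sketch using the {city + day, max temp + min temp} example above. It is only an illustration under stated assumptions: the five stages are taken here to be split, map, combine, shuffle and sort, and reduce (the class exercises may name or order them differently), the readings are hypothetical, only 8 records are shown instead of 30, and all function names are invented for this sketch.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical readings (city, day, temperature); not real data.
RAW = [
    ("Oslo", "Mon", 3), ("Oslo", "Mon", 7), ("Lima", "Mon", 19),
    ("Lima", "Mon", 24), ("Oslo", "Tue", 1), ("Lima", "Tue", 22),
    ("Oslo", "Tue", 5), ("Lima", "Tue", 18),
]


def split(records, nodes=2):
    """Stage 1 (assumed): deal the raw records across the data nodes."""
    return [records[i::nodes] for i in range(nodes)]


def map_stage(chunk):
    """Stage 2: emit composite key (city, day) with value (max, min)."""
    return [((city, day), (temp, temp)) for city, day, temp in chunk]


def merge(a, b):
    """Merge two (max, min) values into one."""
    return (max(a[0], b[0]), min(a[1], b[1]))


def combine(pairs):
    """Stage 3 (assumed): pre-aggregate per key on a single node."""
    local = {}
    for key, value in pairs:
        local[key] = merge(local[key], value) if key in local else value
    return list(local.items())


def shuffle(node_outputs):
    """Stage 4: group values by composite key across nodes, sorted by key."""
    grouped = defaultdict(list)
    for pairs in node_outputs:
        for key, value in pairs:
            grouped[key].append(value)
    return dict(sorted(grouped.items()))


def reduce_stage(grouped):
    """Stage 5: fold each key's values into one (max, min) pair."""
    return {key: reduce(merge, values) for key, values in grouped.items()}


def run(records, nodes=2):
    """Run all five stages; each node's chunk is mapped independently."""
    chunks = split(records, nodes)
    combined = [combine(map_stage(chunk)) for chunk in chunks]
    return reduce_stage(shuffle(combined))


result = run(RAW)
for (city, day), (hi, lo) in result.items():
    print(f"{city} + {day}: max {hi}, min {lo}")
```

The map and combine stages touch only one node's chunk, which is where the data parallelism lies; only the shuffle stage brings values for the same composite key together.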
3) Tools and techniques for big data [20 marks]
a. Using a table, compare and contrast MPP and SMP in terms of their strengths,
weaknesses and application areas; (at least 500 words)
b. Find another big data platform similar to Hadoop (other than HPCC and Spark). Using a table,
compare this platform with Hadoop in terms of features, strengths and weaknesses; (at least 500
words)
4) Data governance for big data management [15 marks]
a. Summarize and reflect on what you have learned from this course about the five key concepts for
big data oversight; (at least 600 words)
b. Illustrate your understanding of the dimensions for measuring the quality of information used for
BDA, giving one example for each dimension; (at least 600 words)
5) NoSQL Database [20 marks]
a. Using a table, compare and contrast (in terms of structure, advantages, disadvantages, and
application areas) the four types of NoSQL database that you have learned about in this course; (at
least 800 words)
b. Find one viable column-oriented NoSQL database product in the IT vendor market and
describe it, including its features and price. Critically analyze why and how this NoSQL database
could be used in the case “What does Big Data have to do with an owl?” (at least 600 words)