P4DS: Assignment 3 (Autumn 2021)

Data Analysis Project

Notebook template design: Brandon Bennett

Original: 2020.11.3
Revised: 2021.03.02, 2021.11.10

Project Title

Give names and emails of group members here:

Project Requirements

PLEASE DELETE THIS WHOLE CELL BEFORE SUBMITTING YOUR PROJECT

The purpose of this assignment is to develop your skills in organising and presenting a Data Science project.

Since most of the marks will be awarded for organisation and presentation, it is suggested that you do not initially attempt anything too complicated. However, once you have managed to get a basic pipeline working that fits the guidelines, you are encouraged to extend and elaborate your analysis.

Your project should be entirely contained within this template file. You should keep the basic structure indicated below, to facilitate grading according to the marking scheme.

You may import any module that is provided with Anaconda3 Python.

Marking Scheme

The marking scheme is as follows:

Data Resources

You can use any data you like. Many useful resources are available. As a starting point you could look at the following:

Using this Notebook Template

Please use this notebook as a template for your project file. In the following cells of the notebook, italic text giving explanations and examples should be either deleted or, in most cases, replaced by appropriate text describing your project. Text that is not in italic (which is mostly headings) should be left as it is. Your project report notebook should have the same overall structure as this template notebook. The one exception is the current markup cell describing the project requirements, which you should delete before submitting your notebook.

Project Plan

The Data (10 marks)

Here you should describe the data, including details of where it comes from, what data it contains, and how accurate it is. (Potentially you could create data from a simulation, but you should still explain why and how you intend to generate this data.)
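If you do generate data by simulation, fixing the random seed makes the dataset reproducible, which is worth stating when you explain how the data was produced. The sketch below is only an illustration: the function name `simulate_magnet_survey`, the magnet categories, their probabilities and the Poisson-distributed number of magnets per fridge are all invented assumptions, borrowed from the template's fridge magnet example.

```python
import numpy as np
import pandas as pd

def simulate_magnet_survey(n_fridges: int, seed: int = 0) -> pd.DataFrame:
    """Generate a hypothetical survey of fridge magnets (illustration only)."""
    rng = np.random.default_rng(seed)  # fixed seed makes the data reproducible
    types = ["meme", "smiley", "animal", "souvenir"]  # assumed categories
    probs = [0.4, 0.2, 0.25, 0.15]  # assumed probabilities, summing to 1
    # Assumption: magnets per fridge follow a Poisson distribution with mean 3.
    counts = rng.poisson(lam=3, size=n_fridges)
    rows = []
    for fridge_id, k in enumerate(counts):
        # Draw k magnet types for this fridge according to the assumed probabilities.
        for magnet_type in rng.choice(types, size=k, p=probs):
            rows.append({"fridge_id": fridge_id, "magnet_type": str(magnet_type)})
    return pd.DataFrame(rows, columns=["fridge_id", "magnet_type"])

df = simulate_magnet_survey(n_fridges=20, seed=42)
print(df.head())
```

Calling the function again with the same seed yields an identical DataFrame, so anyone re-running the notebook sees the same data.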

It can be just one dataset or several that can be combined somehow.
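When several datasets are combined, a join on a shared key column is a common approach. A minimal sketch with pandas, assuming two small hypothetical tables that share a `fridge_id` column (the tables, values and column names are invented for illustration):

```python
import pandas as pd

# Two small hypothetical tables that share a "fridge_id" key column.
magnets = pd.DataFrame({
    "fridge_id": [1, 1, 2, 3],
    "magnet_type": ["meme", "smiley", "meme", "animal"],
})
fridges = pd.DataFrame({
    "fridge_id": [1, 2, 4],
    "location": ["kitchen A", "kitchen B", "kitchen C"],
})

# A left join keeps every magnet row; indicator=True adds a "_merge" column
# recording whether each row found a match in the fridges table.
combined = magnets.merge(fridges, on="fridge_id", how="left", indicator=True)
print(combined)

# Rows with no matching fridge are worth reporting when judging data quality.
unmatched = combined[combined["_merge"] == "left_only"]
print(f"{len(unmatched)} magnet row(s) have no matching fridge record")
```

Using `how="left"` together with `indicator=True` makes it easy to count records that failed to match, which is useful evidence when discussing how accurate and complete the combined data is.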


There are 10 marks for this, so a fairly detailed description of the data is expected (around 300-400 words).

Project Aim and Objectives (5 marks)

Here you should describe the general aim of your project in around 200-300 words.

This can be anything from classifying items according to their characteristic features (which mushrooms are poisonous?) to simulating an evolving process (will the rabbits eat all the carrots or get eaten by the foxes?).

Here are some ideas for general types of processing functionality that you could implement:

Specific Objective(s)

You should choose and list up to 3 specific objectives suited to the data you will be working with and the type of project you wish to carry out. There should be at least one per person doing the project. There is no need for the objectives to be completely different: they could be different stages of the processing requirements, different processing functions that the system provides, or just different aspects of the data analysis that will be conducted. Typically, it is expected that there would be one objective per person, but you may do more. Replace the following examples with your own objectives:

System Design (5 marks)

Describe your code in terms of the following two sections.

Architecture

Typically this would be a pipeline in which data goes through several stages of transformation and analysis, but other architectures are possible. This does not need to be particularly complicated. A simple diagram with 100-150 words of explanation would be a good way to present your architecture.
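One way to keep a pipeline architecture visible in the code is to write each stage as a separate function and chain them. The stage names (`load`, `clean`, `analyse`, `run_pipeline`) and the `magnet_type` column below are hypothetical, chosen only to illustrate the idea; a real project would substitute its own stages.

```python
import pandas as pd

# Each stage is a plain function that takes and returns data,
# so stages can be tested separately and chained in order.

def load(records):
    """Stage 1 (hypothetical): turn raw records into a DataFrame."""
    return pd.DataFrame(records)

def clean(df):
    """Stage 2 (hypothetical): drop rows with missing values and normalise text."""
    df = df.dropna(subset=["magnet_type"]).copy()
    df["magnet_type"] = df["magnet_type"].str.strip().str.lower()
    return df

def analyse(df):
    """Stage 3 (hypothetical): count how often each magnet type occurs."""
    return df["magnet_type"].value_counts()

def run_pipeline(records):
    # Data flows load -> clean -> analyse, mirroring the architecture diagram.
    return analyse(clean(load(records)))

raw = [{"magnet_type": "Meme"}, {"magnet_type": " smiley"},
       {"magnet_type": None}, {"magnet_type": "meme "}]
result = run_pipeline(raw)
print(result)
```

Because each stage is a separate function, each can also live in its own notebook cell with a short explanation, matching the layout suggested for the Program Code section.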

Processing Modules and Algorithms

Briefly list and describe the most significant computational components of your system and the algorithms you will use to implement them. This could include things like:

Your list can be presented in similar form to the one just given, but should include a brief but more specific description of the components and/or algorithms. Probably three or four components is sufficient for most projects, but you may want to have more.

Program Code (15 marks)

Your code should be divided into relatively short cells, with brief explanation in markup cells between.

As noted in the assignment overview, your code does not need to be highly complex in order to get a good mark. Although there is a mark for the coding achievement, it accounts for only a quarter of the total (15 of the 60 marks).

The suggested length of the code is about 200 lines for a 1 person project or 300 for a 2 or 3 person project. You should not use more than 500 lines of code.

You should divide the code in accordance with the specification of modules and/or algorithms you gave in the previous section. Complex modules should be further divided into several code cells.

Please note the following about your code:

Brief Explanation of following code cell

Below is a silly example of some trivial data. Replace this markup cell and the one below with something more interesting, and go on adding more until you have achieved your objectives (at least to some extent).
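As a hypothetical stand-in for the example cell, trivial data could be as simple as a Python list with a count over it. The variable names and values here are invented for illustration:

```python
from collections import Counter

# Hypothetical raw data: one entry per magnet observed on a fridge.
fridge_magnets = ["meme", "smiley", "meme", "animal", "meme", "souvenir", "animal"]

# Counter tallies how many times each magnet type appears.
magnet_counts = Counter(fridge_magnets)
print("Total magnets:", len(fridge_magnets))
print("Most common type:", magnet_counts.most_common(1)[0])
```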

Comment on previous cell output (optional)

As well as describing code, it will in many cases be informative to describe the output that has been generated by a cell.

The previous output cell shows a key number in our fridge magnet analysis.

Brief Explanation of following code cell

Since fridge magnets often take the form of cute animals, we use pandas to convert the raw data into a DataFrame.
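A minimal sketch of that conversion, assuming the raw data arrives as a list of tuples (the values and the column names `magnet_type` and `colour` are hypothetical):

```python
import pandas as pd

# Hypothetical raw data: (magnet type, colour) pairs.
raw_data = [
    ("meme", "red"),
    ("smiley", "yellow"),
    ("animal", "green"),
    ("meme", "blue"),
]

# Convert the list of tuples into a DataFrame with named columns.
magnets_df = pd.DataFrame(raw_data, columns=["magnet_type", "colour"])
print(magnets_df)
print(magnets_df.dtypes)
```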

Comment on previous cell output (optional)

The output from the previous cell is very interesting.

The following cell defines a visualisation function for the data.
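Such a visualisation function might look like the following sketch. `plot_type_counts` is a hypothetical name, and the `magnet_type` column is assumed from the template's example. Returning the figure and axes lets later cells adjust or save the chart.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for scripts; omit this line inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

def plot_type_counts(df, column="magnet_type"):
    """Draw a bar chart of how often each value in `column` occurs."""
    counts = df[column].value_counts()
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.bar(counts.index.astype(str), counts.values)
    ax.set_xlabel(column)
    ax.set_ylabel("count")
    ax.set_title(f"Frequency of each {column}")
    fig.tight_layout()
    return fig, ax

# Hypothetical data used only to exercise the function.
magnets_df = pd.DataFrame({"magnet_type": ["meme", "smiley", "meme", "animal"]})
fig, ax = plot_type_counts(magnets_df)
```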

More code cells

You can add as many code cells as you require, but it is recommended that you break code into relatively small chunks and do not exceed the maximum number of lines stated above.

Project Outcome (10 + 10 marks)

This section should describe the outcome of the project by means of both explanation of the results and graphical visualisation in the form of graphs, charts or other kinds of diagram.

The section should begin with a general overview of the results and then have a section for each of the project objectives. For each of these objectives, an explanation of more specific results relating to that objective should be given, followed by a section presenting some visualisation of the results obtained. (In the case where the project had just one objective, you should still have a section describing the results from a general perspective followed by a section that focuses on the particular objective.)

The marks for this section will be divided into 10 marks for Explanation and 10 marks for Visualisation. These marks will be awarded for the Project Outcome section as a whole, not for each objective individually. Hence, you do not have to pay equal attention to each. However, you are expected to have some explanation and visualisation for each. It is suggested you have 200-400 words of explanation for each objective.

Overview of Results

Give a general overview of the results (around 200 words).

Objective 1

Explanation of Results

200-400 words

Visualisation

The following bar chart gives a vivid representation of the distribution of fridge magnet types, in which the dominance of 'meme' type magnets is dramatically illustrated.
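A chart of this kind could be produced with the pandas plotting interface. In the sketch below the observations are invented; `value_counts(normalize=True)` converts counts into proportions, and `idxmax` picks out the most common type so it can be named in the title.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for scripts; omit this line inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical observations; replace with your real data.
magnets_df = pd.DataFrame({"magnet_type": ["meme", "meme", "meme", "smiley", "animal"]})

# normalize=True gives proportions instead of raw counts.
shares = magnets_df["magnet_type"].value_counts(normalize=True)
top_type = shares.idxmax()  # label of the most common magnet type

ax = shares.plot.bar(rot=0)
ax.set_ylabel("proportion of magnets")
ax.set_title(f"Magnet types ('{top_type}' is most common)")
plt.tight_layout()
```

Plotting proportions rather than raw counts keeps the chart comparable if later data collection covers more fridges.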

Objective 2 (if present)

Explanation of Results

200-400 Words

Visualisation

Objective 3 (if present)

Explanation of Results

200-400 Words

Visualisation

Conclusion (5 marks)

Your concluding section should be around 200-400 words. It is recommended that you divide it into the following sections.

Achievements

As we had expected, the most popular fridge magnets were of the 'meme' kind. We were surprised that 'smiley' fridge magnets were less common than expected. We conjecture that this is because, although they are apparently very popular, few fridges display more than one smiley. However, 'meme' based magnets can be found in large numbers, even on quite small fridges.

Limitations

The project was limited to a small number of fridge magnets, which may not be typical of fridges found in the global fridge magnet ecosystem.

Future Work

In future work we would like to obtain more diverse data and study fridge magnets beyond the limited confines of student accommodation. We hypothesise that there could be a link between fridge magnet types and social class and/or educational achievement.