PLEASE DELETE THIS WHOLE CELL BEFORE SUBMITTING YOUR PROJECT
The purpose of this assignment is to develop your skills in organising and presenting a Data Science project.
Since most of the marks will be awarded for organisation and presentation, it is suggested that you do not initially attempt anything too complicated. However, once you have a basic pipeline working that fits the guidelines, you are encouraged to extend and elaborate your analysis.
Your project should be entirely contained within this template file. To facilitate grading according to the marking scheme, you should keep the basic structure indicated below.
You may import any module that is provided with Anaconda3 Python.
The marking scheme is as follows:
Project Plan:
Program Code: (15)
Code should be laid out in steps with explanations
and intermediate output with comments.
You should ensure that the steps do not require
a large amount of processing time.
Project Outcome:
You can use any data you like. Many useful resources are available. As a starting point you could look at the following:
Please use this notebook as a template for your project file. In the following cells of the notebook, italic text giving explanations and examples should be either deleted or, in most cases, replaced by appropriate text describing your project. Text that is not in italic (which is mostly headings) should be left as it is. Your project report notebook should have the same overall structure as this template notebook. An exception to this is the current markup cell describing the project requirements. You should delete this before submitting your notebook.
Here you should describe the data, including details of where it comes from, what it contains, and how accurate it is.
It can be just one dataset or several that can be combined somehow.
(Potentially you could create your own data from a simulation algorithm, but you should still explain why and how you intend to generate this data.)
There are 10 marks for this, so a fairly detailed description of the data is expected (around 300-400 words)
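If you do generate data by simulation, one possible approach is sketched below. It is only an illustration under stated assumptions: the function name `simulate_counts`, the category labels, and the uniform random choice are all hypothetical, not a required method. A fixed seed is used so that the same data is produced every time the notebook is run, which makes the generation step easier to describe and check.

```python
import random

def simulate_counts(categories, n_draws, seed=0):
    """Simulate n_draws observations, each assigned to a random category.

    Hypothetical helper: the uniform choice is an assumption made purely
    for illustration; a real project would justify its own model.
    """
    rng = random.Random(seed)            # seeded generator for reproducibility
    counts = {c: 0 for c in categories}  # start every category at zero
    for _ in range(n_draws):
        counts[rng.choice(categories)] += 1
    return counts

# Example run with made-up category labels.
simulated = simulate_counts(["A", "B", "C"], n_draws=100, seed=42)
print(simulated)
```

Because the generator is seeded, calling the function again with the same arguments returns the same counts.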
Here you should describe the general aim of your project in around 200-300 words.
This can be anything from classifying items according to their characteristic features (which mushrooms are poisonous?) to simulating an evolving process (will the rabbits eat all the carrots or get eaten by the foxes?).
Here are some ideas of general types of processing functionality that you could implement:
You should choose and list up to 3 specific objectives suited to the data you will be working with and the type of project you wish to carry out. There should be at least one per person doing the project. There is no need for the objectives to be completely different. They could be different stages of the processing requirements, different processing functions that the system provides, or just different aspects of the data analysis that will be conducted. Typically, it is expected that there would be one objective per person, but you may do more. Replace the following examples with your own objectives:
Describe your code in terms of the following two sections.
Typically this would be a pipeline in which data goes through several stages of transformation and analysis, but other architectures are possible. This does not need to be particularly complicated. A simple diagram with 100-150 words of explanation would be a good way to present your architecture.
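A pipeline of this kind might be organised along the lines of the sketch below. It is a minimal illustration only: the stage names `load_records`, `clean_records` and `summarise`, and the hard-coded sample records, are hypothetical and stand in for whatever stages and data your own project uses.

```python
# Hypothetical three-stage pipeline: load -> clean -> summarise.
# All function names and sample records are illustrative assumptions.

def load_records():
    """Stage 1: obtain raw records (hard-coded here in place of a real file)."""
    return [("apple", "3"), ("pear", ""), ("plum", "7")]

def clean_records(records):
    """Stage 2: drop records with a missing count and convert counts to int."""
    return [(name, int(count)) for name, count in records if count != ""]

def summarise(records):
    """Stage 3: compute a simple summary of the cleaned records."""
    counts = [count for _, count in records]
    return {"n_items": len(counts), "total": sum(counts)}

def run_pipeline():
    """Run the stages in order, passing each stage's output to the next."""
    return summarise(clean_records(load_records()))

summary = run_pipeline()
print(summary)
```

Keeping each stage in its own function makes it straightforward to place each one in a separate code cell with an explanation of its intermediate output.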
Briefly list and describe the most significant computational components of your system and the algorithms you will use to implement them. This could include things like:
Your list can be presented in similar form to the one just given, but should include a brief but more specific description of the components and/or algorithms. Probably three or four components is sufficient for most projects, but you may want to have more.
Your code should be divided into relatively short cells, with brief explanation in markup cells between.
As noted in the assignment overview, it is not necessary that your coding be super complex in order to get a good mark. Although there is a mark for the coding achievement, it is only a quarter of the total.
The suggested length of the code is about 200 lines for a 1 person project or 300 for a 2 or 3 person project. You should not use more than 500 lines of code.
You should divide the code in accordance with the specification of modules and/or algorithms you gave in the previous section. Complex modules should be further divided into several code cells.
Please note the following about your code:
Below is a silly example of some trivial data. Replace this markup cell and the one below with something more interesting, and go on adding more until you have achieved your objectives (at least to some extent).
## Code Cell
## This will typically consist of:
## (a) Code doing some data manipulation:
fm_data = {"souvenir": 9,
           "cute animal": 5,
           "meme": 36,
           "smiley": 3,
           "random image": 13}
total = sum(fm_data.values())
## (b) Code for displaying some output:
print("The total number of fridge magnets is:", total)
The total number of fridge magnets is: 66
As well as describing code, it will in many cases be informative to describe the output that has been generated by a cell.
The previous output cell shows a key number in our fridge magnet analysis.
Since fridge magnets often take the form of cute animals, we use pandas to convert the raw data into a DataFrame.
## Code Cell
import pandas
df = pandas.DataFrame.from_dict(fm_data, orient='index')
df
| | 0 |
|---|---|
| souvenir | 9 |
| cute animal | 5 |
| meme | 36 |
| smiley | 3 |
| random image | 13 |
The output from the previous cell is very interesting.
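As one possible way to make such a table easier to interpret, the sketch below gives the count column a name and adds each category's percentage share of the total. The names `df_named`, `count` and `share_pct` are illustrative assumptions and are not part of the template.

```python
import pandas

# Same fridge magnet counts as in the earlier code cell.
fm_data = {"souvenir": 9,
           "cute animal": 5,
           "meme": 36,
           "smiley": 3,
           "random image": 13}

# Build a one-column DataFrame with a descriptive column name
# (hypothetical name "count") instead of the default label 0.
df_named = pandas.Series(fm_data, name="count").to_frame()

# Add each category's share of the total, as a percentage to 1 decimal place.
df_named["share_pct"] = (100 * df_named["count"] / df_named["count"].sum()).round(1)

print(df_named)
```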
The following cell defines a visualisation function for the data.
def fridge_sorted_bar(color='blue'):
    # Sort the categories by count (column 0) and draw a bar chart.
    df.sort_values(0).plot.bar(color=color)
You can add as many code cells as you require, but it is recommended that you break code into relatively small chunks and do not exceed the maximum number of lines stated above.
This section should describe the outcome of the project by means of both explanation of the results and graphical visualisation in the form of graphs, charts or other kinds of diagram.
The section should begin with a general overview of the results and then have a section for each of the project objectives. For each of these objectives, an explanation of more specific results relating to that objective should be given, followed by a section presenting some visualisation of the results obtained. (In the case where the project had just one objective, you should still have a section describing the results from a general perspective followed by a section that focuses on the particular objective.)
The marks for this section will be divided into 10 marks for Explanation and 10 marks for Visualisation. These marks will be awarded for the Project Outcome section as a whole, not for each objective individually. Hence, you do not have to pay equal attention to each. However, you are expected to have some explanation and visualisation for each. It is suggested that you have 200-400 words of explanation for each objective.
Give a general overview of the results (around 200 words).
fridge_sorted_bar(color='red')
_Your concluding section should be around 200-400 words. It is recommended that you divide it into the following sections._
As we had expected, the most popular fridge magnets were of the 'meme' kind. We were surprised that 'smiley' fridge magnets were less common than expected. We conjecture that this is because, although they are apparently very popular, few fridges display more than one smiley. However, 'meme' based magnets can be found in large numbers, even on quite small fridges.
The project was limited to a small number of fridge magnets, which may not be typical of fridges found in the global fridge magnet ecosystem.
In future work we would like to obtain more diverse data and study fridge magnets beyond the limited confines of student accommodation. We hypothesise that there could be a link between fridge magnet types and social class and/or educational achievement.