P4DS Assignment 1, Autumn 2021

Python Programming

Brandon Bennett

Last modified: 3rd October 2021 (BB)

This coursework is intended to develop and test skills in defining basic algorithmic functions in Python, with an emphasis on types of function for processing, transforming and classifying. Completing this set of tasks should give you a very good grounding in the fundamental programming techiniques required for DataScience.

In this exercise we are only using a limited set of library modules. This is to ensure you master the kinds of algorithms required rather than relying on pre-programmed functions provided by Python packages.

Do not import any module other than those already imported into this file.

In other words do not add import statements in any code you write for this assignment. And, of course, do not add extra packages to the import statements that are already in the code.

Form of the Assignment

Your work will consist of defining a number of functions according to given specifications.

You will be given a bare skeleton function just giving the function name, arguments and a dummy return value.

You need to edit the functions so that for any input of the expected type, the result returned will fit the given specification.

For the grading I will not try to catch you out by giving inputs of the wrong type (such as a number instead of a string) or names of files that do not exist. It is not that kind of programming course.

You will be able to check that your functions are working correctly, by running a testing function defined in the Python file A1_P4DS2_Python_Programming_tests.py. You should download this and put it in the same directory as this file (A1_P4DS2_Python_Programming.ipynb).

Note: The tests carried out by the testing module provided in this file are similar to but not the same as those that will be used for the final grading. The functions you define need to satisfy the general requirements that are specified, otherwise you may lose marks when the final grading is done using other tests.

When answering the questions below, you can just modify the given template cell. You don't need to create a new cell for your definition. But make sure you do not alter the name of the function or the number and type of its arguments, otherwise the automatic testing/grading function will not work correctly.

Please look at the prelimiary formative assignment notebook A0_Warm_Up.ipynb for further explanation and the assignment format and illustratons of the using the testing functions.

Questions overview:

Total marks: 30

Importing the Testing Module

In order to run the function tests provided for you to check you code you will need to first run the following cell, which imports the testing module for this assignment. (As mentioned above, the tests used for final grading of this assignment may be different from the specific tests carried out by this module.)

Q1. Identify Anagrams and Palindromes

This question involves determining properties of words and identifying words with those properties.

Q1 (a) Test whether two strings are anagrams

Write a Boolean valued function anagrams(s1, s2), which returns True just in case the string s1 contains the same letters as string s2 but in a different order. The function should be case insensitive --- in other words it should return the same value if any letters in either s1 or s2 are changed from upper to lower case or from lower to upper case. You may assume that the input strings contain only letters.

Examples

INPUT 1 (string) INPUT 2 (string) OUPUT (boolean)
"sidebar" "seabird" True
"cheese" "frizbee" False
"listen" "silent" True
"this" "this" False
"Tar" "Rat" True
"Tar" "TAR" False

Q1 (b) Test whether a string is a palindrome

Define a function is_palindrome(s) that it returns True if the given string is a palindrome, otherwise returns False.

More specifically, the function should return true if, the alphabetic characters in the input string form the same sequence if they are read forward as if they are read backwards. Any non-alphabet characters, such as spaces and punctuation marks should be ignored, and letter characters are considered to be the same if one is lower case and the other is upper case. Thus the string "Do geese see God?" is counted as a palindrome, so the function should return True for this string.

Examples:

Input Output
"Bob" True
"God" False
"Abba" True
"No lemon, no melon" True
"I love Python!" False

Q2. Using a Data File of English Words

For this task you will deal with your first file of data! It is an alphabetical list of a large number of English words. You should have a look at it in an editor or view it in a browser to see what the contents look like.

This file contains contains nearly all common English words:

I also provide the following code which initialises the global variable ENGLISH_WORDS to be a set of all the words in the file english_words.txt. You will need to have that file in the same folder as this notebook file.

You are recommended to use the ENGLISH_WORDS variable in the following questions, whenever you are required to perform a calculation involving all the words in english_words.txt. This is particularly important for a function that needs to check the list of words many times. Reading information from a file typically takes more computational time than other operations, so if data can be stored in memory (e.g. as the value of a Python variable) this will usually be a lot more efficient than reading it from a file each time it is needed. (Of course, when dealing with very large datasets, it may not be possible to store the whole dataset in memory.)

Q2 (a) Check whether a string is an English word

Write a function is_english_word(s), which will test if its input string s is an English word, according to the file english_words.txt. This file contains a large number of English words, including all common words and many very rare words. Proper names are not included, and all words are given in lower case, with one word on each line of the file. You need to download the file of English words. Note that it is not a program file and you do not need to edit it.

Your function is_english_word(s) should take any string as its argument and return a Boolean value --- i.e. True or False. More specifically your function should return True if any of the following conditions hold:

If none of these conditions hold, your function should return False.

Examples:

INPUT OUPUT
"python" True
"Python" True
"PYTHON" True
"pyThon" False
"splap" False

Q2 (b) Find all anagrams of a word

Write a function find_all_anagrams(string), which take a string as input and returns a list, in alphabetical order, of all words in the file english_words.txt that are anagrams of the input string.

More specifically given an input string the function should return a list [word_1, ..., word_n], of words in alphabetical order, which contains every word in the dictionary file such that the value of the function anagrams(string,word_i) is True (as specified in Q2(a) of this coursework), and contains no other words.

Examples

INPUT OUPUT
'cheese' []
'Python' ['phyton', 'typhon']
'relating' ['alerting', 'altering', 'integral', 'tanglier', 'triangle']

Note: Though the instructions for this question are quite brief, they do exactly specify the requirements for this function. Since you should have already defined the anagrams(s1,s2) function for part Q2(a), you should call this function in your definition of the find_all_anagrams(string) function. It is much better programming style to do that, rather than repeat the full code for checking anagrams within `find_all_anagrams(string).

Q2 (c) Find palindromes of length n (in english_words.txt)

Define a function find_all_palindromes(n) that returns, in alphabetical order, the list of all palindromes in english_words.txt that are n letters long.

Q2 (d) Literate Password Checker (6 marks)

Computer systems are vulnerable to hacking if they can be accessed by passwords based on English words. Now that we have access to the set of English words, we can define a tougher password checker.

An institution uses the following rules to classify the strength of passwords:

You need to code a function password_strength that will take a string argument and will return the 'strength' of that string as a password, according to the rules given above. So it should return one of the strings
"ILLEGAL", "STRONG", "WEAK" or "MEDIUM".

You may assume that the input password is consists of only ASCII characters, that is: alphabetic letters, numerical digits, special visible charactes, spaces, tabs and newlines. The special visible characters are:

~`!@#$%^&*(){}[]|\/:;";<>.?

Examples:

INPUT OUPUT
"secret" "ILLEGAL"
"my secret" "ILLEGAL"
"qwertyu" "WEAK"
"hello123" "WEAK"
"7Kings8all9Pies!" "STRONG"
"brandon123" "MEDIUM"

Q3. A Simple Holiday Recommender System

To answer this question you will write functions that operate as a simple system for recommending holiday destinations based on the amount of money you have to spend and some desired features of your holiday. These requirements will be compared to data about possible holiday destinations in order to find those that match the cost and feature requirements.

Q3 (b) Find Available Holiday Features

For this part you should write a function available_features that takes two input parameters:

  1. The maximum amount of money (represented as an integer) that someone is prepared to spend.
  2. Holiday destination datalist.
    This is a list of lists of a form such as specified in the next code cell. It contains records for a number of holiday destinations. Each record is a list which gives the destination name, the cost of the holiday and a list of attributes associated with that destination.

NOTE: HOLIDAYS_EG is just an example of the holiday data structure. Your function should work with any similar structure, which could have different destinations, costs and attribute lists (which may contain other attribute strings).

The value returned by available_features should be a list of all possible holiday features that are available from any holiday destination whose cost is less than or equal to the maximum amount specified by input parameter 1. This list should be ordered in alphabetical order.

Here are some examples shown as a table:

parmeter 1 (cost) parameter 2 (destination data) return (feature list)
100 HOLIDAYS_EG ["Beach", "Culture"]
300 HOLIDAYS_EG ["Beach", "Culture", "Hot"]

NOTE: It need not be possible to have a holiday with all the possible features within the price limit, even though each feature must be available for some destination within the price limit.

Q3 (b) Find Possible Holiday Recommendations

For this part you should write a function recommend_holidays that takes three inputs:

  1. The maximum amount of money (represented as an integer) that someone is prepared to spend.
  2. A list of attribute strings. These can be any strings but would normally correspond to attributes that have been specified in the holiday destination data parameter.
  3. Holiday destination datalist, which is of the same for as specified for part (a).

The value returned by recommend_holidays should be a list, in alphabetical order, of destinations that satisfy the requirements, i.e. cost less than or equal to the maximum cost input and have all the features indicated by the desired features, string.

Here are some examples shown as a table:

parmeter 1 (cost) parameter 2 (features) parameter 3 (destination data) return (destination list)
500 ["beach", "culture"] HOLIDAYS_EG ["Barcelona", "Whitby"]
100 ["beach"] HOLIDAYS_EG ["Scarborough", "Whitby"]
500 ["hot", "beach"] HOLIDAYS_EG ["Barcelona", "Corfu"]
350 ["hot", "hot", "hot"] HOLIDAYS_EG ["Barcelona", "Corfu", "Rome"]
400 ["culture"] HOLIDAYS_EG ["Barcelona", "Paris", "Rome", "Whitby"]
250 ["hot", "mountains", "culture"] HOLIDAYS_EG []
100 ["hot", "python", "beach"] HOLIDAYS_EG []

Note that the second argument can be any list of strings. However, if it contains any string that is not a feature of any destination in the destination data, the value returned will be the empty list, [], since no destinations will match that requirement.

Grading

The following code will run tests for all functions that count towards your grade for Coursework 1, and will calculate the final mark you will get for this piece of coursework.

To get your grade simply select the "Run All" option from the Jupyter "Cell" menu above. Test results followed by your overal grade will be shown below.
You can do this at any point to get an indication of the marks you might obtain.

NOTE: The tests provided by the tesing module are for you to check your function definitions before final submission. They are not the same as the tests that will be used to compute the grade for your assignment submission. Hence, you may not get the same mark for your submission as you get from these pre-submission tests.

Submission