This page covers our testing strategies.
Software testing is an essential part of the product lifecycle. A variety of testing strategies can reveal errors and mistakes, and developers can use tests as a guideline to build a product according to the client’s specification. Moreover, the client and the users can expect a tested solution to be reliable and are more likely to be satisfied with the end result: users want stable software without bugs and crashes, and testing helps teams ensure the quality of the product. However, even when multiple testing techniques are used, programmers can miss bugs and errors that are not covered by tests. It is therefore important to listen to user feedback and fix reported issues as soon as possible, so that users receive software fitted to their needs. For these reasons we used multiple testing strategies throughout our project.
To make sure that the data anonymiser produces the correct results, we used unit tests. We took several CSV files containing sample data and created the same number of anonymised data sets by hand, replacing the unwanted information with asterisks. Each unit test runs our tool on one of the sample files to produce an automatically generated output file, which is then compared with the corresponding hand-anonymised dataset; if the contents of the two files match, the test passes. More unit tests can easily be added by taking new datasets, anonymising them by hand and saving the results in reference files.
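As an illustration, a single test of this kind might look like the sketch below; the `anonymise_file` function and the file names are placeholders, since the real module and test data differ:

```python
import unittest
from pathlib import Path

# NOTE: "anonymise_file" and the file paths below are hypothetical placeholders
# used only to illustrate the structure of our unit tests.
from anonymiser import anonymise_file


class AnonymiserTests(unittest.TestCase):
    def test_sample_dataset_matches_hand_anonymised_reference(self):
        input_csv = Path("tests/data/sample1.csv")
        expected_csv = Path("tests/data/sample1_anonymised_by_hand.csv")
        output_csv = Path("tests/output/sample1_output.csv")
        output_csv.parent.mkdir(parents=True, exist_ok=True)

        # Run the tool on the sample input to produce an output file.
        anonymise_file(input_csv, output_csv)

        # The test passes only if the generated file is identical to the
        # dataset that was anonymised manually.
        self.assertEqual(output_csv.read_text(), expected_csv.read_text())


if __name__ == "__main__":
    unittest.main()
```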
We carried out integration testing of the anonymiser and visualiser with the core analytics system. First, we deployed all parts of our system on one local machine and then ran through the whole pipeline, making sure that the correct data was being processed at every point.
We used a sample input CSV file and ran it through our data anonymiser, then compared the output with the input to make sure that the sensitive fields had been anonymised correctly.
Next, we uploaded the output file to the core analytics system and opened the NiFi user interface to confirm that the file had actually been received.
NiFi forwards the file to Kafka, so we checked the Kafka logs to confirm that Kafka had indeed received the file.
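A check of this kind could also be scripted; the sketch below uses the kafka-python client with a placeholder topic name and broker address, and simply confirms that at least one record arrived on the topic:

```python
from kafka import KafkaConsumer  # kafka-python client

# Placeholder topic and broker address; the real names depend on the
# core analytics configuration.
consumer = KafkaConsumer(
    "anonymised-data",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    consumer_timeout_ms=10000,  # give up after 10 s if nothing arrives
)

# The check passes if at least one record was received on the topic.
received = list(consumer)
assert received, "No messages arrived on the Kafka topic"
print(f"Kafka received {len(received)} record(s)")
```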
Our visualiser tool receives the data file through Kafka and the Elastic stack. To verify that the file was available in the visualiser, we opened the web interface of our solution and viewed the raw incoming data. The correct data was present, so we then created graphs to confirm that the visualiser can generate graphs based on the incoming raw data and our selected settings.
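A similar check against the Elastic stack could also be scripted; the sketch below uses the official Elasticsearch Python client with a placeholder host and index name:

```python
from elasticsearch import Elasticsearch

# Placeholder host and index name; the real values come from the
# Elastic stack configuration used by the visualiser.
es = Elasticsearch("http://localhost:9200")

# Count the documents in the index that the visualiser reads from.
result = es.count(index="anonymised-data")
assert result["count"] > 0, "No documents found in Elasticsearch"
print(f"{result['count']} document(s) available to the visualiser")
```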
By following these steps we verified that all parts of the system work together as expected to produce the correct result for the user. After making sure that the system ran correctly locally, we deployed it on Azure and used the same approach to test the whole system in the cloud. Just like on our local machine, the system worked as expected on Azure.
Furthermore, we have written scripts to automate deployment and have tested their functionality.
We tested the main functions of our system by using our tools to carry out the tasks that end users might perform with our software. We also checked the accessibility of our solution: one of the requirements was a high-visibility mode for visually impaired users as well as a night mode. We also ran our anonymiser on multiple machines and operating systems to ensure that the majority of end users will be able to use our tools. Finally, we ensured that our system produces clear error messages for the user when exceptions occur; for example, when our visualiser has no connection to the core analytics system, an error message is displayed.
We used system testing and black-box testing to verify that the two parts of the system we developed were working as required. Because developing the core analytics system was not the main goal of our project, we replaced it with a “black box” that simulates its behaviour. We verified that the anonymiser produces a correct result, then created another file of the kind the core analytics system would output and sent it to our visualiser, which created graphs as expected.
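The sketch below shows the idea of such a stand-in; the output format and column names are entirely hypothetical, since the real format is defined by the core analytics system:

```python
import csv

# A stand-in for the core analytics system: it reads the anonymised CSV and
# writes a results file in the shape the visualiser expects. All column names
# here are hypothetical placeholders.
def fake_core_analytics(anonymised_csv: str, results_csv: str) -> None:
    with open(anonymised_csv, newline="") as src:
        rows = list(csv.DictReader(src))

    with open(results_csv, "w", newline="") as dst:
        writer = csv.DictWriter(dst, fieldnames=["metric", "value"])
        writer.writeheader()
        # Example aggregate: how many records passed through the pipeline.
        writer.writerow({"metric": "record_count", "value": len(rows)})


fake_core_analytics("sample1_anonymised.csv", "analytics_results.csv")
```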
Usability testing: We tested our solution on multiple devices and operating systems to make sure that end users are able to use our software.
We carried out user testing to ensure that both the anonymiser and the visualiser are well suited to the target audience and can be used to complete all of the required tasks. During the HCI part of the project, we gained feedback on early design prototypes of our tool using online questionnaires, and we improved the design multiple times based on this feedback to make the tool as suitable as possible for the target audience.
Throughout the development of the data anonymisation tool we asked for user feedback and improved the GUI to best match the needs of the end users. At one major milestone of the project we sent our anonymiser to one of the potential users, who tried the tool with real medical data. His feedback was very valuable for further improving the design.
For example, the user wanted to see the column settings on the main screen rather than only in the popup where they can be changed. We therefore added two labels to the main screen: one displaying the anonymisation type (as is, anonymise, remove) and one displaying the column type (name, age, postcode, etc.).
Next, the user commented that he works with huge CSV files with more than 50 columns and that our tool would benefit greatly from a multi-select option to bulk-assign column types and anonymisation options. To add this functionality, we placed checkboxes next to the column settings and added an options button that lets the user change the settings of all selected columns at once.
Stress testing is a type of non-functional testing in which we pushed the whole system beyond its normal capacity. For this we ran a very large CSV file through our anonymiser. After deploying the system on Azure we also sent a large number of automated requests to the core analytics system and monitored the performance of the system in the Azure web interface.
The system appeared to remain fast and reliable even under these conditions. Although it handled our load correctly, more stress testing needs to be done before the whole system is deployed in a production environment.
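As a rough illustration, the automated requests could be generated with a small load script along the lines below; the endpoint URL, request count and payload are placeholders:

```python
import concurrent.futures
import requests

# Placeholder endpoint and request volume; the real target was the core
# analytics system deployed on Azure.
TARGET_URL = "https://example-core-analytics.azurewebsites.net/upload"
REQUEST_COUNT = 1000


def send_request(i: int) -> int:
    # Each worker posts a small payload and reports the HTTP status code.
    response = requests.post(TARGET_URL, json={"request_id": i}, timeout=30)
    return response.status_code


with concurrent.futures.ThreadPoolExecutor(max_workers=50) as pool:
    statuses = list(pool.map(send_request, range(REQUEST_COUNT)))

failures = [s for s in statuses if s >= 400]
print(f"{len(statuses) - len(failures)} succeeded, {len(failures)} failed")
```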
Once the server was fully deployed and tested for performance and accuracy, we carried out security testing. We began with port scanning of the virtual machine to check that only the necessary ports are open to the public. By ensuring that only the required services are publicly available, attack vectors are reduced and the server is less likely to be compromised.
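A port-scan check of this kind could be scripted along the following lines; the host address and the set of expected ports are placeholders:

```python
import socket

# Placeholder host; in our case this would be the public IP of the Azure VM.
HOST = "203.0.113.10"
# Ports we expect to be open to the public (e.g. SSH and HTTPS); any other
# open port would be flagged for investigation.
EXPECTED_OPEN = {22, 443}

open_ports = set()
for port in range(1, 1025):  # scan the well-known port range
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        if sock.connect_ex((HOST, port)) == 0:
            open_ports.add(port)

unexpected = open_ports - EXPECTED_OPEN
print(f"Open ports: {sorted(open_ports)}")
print(f"Unexpected public ports: {sorted(unexpected) or 'none'}")
```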