This page covers our testing strategies.
Software testing is an essential part of the product lifecycle. A variety of testing strategies can reveal errors and mistakes, and developers can use tests as a guideline to build a product according to the client’s specification. Moreover, the client and the users can expect a tested solution to be reliable and are more likely to be satisfied with the end result: users want stable software without bugs and crashes, and testing helps teams ensure the quality of the product. However, even when multiple testing techniques are used, programmers can miss bugs and errors that are not covered by tests. It is therefore important to listen to user feedback and fix reported issues as soon as possible, so that users receive software fitted to their needs. For these reasons we used multiple testing strategies throughout our project.
To make sure that the data anonymiser produces the correct results, we used unit tests. We took several CSV files containing sample data and created the same number of anonymised data sets by hand, replacing the unwanted information with asterisks. Each unit test runs our tool on one of the sample files to produce an automatically generated output file, which is then compared with the corresponding hand-anonymised dataset; if the contents of the two files match, the test passes. More unit tests can easily be added by taking new datasets, anonymising them by hand and saving the results in reference files.
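As an illustration, a single test of this kind might look like the sketch below; the `anonymise_file` function and the file names are placeholders, since the real module and test data differ:

```python
import unittest
from pathlib import Path

# NOTE: "anonymise_file" and the file paths below are hypothetical placeholders
# used only to illustrate the structure of our unit tests.
from anonymiser import anonymise_file


class AnonymiserTests(unittest.TestCase):
    def test_sample_dataset_matches_hand_anonymised_reference(self):
        input_csv = Path("tests/data/sample1.csv")
        expected_csv = Path("tests/data/sample1_anonymised_by_hand.csv")
        output_csv = Path("tests/output/sample1_output.csv")
        output_csv.parent.mkdir(parents=True, exist_ok=True)

        # Run the tool on the sample input to produce an output file.
        anonymise_file(input_csv, output_csv)

        # The test passes only if the generated file is identical to the
        # dataset that was anonymised manually.
        self.assertEqual(output_csv.read_text(), expected_csv.read_text())


if __name__ == "__main__":
    unittest.main()
```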
We carried out integration testing of the anonymiser and visualiser with the core analytics system. First, we deployed all parts of our system on one local machine and then ran through the whole pipeline, making sure that the correct data was being processed at every point.
We used a sample input CSV file and ran it through our data anonymiser, then compared the output with the input to make sure that the sensitive fields had been anonymised correctly.
Next, we uploaded the output file to the core analytics system and opened the NiFi user interface to confirm that the file had actually been received.
NiFi forwards the file to Kafka, so we checked the Kafka logs to confirm that Kafka had indeed received the file.
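A check of this kind could also be scripted; the sketch below uses the kafka-python client with a placeholder topic name and broker address, and simply confirms that at least one record arrived on the topic:

```python
from kafka import KafkaConsumer  # kafka-python client

# Placeholder topic and broker address; the real names depend on the
# core analytics configuration.
consumer = KafkaConsumer(
    "anonymised-data",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    consumer_timeout_ms=10000,  # give up after 10 s if nothing arrives
)

# The check passes if at least one record was received on the topic.
received = list(consumer)
assert received, "No messages arrived on the Kafka topic"
print(f"Kafka received {len(received)} record(s)")
```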
Our visualiser tool receives the data file through Kafka and the Elastic stack. To verify that the file was available in the visualiser, we opened the web interface of our solution and viewed the raw incoming data. The correct data was present, so we then created graphs to confirm that the visualiser can generate graphs based on the incoming raw data and our selected settings.
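A similar check against the Elastic stack could also be scripted; the sketch below uses the official Elasticsearch Python client with a placeholder host and index name:

```python
from elasticsearch import Elasticsearch

# Placeholder host and index name; the real values come from the
# Elastic stack configuration used by the visualiser.
es = Elasticsearch("http://localhost:9200")

# Count the documents in the index that the visualiser reads from.
result = es.count(index="anonymised-data")
assert result["count"] > 0, "No documents found in Elasticsearch"
print(f"{result['count']} document(s) available to the visualiser")
```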
By following these steps we verified that all parts of the system work together as expected to produce the correct result for the user. After making sure that the system ran correctly locally, we deployed it on Azure and used the same approach to test the whole system in the cloud. Just like on our local machine, the system worked as expected on Azure.
Furthermore, we have written scripts to automate deployment and have tested their functionality.
We tested the main functions of our system by using our tools to carry out the tasks that end users might perform with our software. We also checked the accessibility of our solution: one of the requirements was a high-visibility mode for visually impaired users as well as a night mode. We also ran our anonymiser on multiple machines and operating systems to ensure that the majority of end users will be able to use our tools. Finally, we ensured that our system produces clear error messages for the user when exceptions occur; for example, when our visualiser has no connection to the core analytics system, an error message is displayed.
We used system testing and black-box testing to verify that the two parts of the system we developed were working as required. Because developing the core analytics system was not the main goal of our project, we replaced it with a “black box” that simulates its behaviour. We verified that the anonymiser produces a correct result, then created another file of the kind the core analytics system would output and sent it to our visualiser, which created graphs as expected.
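The sketch below shows the idea of such a stand-in; the output format and column names are entirely hypothetical, since the real format is defined by the core analytics system:

```python
import csv

# A stand-in for the core analytics system: it reads the anonymised CSV and
# writes a results file in the shape the visualiser expects. All column names
# here are hypothetical placeholders.
def fake_core_analytics(anonymised_csv: str, results_csv: str) -> None:
    with open(anonymised_csv, newline="") as src:
        rows = list(csv.DictReader(src))

    with open(results_csv, "w", newline="") as dst:
        writer = csv.DictWriter(dst, fieldnames=["metric", "value"])
        writer.writeheader()
        # Example aggregate: how many records passed through the pipeline.
        writer.writerow({"metric": "record_count", "value": len(rows)})


fake_core_analytics("sample1_anonymised.csv", "analytics_results.csv")
```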
Usability testing: We tested our solution on multiple devices and operating systems to make sure that end users are able to use our software.
We carried out user testing to ensure that both the anonymiser and the visualiser are well suited to the target audience and can be used to complete all of the required tasks. During the HCI part of the project, we gained feedback on early design prototypes of our tool using online questionnaires, and we improved the design multiple times based on this feedback to make the tool as suitable as possible for the target audience.
Throughout the development of the data anonymisation tool we asked for user feedback and improved the GUI to best match the needs of the end users. At one major milestone of the project we sent our anonymiser to one of the potential users, who tried the tool with real medical data. His feedback was very valuable for further improving the design.
For example, the user wanted to see the column settings on the main screen rather than only in the popup where they can be changed. We therefore added two labels to the main screen: one displaying the anonymisation type (as is, anonymise, remove) and one displaying the column type (name, age, postcode, etc.).
Next, the user commented that he works with huge CSV files with more than 50 columns and that our tool would benefit greatly from a multi-select option to bulk-assign column types and anonymisation options. To add this functionality, we placed checkboxes next to the column settings and added an options button that lets the user change the settings of all selected columns at once.
Stress testing is a type of non-functional testing in which we pushed the whole system beyond its normal capacity. For this we ran a very large CSV file through our anonymiser. After deploying the system on Azure we also sent a large number of automated requests to the core analytics system and monitored the performance of the system in the Azure web interface.
The system appeared to remain fast and reliable even under these conditions. Although it handled our load correctly, more stress testing needs to be done before the whole system is deployed in a production environment.
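As a rough illustration, the automated requests could be generated with a small load script along the lines below; the endpoint URL, request count and payload are placeholders:

```python
import concurrent.futures
import requests

# Placeholder endpoint and request volume; the real target was the core
# analytics system deployed on Azure.
TARGET_URL = "https://example-core-analytics.azurewebsites.net/upload"
REQUEST_COUNT = 1000


def send_request(i: int) -> int:
    # Each worker posts a small payload and reports the HTTP status code.
    response = requests.post(TARGET_URL, json={"request_id": i}, timeout=30)
    return response.status_code


with concurrent.futures.ThreadPoolExecutor(max_workers=50) as pool:
    statuses = list(pool.map(send_request, range(REQUEST_COUNT)))

failures = [s for s in statuses if s >= 400]
print(f"{len(statuses) - len(failures)} succeeded, {len(failures)} failed")
```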
Once the server was fully deployed and tested for performance and accuracy, we carried out security testing. We began with port scanning of the virtual machine to check that only the necessary ports are open to the public. By ensuring that only the required services are publicly available, attack vectors are reduced and the server is less likely to be compromised.
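A port-scan check of this kind could be scripted along the following lines; the host address and the set of expected ports are placeholders:

```python
import socket

# Placeholder host; in our case this would be the public IP of the Azure VM.
HOST = "203.0.113.10"
# Ports we expect to be open to the public (e.g. SSH and HTTPS); any other
# open port would be flagged for investigation.
EXPECTED_OPEN = {22, 443}

open_ports = set()
for port in range(1, 1025):  # scan the well-known port range
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        if sock.connect_ex((HOST, port)) == 0:
            open_ports.add(port)

unexpected = open_ports - EXPECTED_OPEN
print(f"Open ports: {sorted(open_ports)}")
print(f"Unexpected public ports: {sorted(unexpected) or 'none'}")
```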