Validation of MS Excel Spreadsheets

This article will present a simple method for validating MS Excel spreadsheets for GXP use. The goal of our validation strategy is to provide better testing and documentation of individual spreadsheets in less time. Once we establish this methodology, we can rapidly apply it to many spreadsheets to reach compliance faster.

Scope

This article will attempt to cover the most common examples of spreadsheet validation activities that we find. We will attempt to cover special or very complicated examples whenever possible, but it should be understood that anything not explicitly covered in this article can be added as needed to fit your existing validation requirements or for special cases. A good example is any spreadsheet that uses or is used with custom automation. We have validated spreadsheets that contained hundreds of formulas and custom macros, but did not require any user intervention at all because they were created, populated with data and saved entirely by external code. In this case, the focus of the validation was on validating formulas and macros, but security testing was limited to proving that users could not intercept or interrupt the operation at any time.

Assumptions

We are not going to make any assumptions about the existing SDLC (Software Development Life Cycle) that may be currently in use, any material presented here can be adapted to meet your existing standards and practices.
We are also going to assume that the spreadsheet that need validation has already been written, as this is what we find in the overwhelming majority of the cases presented to us.

Approach

The basics of our approach are as follows:

  • We define our basic documentation practices and methodology in a single document, the Spreadsheet Validation Master Plan. This document can be referred to for all individual spreadsheet validation projects, and will not need to be reviewed or approved for each validation effort.
  • We define the requirements for each sheet or chart in the workbook, then focus our testing on verifying these requirements.
  • A heavy emphasis is placed on defining and testing formulas, and also on the security for each sheet by limiting the parts of each sheet that the users are allowed to edit.

Methodology

To keep things simple, we define how we enter information into the requirements and design specification documents, then describe how we go about testing the spreadsheet in the test protocols. Keep in mind that input cells may be defined as one single cell or as a range of cells.

Requirements Specification

The requirements specification should include all the requirements that your spreadsheet must accomplish. This document should be kept simple and relatively non-technical so that anyone who reads the document will understand that the requirements are.

Starting with a common validation template:

  • List all sheets and charts in the workbook
  • What users should be allowed to enter into the sheet. These are your input cells. You may find it helpful here and for later use to label each input cell, or they can be labeled in the design specification document.
  • What formulas or calculations exist on the sheet.
  • What cells represent the output or final calculation of the sheet. This may also refer to a chart.
  • For each chart, define properties like the title, axis labels and units, and the datasets used to create the chart.

Design Specification

Now that you have defined your requirements for each sheet, you can create the detailed software design specification document. The purpose of this document is to describe how the requirements have been implemented. This document should include enough information so that a developer could create the entire software project based on the information contained within this document and from reading the requirements specification
We like to break each sheet into four sections: Inputs, Processing, Outputs and Security.

Inputs

Document the cells users are expected to enter or update data. In an automated system, you can also define the source or the input data or instructions. If any validation rules are used to enforce proper data entry, these should be documented here as well.

Processing

Processing is mostly about documenting the formulas that are used on the sheet. Any custom macros or code is also documented here.
The majority of errors that we find when validating sheets is in the formulas. The best way we have found to catch these errors is to define each formula using the actual names of the variables represented as input cells.

For example, it may be easy to document that the range of cells F10:F20 contains the formula β€œ=(A10*$C$5)/($D$5*B10). However, it is difficult to verify if this formula is correct. We recommend writing the formulas out like this:

  • Cell $C$5: Volume (V)
  • Cell $D$5: Ideal Gas Constant (R)
  • Cells A10:A20 Pressure (P)
  • Cells B10:B20 Temperature (T)
  • F10:F20 Final Result (moles of gas, n)
  • n=(PV)/(RT)

This is the easiest and most effective way we have found to verify that formulas are correct / to catch errors in formulas.

The other kind of processing is the validation of macros and code that is used in the spreadsheet. Here, the design specification is a good place to copy the code and annotate as needed to describe the purpose of each macro or function.

Note: If you have good coding standards and user proper headers and comments in your code, this step may be already done for you.

Outputs

Outputs usually fall under one of the following three main categories:

1. The cell or range of cells that contain the final result of all previous calculations
2. Charts – many times these are printed and saved with external reports
3. Data that is copied into a final result sheet or exported to a separate file or database.

Security

This section can be a short statement, i.e. β€œAll non-input cells should be locked to prevent changes.” You can also include additional security settings if you are using either custom code or a third party add-on to implement multiple levels of security to control who can edit certain cells, run a macro or function, etc.

Test Protocols

The testing of any spreadsheet should prove that the requirements were properly implemented according the design specifications.

Installation Qualification (IQ) testing is usually limited to making sure the file is in a location where users can access the file, unless the workbook is part of a larger automation project. Operational Qualification (OQ) testing is mostly about verifying formulas, macros, and also to test the security of each sheet to verify that all non-input cells are locked to prevent changes. The IQ and OQ can be combined into a single IOQ protocol as needed.

To start generating our test cases, Once again, we like to break testing down into Inputs, Processing, and Outputs, but security testing is easier to test in separate test cases.

Process & Output Testing

These are usually easier to combine when writing test cases. Once again, FOCUS ON TESTING THE FORMULAS!

  • Are formulas correct? There are several methods for verifying and testing formulas
  • Continue testing until all input cells have had data entered on changed, and that result has been verified. The key point is to look for formulas that are incorrect or charts that are not using the correct or complete set of data, or are pointing to the wrong columns of data.
  • Macros can be tested by entering a range of data and comparing the results with a hand calculator, or by visual inspection that the function performed as expected.
  • Charts can be tested by a combination of visual inspection or verification of the properties, including the dataset used as the basis for the charts.

Security Testing

The type and amount of security testing that you can do for spreadsheets is largely based on how security has been implemented. At a bare minimum, you should test that users are limited to entering data into the defined input cells only, and that they do not have the ability to alter any other part of the spreadsheet. Failure to do this can compromise the integrity of the validation effort and of any data or information generated by the workbook.

Conclusion

This methodology will result in a User/Functional Requirements Specification, a Software Design Specification, and an IOQ Protocol ready for approval and execution. Any deviations found during testing can be handled according to existing validation practices, and a summary report that shows that all the activities specified in the Validation Master Plan or SOP can be generated.

Author

Tyson Mew

President Ofni Systems