1.1
This practice provides statistical methodology for conducting equivalence testing on numerical data from two sources to determine if their true means or variances differ by no more than predetermined limits.
1.2
Applications include (1) equivalence testing for bias against an accepted reference value, (2) determining means equivalence of two test methods, test apparatus, instruments, reagent sources, or operators within a laboratory, or equivalence of two laboratories in a method transfer, and (3) determining non-inferiority of a modified test procedure versus a current test procedure with respect to a performance characteristic.
1.3
The guidance in this standard applies to experiments conducted on a single material at a given level of the test result or on multiple materials covering a range of selected test results.
1.4
Guidance is given for determining the amount of data required for an equivalence trial. The control of risks associated with the equivalence decision is discussed.
1.5
The values stated in SI units are to be regarded as standard. No other units of measurement are included in this standard.
1.6
This standard does not purport to address all of the safety concerns, if any, associated with its use. It is the responsibility of the user of this standard to establish appropriate safety, health, and environmental practices and determine the applicability of regulatory limitations prior to use.
1.7
This international standard was developed in accordance with internationally recognized principles on standardization established in the Decision on Principles for the Development of International Standards, Guides and Recommendations issued by the World Trade Organization Technical Barriers to Trade (TBT) Committee.
====== Significance And Use ======
4.1
Laboratories conducting routine testing have a continuing need to make improvements in their testing processes. In these situations, it must be demonstrated that any changes will neither cause an undesirable shift in the test results from the current testing process nor substantially affect a performance characteristic of the test method. This standard provides guidance on experiments and statistical methods needed to demonstrate that the test results from a modified testing process are equivalent to those from the current testing process, where equivalence is defined as agreement within a prescribed limit, termed an equivalence limit.
4.1.1
The equivalence limit, which represents a worst-case difference or ratio, is determined prior to the equivalence test and its value is usually set by consensus among subject-matter experts.
4.1.2
Examples of modifications to the testing process include, but are not limited to, the following:
(1) Changes to operating levels in the steps of the test method procedure,
(2) Installation of new instruments, apparatus, or sources of reagents and test materials,
(3) Evaluation of new personnel performing the testing, and
(4) Transfer of testing to a new location.
4.1.3
Examples of performance characteristics directly applicable to the test method include bias, precision, sensitivity, specificity, linearity, and range. Additional characteristics are test cost and elapsed time needed to conduct the test procedure.
4.2
Equivalence testing is performed by a designed experiment that generates test results from the modified and current testing procedures on the same types of materials that are routinely tested. The design of the experiment depends on the type of equivalence needed, as discussed below. Experiment design and execution for various objectives are discussed in Section 5.
4.2.1
Means equivalence is concerned with a potential shift in the mean test result in either direction due to a modification in the testing process. Test results are generated under repeatability conditions by the modified and current testing processes on the same material, and the difference in their mean test results is evaluated.
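As an illustration only, the comparison of two means against an equivalence limit can be sketched as two one-sided tests, written here in their equivalent confidence-interval form. The function name, the example data, the pooled-variance assumption, and the 5 % consumer's risk behind the tabulated t value are assumptions made for this sketch and are not taken from this standard.

```python
import math
import statistics


def means_equivalence(current, modified, limit, t_crit):
    """Check whether the interval for the mean difference lies inside +/- limit.

    t_crit is the upper one-sided Student's t quantile for
    len(current) + len(modified) - 2 degrees of freedom, read from a table.
    """
    n1, n2 = len(current), len(modified)
    diff = statistics.mean(modified) - statistics.mean(current)
    # Assumption: both processes share a common repeatability variance.
    pooled_var = ((n1 - 1) * statistics.variance(current)
                  + (n2 - 1) * statistics.variance(modified)) / (n1 + n2 - 2)
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    lower = diff - t_crit * std_err
    upper = diff + t_crit * std_err
    # Equivalence is accepted only if both bounds fall within the limits.
    return diff, (lower, upper), (-limit < lower and upper < limit)


# Hypothetical test results on a single material.
current = [10.1, 9.9, 10.0, 10.2, 9.8, 10.0]
modified = [10.2, 10.0, 10.1, 10.3, 9.9, 10.1]
# 1.812 is the tabulated upper 5 % point of Student's t for 10 degrees of freedom.
diff, interval, equivalent = means_equivalence(current, modified, limit=0.3, t_crit=1.812)
print(diff, interval, equivalent)
```

Passing the t quantile as an argument keeps the sketch within the Python standard library; a statistics package could supply it instead.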
4.2.1.1
In situations where testing cannot be conducted under repeatability conditions, such as using in-line instrumentation, test results may be generated in pairs of test results from the modified and current testing processes, and the mean differences among paired test results are evaluated.
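A paired evaluation of this kind can be sketched as follows. This is a minimal illustration under stated assumptions: the function name and the data are hypothetical, the differences are treated as independent and approximately normal, and the t value corresponds to an assumed 5 % consumer's risk.

```python
import math
import statistics


def paired_means_equivalence(pairs, limit, t_crit):
    """Evaluate the mean of paired differences (modified minus current).

    pairs is a list of (current, modified) test results taken together.
    t_crit is the upper one-sided Student's t quantile for len(pairs) - 1
    degrees of freedom, read from a table.
    """
    diffs = [mod - cur for cur, mod in pairs]
    mean_diff = statistics.mean(diffs)
    std_err = statistics.stdev(diffs) / math.sqrt(len(diffs))
    lower = mean_diff - t_crit * std_err
    upper = mean_diff + t_crit * std_err
    return mean_diff, (lower, upper), (-limit < lower and upper < limit)


# Hypothetical in-line readings; the material changes from pair to pair.
pairs = [(50.2, 50.5), (61.0, 61.1), (47.3, 47.6), (55.8, 55.9), (52.4, 52.7)]
# 2.132 is the tabulated upper 5 % point of Student's t for 4 degrees of freedom.
mean_diff, interval, equivalent = paired_means_equivalence(pairs, limit=0.5, t_crit=2.132)
print(mean_diff, interval, equivalent)
```

Working on the differences removes the material-to-material variation, which is why pairing is useful when repeatability conditions are not available.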
4.2.2
Range equivalence evaluates the differences in means over a selected wider range of test results, and the experiment uses materials that cover that range. The slope of the linear statistical relationship between the test results from the two testing procedures is calculated. If the slope is equivalent to the value one (1), then the two testing processes meet slope equivalence. The combination of slope equivalence and means equivalence defines range equivalence.
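One possible way to examine slope equivalence is sketched below. It assumes an ordinary least-squares fit of the modified results on the current results and symmetric limits around one; the fitting method, the function name, the data, and the limit value are assumptions for illustration and may differ from the procedure this standard prescribes.

```python
import math
import statistics


def slope_equivalence(x, y, slope_limit, t_crit):
    """Check whether the interval for the fitted slope lies within 1 +/- slope_limit.

    x holds results from the current procedure, y from the modified one.
    t_crit is the upper one-sided Student's t quantile for len(x) - 2
    degrees of freedom, read from a table.
    """
    n = len(x)
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    resid_ss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    # Standard error of the slope from the residual variance.
    std_err = math.sqrt(resid_ss / (n - 2) / sxx)
    lower = slope - t_crit * std_err
    upper = slope + t_crit * std_err
    return slope, (lower, upper), (1 - slope_limit < lower and upper < 1 + slope_limit)


# Hypothetical mean results on five materials spanning the range of interest.
x = [10, 20, 30, 40, 50]
y = [10.3, 19.9, 30.4, 39.8, 50.1]
# 2.353 is the tabulated upper 5 % point of Student's t for 3 degrees of freedom.
slope, interval, equivalent = slope_equivalence(x, y, slope_limit=0.05, t_crit=2.353)
print(slope, interval, equivalent)
```

Ordinary least squares ignores measurement error in x, so this sketch is only reasonable when that error is small relative to the range covered.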
4.2.3
Bias equivalence is a special case of means equivalence applied to a performance characteristic. A single set of test results is generated on a certified reference material (CRM) having an accepted reference value (ARV) to evaluate the test method bias of the current testing procedure. The mean test result is then compared with the ARV to estimate the occurrence of a known bias.
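The comparison of a mean test result with the ARV can be sketched as a one-sample version of the same interval check. The function name, the data, the limit, and the assumed 5 % consumer's risk are hypothetical choices for this illustration, and the ARV is treated as having negligible uncertainty.

```python
import math
import statistics


def bias_equivalence(results, arv, limit, t_crit):
    """Check whether the interval for the bias (mean minus ARV) lies inside +/- limit.

    t_crit is the upper one-sided Student's t quantile for len(results) - 1
    degrees of freedom, read from a table.
    """
    bias = statistics.mean(results) - arv
    std_err = statistics.stdev(results) / math.sqrt(len(results))
    lower = bias - t_crit * std_err
    upper = bias + t_crit * std_err
    return bias, (lower, upper), (-limit < lower and upper < limit)


# Hypothetical test results on a CRM with an ARV of 5.00.
results = [5.02, 4.98, 5.05, 5.01, 4.99, 5.03]
# 2.015 is the tabulated upper 5 % point of Student's t for 5 degrees of freedom.
bias, interval, equivalent = bias_equivalence(results, arv=5.00, limit=0.05, t_crit=2.015)
print(bias, interval, equivalent)
```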
4.2.4
Non-inferiority is concerned with a difference only in the direction of an inferior outcome in a performance characteristic of the modified testing procedure versus the current testing procedure. Non-inferiority may involve the comparisons of means, standard deviations, or other statistical parameters.
4.2.4.1
Non-inferiority testing may involve trade-offs in performance characteristics between the modified and current procedures. For example, the modified process may be slightly inferior to the current process with respect to assay sensitivity or precision but may have offsetting advantages such as faster delivery of test results or lower testing costs.
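For precision, a one-sided check of this kind might look like the sketch below, which places an upper confidence bound on the ratio of the modified to the current standard deviation. The function name, the numbers, and the ratio limit are hypothetical, and normally distributed test results are assumed.

```python
import math


def precision_non_inferiority(sd_current, sd_modified, ratio_limit, f_crit):
    """Upper confidence bound on sigma_modified / sigma_current.

    f_crit is the upper one-sided F quantile with (n_current - 1) numerator and
    (n_modified - 1) denominator degrees of freedom, read from a table.
    """
    upper_ratio = (sd_modified / sd_current) * math.sqrt(f_crit)
    # Only the inferior direction (a larger standard deviation) is tested.
    return upper_ratio, upper_ratio < ratio_limit


# Hypothetical repeatability standard deviations from 10 results per procedure.
# 3.179 is the tabulated upper 5 % point of F with (9, 9) degrees of freedom.
upper_ratio, non_inferior = precision_non_inferiority(
    sd_current=0.10, sd_modified=0.11, ratio_limit=2.0, f_crit=3.179)
print(upper_ratio, non_inferior)
```

Because only one direction matters, a single bound replaces the two-sided interval used for means equivalence.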
4.3
Risk Management—
Guidance is provided for determining the amount of data required to control the risks of making the wrong decision in accepting or rejecting equivalence (see 5.4 and Section X1.2).
4.3.1
The consumer's risk is the risk of falsely declaring equivalence. The probability associated with this risk is directly controlled to a low level so that accepting equivalence gives a high degree of assurance that the true difference is less than the equivalence limit.
4.3.2
The producer's risk is the risk of falsely rejecting equivalence. The probability associated with this risk is controlled by the amount of data generated by the experiment. If valid improvements are rejected by equivalence testing, this can lead to opportunity losses to the company and its laboratories (the producers) or cause unnecessary additional effort in improving the testing process.
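To show how the two risks interact with the amount of data, the following rough sketch estimates the number of test results per process for a means equivalence trial. It uses a normal approximation, assumes a known repeatability standard deviation and a true difference of zero, and uses hypothetical names and default risk levels; it is not the calculation given in this standard.

```python
import math
from statistics import NormalDist


def replicates_per_process(sigma, limit, consumer_risk=0.05, producer_risk=0.10):
    """Approximate test results needed from each process (two-sample design)."""
    z_alpha = NormalDist().inv_cdf(1 - consumer_risk)
    # With a true difference of zero, the producer's risk is split across the
    # two one-sided tests, hence producer_risk / 2.
    z_beta = NormalDist().inv_cdf(1 - producer_risk / 2)
    n = 2 * (sigma * (z_alpha + z_beta) / limit) ** 2
    return math.ceil(n)


# Hypothetical repeatability standard deviation and equivalence limit.
n_small_sd = replicates_per_process(sigma=0.15, limit=0.3)
n_large_sd = replicates_per_process(sigma=0.30, limit=0.3)
print(n_small_sd, n_large_sd)  # prints: 6 22
```

The normal approximation tends to understate the requirement when the result is small, so a value obtained this way is best treated as a starting point.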