MDCG 2021-21 Rev.1
Guidance on performance evaluation of SARS-CoV-2 in vitro diagnostic medical devices
Disclaimer: This document is an interactive version of the original MDCG document. We will keep it up-to-date.
This document has been endorsed by the Medical Device Coordination Group (MDCG) established by Article 103 of Regulation (EU) 2017/745. The MDCG is composed of representatives of all Member States and it is chaired by a representative of the European Commission.
MDCG 2021-21 Rev.1 changes
MDCG 2021-21 Revision 1 changes | |
---|---|
Tables 1 and 2 | Footnote on vaccinated individuals revised |
Tables 4 and 5 | Footnote on specimen types added |
Tables 6 and 7 | 1st column revised, title 3rd column revised |
All tables | Minor editorial clarifications |
Introduction
This guidance document concerns performance evaluation of SARS-CoV-2 in vitro diagnostic medical devices (IVDs) in the context of conformity assessment under either Directive 98/79/EC or Regulation (EU) 2017/746. It covers devices for detection or quantification of SARS-CoV-2 nucleic acid, antigens and also detection or quantification of antibodies against SARS-CoV-2. These devices are collectively referred to as SARS-CoV-2 IVDs. The guidance is addressed to all interested parties, including notably the manufacturers, as well as notified bodies and competent authorities, authorised representatives, other market operators, professional and patient associations.
The content of this guidance document is envisaged to form the basis for common specifications to be adopted according to Article 9 of Regulation (EU) 2017/746 in the coming months. The content may be adapted to take account of changing circumstances and increasing scientific and technical knowledge, as the COVID-19 pandemic continues to evolve.
The terms “IVD”, “device”, “assay” and “test” are used interchangeably in this text.
General considerations
The general principles in this section should be taken into account for the performance evaluation of SARS-CoV-2 IVDs.
The following terms are being used in this guidance document:
- diagnostic sensitivity means the ability of a device to identify the presence of a target marker associated with SARS-CoV-2;
- true positive means a specimen known to be positive for the target marker and correctly classified by the device;
- false negative means a specimen known to be positive for the target marker and misclassified by the device;
- diagnostic specificity means the ability of a device to recognise the absence of a target marker associated with SARS-CoV-2;
- false positive means a specimen known to be negative for the target marker and misclassified by the device;
- true negative means a specimen known to be negative for the target marker and correctly classified by the device;
- the limit of detection (LOD) means the smallest amount of the target marker that can be precisely detected; the LOD is part of the analytical sensitivity of the device;
- analytical specificity means the ability of the method to determine solely the target marker;
- nucleic acid amplification techniques (NAT) means methods of detection and/or quantification of nucleic acids by either amplification of a target sequence, by amplification of a signal or by hybridisation;
- rapid tests means qualitative or semi-quantitative in vitro diagnostic medical devices, used singly or in a small series, which involve non-automated procedures and have been designed to give a fast result;
- robustness of an analytical procedure means the capacity of an analytical procedure to remain unaffected by small but deliberate variations in method parameters and provides an indication of its reliability during normal usage;
- cross-reactivity (or cross-reaction) means the ability of non-target analytes or markers to cause false-positive results in an assay because of similarity, e.g. the ability of non-specific antibodies binding to a test antigen of an antibody assay, or the ability of non-target nucleic acids to be reactive in a NAT assay;
- interference means the ability of unrelated substances to affect the results in an assay;
- whole system failure rate means the frequency of failures when the entire process is performed as prescribed by the manufacturer;
- first line assay means a device used to detect a marker or analyte, and which may be followed by a confirmatory assay. Devices intended solely to be used to monitor a previously determined marker or analyte are not considered first line assays;
- confirmatory assay means a device used for the confirmation of a reactive result from a first line assay;
- supplemental assay means a device that is used to provide further information for the interpretation of the test result of another assay;
- virus typing assay means a device used for typing with already known positive samples, not used for primary diagnosis of infection or for screening;
- 95% positive cut-off value for NAT assays means the analyte concentration where 95% of test runs give positive results following serial dilutions of an international reference material, where available, e.g. a World Health Organisation (WHO) International Standard or reference material calibrated against the WHO International Standard; this value describes the limit of detection (LOD) for NAT devices.
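To illustrate the 95% positive cut-off definition, the sketch below estimates the concentration at which 95% of replicates are positive from a dilution series. It is a simplification: it interpolates linearly on the log10 concentration scale between the two dilution levels that bracket a 95% hit rate, whereas formal evaluations typically fit a probit model. The function name, the data layout and the example figures are hypothetical and not taken from the guidance.

```python
import math

def estimate_95_cutoff(series, target=0.95):
    """Estimate the concentration giving a `target` hit rate.

    `series` is a list of (concentration, positives, replicates) tuples.
    Returns None if no pair of adjacent levels brackets the target.
    """
    # Sort from lowest to highest concentration.
    levels = sorted(series, key=lambda row: row[0])
    rates = [(conc, pos / n) for conc, pos, n in levels]
    for (c_lo, r_lo), (c_hi, r_hi) in zip(rates, rates[1:]):
        if r_lo < target <= r_hi:
            # Linear interpolation on the log10 concentration scale.
            frac = (target - r_lo) / (r_hi - r_lo)
            log_c = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_c
    return None

# Hypothetical half-log10 dilution series: (IU/mL, positives, replicates).
dilutions = [(1000, 24, 24), (316, 24, 24), (100, 23, 24), (31.6, 18, 24), (10, 9, 24)]
cutoff = estimate_95_cutoff(dilutions)  # falls between 31.6 and 100 IU/mL
```

With these illustrative data the estimate lies between the 31.6 and 100 IU/mL levels, because those are the adjacent dilutions whose hit rates straddle 95%.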
Overall considerations
Performance evaluations of SARS-CoV-2 IVDs should be carried out in direct comparison with a state-of-the-art device. The device used for comparison should be one bearing CE marking, if on the market at the time of the performance evaluation. For anti-SARS-CoV-2 tests, the new device should have an overall performance at least equivalent to that of the state-of-the-art device of the same type, e.g. considering claims based on target antigens used and immunoglobulin classes detected.
Devices used for determination of status of samples used in performance evaluations of SARS-CoV-2 IVDs should be state-of-the-art devices bearing CE marking.
Performance evaluations of SARS-CoV-2 IVDs should be performed on a population equivalent to the European population.
If discrepant results are identified as part of a performance evaluation, these results should be resolved as far as possible, by one or more of the following: evaluation of the discrepant sample in further devices; use of an alternative method or marker; a review of the clinical status and diagnosis of the patient; testing of follow-up samples.
As part of the required risk analysis, the whole system failure rate leading to false-negative results should be determined in repeat assays on low-positive specimens.
Sensitivity and specificity
Positive specimens used in the performance evaluation should be selected to reflect different stages of the respective disease(s), different antibody patterns, different genotypes, different subtypes, mutants, etc.
For SARS-CoV-2 IVDs intended by the manufacturer to be used with serum or plasma, positive specimens should include 25 positive ‘same day’ fresh serum samples (≤ 1 day after sampling).
Seroconversion panels should start with one or more negative bleeds and should reflect narrow bleeding intervals as far as possible. Where this is not possible, manufacturers should provide a justification in the performance evaluation report.
Negative specimens used in a performance evaluation should be defined so as to reflect the target population for which the device is intended, such as blood donors, hospitalised patients, pregnant women, etc.
Specificity should be calculated using the frequency of repeatedly reactive (i.e. false positive) results in individuals negative for the target marker.
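The definitions of diagnostic sensitivity and specificity given earlier reduce to simple ratios of classification counts. The following is a minimal sketch, assuming the true status of each specimen has already been established by a comparator and that a negative specimen counts as a false positive only when it is repeatedly reactive. The function names and the example counts are illustrative and not part of the guidance.

```python
def diagnostic_sensitivity(true_pos, false_neg):
    """Fraction of known-positive specimens classified as positive."""
    return true_pos / (true_pos + false_neg)

def diagnostic_specificity(true_neg, false_pos):
    """Fraction of known-negative specimens classified as negative.

    `false_pos` should count repeatedly reactive specimens only.
    """
    return true_neg / (true_neg + false_pos)

# Hypothetical study: 400 positive and 400 negative specimens.
sens = diagnostic_sensitivity(true_pos=372, false_neg=28)  # 0.93
spec = diagnostic_specificity(true_neg=397, false_pos=3)   # 0.9925
```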
For SARS-CoV-2 IVDs intended by the manufacturer to be used with serum and plasma, the performance evaluation should demonstrate serum to plasma equivalency. This should be demonstrated for at least 25 positive donations for sensitivity and 25 negative donations for specificity.
Anti-SARS-CoV-2 IVDs intended by the manufacturer for testing body fluids other than serum or plasma, e.g. urine, saliva, etc., should meet the same requirements for sensitivity and specificity as serum or plasma devices. The performance evaluation should test samples from the same individuals in both the device to be approved and a respective serum or plasma device.
In the case of IVDs for SARS-CoV-2 detection from secretions of the respiratory tract, their performance on all claimed specimen types should be compared to NAT tests on nasopharyngeal swabs.
Interference and cross-reactivity
The manufacturer should select the potential interfering substances to be evaluated taking account of the composition of the reagents and configuration of the device. The manufacturer should include specimens such as, where applicable: those representing related infections; those from multipara, i.e. women who have had more than one pregnancy, or rheumatoid factor (RF) positive patients; those containing human antibodies to components of the expression system, for example anti-E. coli, or anti-yeast.
Anticoagulants
For SARS-CoV-2 IVDs intended for use with plasma, the performance evaluation should verify the performance of the device using all anticoagulants which the manufacturer indicates for use with the device. This should be demonstrated for at least 50 plasma specimens per anticoagulant (25 positive and 25 negative).
Batch testing
For SARS-CoV-2 antigen and antibody tests, the manufacturer’s batch testing criteria should ensure that every batch consistently identifies the relevant antigens, epitopes, and antibodies and is suitable for the claimed specimen types.
Self-tests
SARS-CoV-2 IVDs for self-testing should meet the same requirements for sensitivity and specificity as respective devices for professional use. Relevant parts of the performance evaluation should be carried out (or repeated) by appropriate lay persons to validate the operation of the device and the instructions for use. The lay persons selected for the performance evaluation should be representative of the intended user groups.
Specific considerations
The following tables set out specific considerations for various types of SARS-CoV-2 IVDs.
Table 1 refers to the following first-line assays (including rapid tests) for antibodies against SARS-CoV-2 (anti-SARS-CoV-2): IgG-only, IgG combined with IgM and/or IgA, and total antibody.
Table 2 refers to assays for detection of anti-SARS-CoV-2 IgM and/or IgA (including rapid tests).
Table 3 refers to confirmatory or supplemental assays for anti-SARS-CoV-2.
Table 4 refers to SARS-CoV-2 antigen tests, including rapid antigen tests.
Table 5 refers to nucleic acid amplification techniques (NAT) assays for SARS-CoV-2 RNA.
Tables 6 and 7 refer to additional requirements for SARS-CoV-2 antigen and antibody self-tests respectively. They are intended for devices which have already undergone a performance evaluation for professional use.
Table 1: First-line assays (including rapid tests) for anti-SARS-CoV-2: total antibody, IgG-only, IgG combined (1) with IgM and/or IgA
Parameter | Specimen | Anti-SARS-CoV-2 IgG, IgG combined, and total Ab | Acceptance criteria |
---|---|---|---|
Diagnostic sensitivity | Positive specimens | ≥400 including samples from early infection and post seroconversion (2) (within the first 21 days and after 21 days following the onset of symptoms); including samples from asymptomatic or subclinical and mildly symptomatic (outpatient treatment) individuals; including samples with low and high titers; including samples from vaccinated individuals if appropriate (3); consideration of genetic variants | ≥90% sensitivity (4) for samples taken >21 days after onset of symptoms (5); overall sensitivity including the early infection phase should be comparable to other CE-marked (6) tests |
| | Seroconversion panels | As far as available | Seroconversion sensitivity comparable to other CE-marked tests |
Analytical sensitivity | Reference preparations | WHO International Standard (IS) for anti-SARS-CoV-2 (NIBSC code 20/136); | IS: for titre determinations / quantitative (7) result output; |
Specificity | Negative specimens (8) | ≥400 | >99% specificity (10) |
| | | ≥200 | Potential limitations for specificity should be determined |
| | | ≥100 in total | |
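The numeric thresholds in Table 1 can be expressed as a simple check. The sketch below is illustrative only: the dataclass, its field names and the example numbers are hypothetical. It covers just the specimen counts and the sensitivity and specificity limits stated in the table, not the qualitative requirements such as specimen mix, seroconversion panels or comparability with other CE-marked tests.

```python
from dataclasses import dataclass

@dataclass
class AntibodyStudy:
    n_positive: int            # positive specimens tested
    n_negative: int            # negative specimens tested
    sens_after_21_days: float  # sensitivity for samples >21 days after onset
    specificity: float

def table1_issues(study):
    """Return a list of unmet Table 1 numeric criteria (empty if none)."""
    issues = []
    if study.n_positive < 400:
        issues.append("fewer than 400 positive specimens")
    if study.n_negative < 400:
        issues.append("fewer than 400 negative specimens")
    if study.sens_after_21_days < 0.90:
        issues.append("sensitivity >21 days below 90%")
    if study.specificity <= 0.99:
        issues.append("specificity not above 99%")
    return issues

# Hypothetical study summary.
example = AntibodyStudy(n_positive=420, n_negative=410,
                        sens_after_21_days=0.93, specificity=0.995)
print(table1_issues(example))  # prints []
```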
Table 2: Assays (including rapid tests) for anti-SARS-CoV-2: IgM and/or IgA detection
Parameter | Specimen | Anti-SARS-CoV-2 IgM and IgA | Acceptance criteria |
---|---|---|---|
Diagnostic sensitivity | Positive specimens | ≥200 (11) | ≥80% sensitivity for samples taken during the first 21 days after symptom onset; |
| | Seroconversion panels | As far as available | Seroconversion sensitivity comparable to other CE-marked tests |
Analytical sensitivity | Standards | N/A | N/A |
Specificity | Negative specimens | ≥200 | ≥98% specificity (13) |
| | | ≥100 | Potential limitations for specificity should be determined |
| | | ≥100 in total | |
Table 3: Confirmatory or supplemental (14) assays for anti-SARS-CoV-2
Parameter | Specimen | Anti-SARS-CoV-2 | Acceptance criteria |
---|---|---|---|
Diagnostic sensitivity | Positive specimens | ≥200 including samples pre and post seroconversion (within the first 21 days and after 21 days following the onset of symptoms) | Correct determination as “positive” (or “indeterminate”) |
| | Seroconversion panels / low titre panels | As far as available | |
Analytical sensitivity | Standards | N/A | N/A |
Diagnostic specificity | Negative specimens (15) | ≥200 from non-infected / non-vaccinated population | No false-positive results; correct determination as “negative” (or “indeterminate”) |
| | | ≥200; ≥50 including samples with indeterminate or false-positive results in other anti-SARS-CoV-2 assays | |
Table 4: Antigen assays (including rapid tests) for SARS-CoV-2
Parameter | Specimen | SARS-CoV-2 antigen | Acceptance criteria |
---|---|---|---|
Diagnostic sensitivity | Positive specimens | ≥100 (16) | Detection of >80% (rapid tests); |
Analytical sensitivity | Standards | As soon as available | Establishment of a limit of detection (25) |
Diagnostic specificity | Negative specimens | ≥300 | Specificity >98% (rapid tests) |
| | | ≥100; ≥50 | Potential limitations for specificity should be determined |
Table 5: NAT assays for SARS-CoV-2 RNA
Parameter | Specimen | SARS-CoV-2 RNA qualitative | SARS-CoV-2 RNA quantitative |
---|---|---|---|
Sensitivity | |||
Analytical Sensitivity: Limit of detection | WHO 1st International Standard SARS-CoV-2 RNA (NIBSC code 20/146; 7.70 Log10 IU/mL); secondary standards calibrated against WHO IS | According to Ph. Eur. NAT validation guideline: | According to Ph. Eur. NAT validation guideline: |
Quantification limit; quantification features | WHO 1st International Standard SARS-CoV-2 RNA (NIBSC code 20/146; 7.70 Log10 IU/mL); secondary standards calibrated against WHO IS | | Dilutions (half-log10 or less) of calibrated reference preparations; determination of lower, upper quantification limit, limit of detection, precision, accuracy, “linear” measuring range, “dynamic range”. |
Diagnostic Sensitivity: different SARS-CoV-2 RNA strains | Patient samples determined as SARS-CoV-2 RNA positive by comparator device from different regions and outbreak clusters; sequence variants; dilution series of SARS-CoV-2 positive cell cultures (isolates) may serve as potential substitutes | ≥100 (27) | |
Quantification efficiency | SARS-CoV-2 RNA positive patient samples from different regions and outbreak clusters; sequence variants with quantitative values obtained by comparator device; dilution series of SARS-CoV-2 RNA positive cell cultures may serve as potential substitutes | | ≥100 |
Inclusivity | In silico analysis (28); | Evidence of suitable assay design: | Evidence of suitable assay design: |
Specificity | |||
Diagnostic specificity | SARS-CoV-2 RNA negative human specimens | ≥500 | ≥100 |
| | In silico analysis (28) | Evidence of suitable assay design (sequence alignments); regular check of primer/probe sequences against sequence data bank entries | Evidence of suitable assay design (sequence alignments); regular check of primer/probe sequences against sequence data bank entries |
Potential cross reaction | Samples positive (various concentrations) for related human coronaviruses 229E, HKU1, OC43, NL63, MERS coronavirus; SARS-CoV-1 if available; Influenza virus A, B; RSV; Legionella pneumophila; | ≥20 in total | ≥20 |
Robustness | |||
Cross contamination | | At least 5 runs using alternating high positive (known to occur naturally) and negative samples | At least 5 runs using alternating high positive (known to occur naturally) and negative samples |
Inhibition | | Internal control preferably to go through the whole NAT procedure | Internal control preferably to go through the whole NAT procedure |
Whole system failure rate leading to false-negative results: 99/100 assays positive | | ≥100 samples virus-spiked with 3 × the 95% positive cut-off concentration (3 × LOD) | ≥100 samples virus-spiked with 3 × the 95% positive cut-off concentration (3 × LOD) |
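The whole system failure rate criterion in the last row of Table 5 can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, results are represented as booleans, and "99/100 assays positive" is read here as at least 99% of the spiked samples testing positive.

```python
def count_results(results):
    """Return (positives, total) for a run of spiked low-positive samples.

    `results` is a list of booleans, True for a positive assay result.
    """
    if len(results) < 100:
        raise ValueError("at least 100 spiked samples are expected")
    positives = sum(1 for positive in results if positive)
    return positives, len(results)

def whole_system_failure_rate(results):
    """Fraction of spiked samples that gave a false-negative result."""
    positives, total = count_results(results)
    return (total - positives) / total

def meets_criterion(results):
    """Check that at least 99 out of every 100 assays are positive."""
    positives, total = count_results(results)
    # Integer arithmetic avoids floating-point rounding at the threshold.
    return positives * 100 >= 99 * total

# Hypothetical run: 100 samples spiked at 3 x LOD, one false negative.
run = [True] * 99 + [False]
print(meets_criterion(run))  # prints True
```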
Table 6: Additional requirements for SARS-CoV-2 antigen Self-Tests (29)
| | Specimens (30) | Number of lay users | Criterion |
---|---|---|---|
Result interpretation | Interpretation of contrived tests (31) by lay users reflecting a range of results | ≥100 | Reading and interpretation of the contrived test results by 100 lay people; each lay person should read the specified range of result reactivity levels; determination of concordance between lay readings and readings of the same tests by professional readers |
Diagnostic sensitivity | Lay users that are known antigen positive (33, 34) | ≥30 | In comparison to the true infectious status, i.e. by RT-PCR; concordance of results with the professional test |
Diagnostic specificity | Lay users that do not know their status (33) | ≥60 | Concordance of results with the professional test |
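Table 6 asks for concordance between lay-user results and the professional test without prescribing a formula. One common choice is overall percent agreement on paired results, which is assumed in the sketch below. The function name, the result labels and the example data are hypothetical.

```python
def concordance(lay_results, professional_results):
    """Overall agreement between paired lay and professional results."""
    if len(lay_results) != len(professional_results):
        raise ValueError("results must be paired")
    if not lay_results:
        raise ValueError("no results to compare")
    agree = sum(1 for lay, pro in zip(lay_results, professional_results)
                if lay == pro)
    return agree / len(lay_results)

# Hypothetical paired readings for four tests.
lay = ["pos", "pos", "neg", "invalid"]
pro = ["pos", "neg", "neg", "invalid"]
print(concordance(lay, pro))  # prints 0.75
```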
Table 7: Additional requirements for SARS-CoV-2 antibody Self-Tests (35)
| | Specimens (36) | Number of lay users | Criterion |
---|---|---|---|
Result interpretation | Interpretation of contrived tests (37) by lay users reflecting a range of results | ≥100 | Reading and interpretation of the contrived test results by 100 lay people; each lay person should read the specified range of result reactivity levels; |
Diagnostic sensitivity | Lay users that are known antibody positive (39) | ≥100 | With previous history of initial PCR confirmed infection for SARS-CoV-2; |
Diagnostic specificity | Lay users that do not know their status (39) | ≥100 | Concordance of results with the professional test |
Footnotes
(1): Performance claim of the combined overall result; separate claims for IgM and/or IgA see table 2.
(2): Details on the time interval between sampling and onset of symptoms (or time of infection, if available) should be provided.
(3): The manufacturer should provide a justification of the suitability and timing for sensitivity evaluation of the relevant antibodies in vaccinated individuals.
(4): Based on confirmed positive SARS-CoV-2-NAT result.
(5): Because sensitivity may vary or decrease over time, claims for sensitivity shall be specified in relation to the time between sampling after symptom onset or on the initial PCR diagnosis and the test.
(6): CE-marked under Regulation (EU) 2017/746 as class D. During the transition phase, reference is made to EU and ECDC SOTA guidance and current scientific literature.
(7): Quantitative assays if they are also first-line assays.
(8): Negative specimens should be from individuals with no history of SARS-CoV-2 infection (if available pre-pandemic).
(9): Individuals vaccinated with an antigen different from that used in the respective test may be included, if appropriate.
(10): False-positive results should be resolved by retesting in other SARS-CoV-2 serologic assays, if necessary with different test design and antigen coating than the initial test, and/or confirmatory testing.
(11): In case of combination tests, 200 per marker IgM and IgA.
(12): The manufacturer should provide a justification of the suitability and timing for sensitivity evaluation of IgM and IgA in vaccinated individuals.
(13): Clarification of false-positive results may additionally include testing for presence of other anti-SARS-CoV-2 antibody types (IgA, IgG, total antibody).
(14): E.g. immunoblot providing antigens different from those used in the initial antibody test.
(15): Negative specimens should be from individuals with no history of SARS-CoV-2 infection (if available pre-pandemic).
(16): If the device is intended to be used for more than one specimen type, 100 samples shall be required for each specimen type. If this is not possible in exceptional circumstances (e.g. if specimen collection is very invasive), the manufacturer shall provide a justification and evidence of matrix equivalence.
(17): Sampling should be matched for antigen and NAT testing, e.g., two simultaneous samples from each individual or optimally NAT- and antigen testing from the same sample (e.g. from the eluate of one swab); the buffer/transport medium should be compatible for both NAT and antigen testing; any volume change in the buffer/medium for sample uptake different from that of the proprietary assay, and/or between antigen and NAT test should be clearly communicated.
(18): Or time of infection, if known, taking into account the incubation time.
(19): I.e., without preselection; the viral loads and their distribution should be shown, e.g. characterized by Ct-values of RT-PCR; or transformed into viral load per ml or sample, if applicable.
(20): Depending on the design of the device and nature of the genetic variant. For the purpose of evaluation, at least 3 samples should be represented for each genetic variant.
(21): Specimen collection and extraction items such as swabs, extraction buffers, etc., should be part of the evaluation. If proprietary sampling/sample preparation is not included in the test kit, test performance should be investigated for an applicable range of sampling devices. If the sample is not tested immediately, e.g. after a certain transport time, stability of the antigen should be investigated.
(22): Other than rapid tests, i.e. formal laboratory-based assays e.g. enzyme immunoassay, automated tests, etc.
(23): The sensitivity of ≥80%, ≥85% respectively, should be for all specimen types claimed. All claimed specimen types should be compared with paired NAT results from nasopharyngeal specimens.
(24): The relationship between antigen test performance and NAT should be demonstrated; sensitivity may be shown relating to different viral load ranges and to the threshold of infectivity. The NAT and extraction method used should be described.
(25): Unless there is an available international standard, analytical sensitivity may be tested by dilution series of in-house virus preparations, comparatively with other antigen tests and NAT; if inactivated virus is used, the effect of inactivation and freeze/thawing on the antigen should be investigated.
(26): E.g. staphylococci and streptococci expressing protein A or G.
(27): If the device is intended to be used for more than one specimen type, 100 samples should be required for each specimen type. If this is not possible in exceptional circumstances (e.g. if specimen collection is very invasive), the manufacturer should provide a justification and evidence of matrix equivalence.
(28): The manufacturer should define frequency and document evidence of regular surveillance checks against updated data bank entries in a post-market performance follow-up plan and report.
(29): It is assumed that the underlying performance of the self-test has already been previously demonstrated with the evaluation/assessment of a professional test of the same design as the respective self-test under evaluation. In case for the self-use specimens in question there is no corresponding professional test variant, comparison should be made with the standard specimen type (e.g. nasopharyngeal swabs for antigen test, serum or plasma for antibody test) of the corresponding professional test.
(30): For each self-use specimen type claimed with the device (e.g. nasal, sputum, saliva, whole blood, etc.).
(31): Using whenever possible the original natural matrix of the respective specimen type.
(32): A higher proportion of the samples should be in the weak-positive range close to the cutoff or LoD of the test.
(33): Individuals unaware of the professional diagnostic result prior to self-testing, and performing the entire test procedure from specimen collection and specimen pre-treatment (swab, buffer extraction, etc.) to reading.
(34): Subjects up to about 7 days after symptom onset.
(35): It is assumed that the underlying performance of the self-test has already been previously demonstrated with the evaluation/assessment of a professional test of the same design as the respective self-test under evaluation. In case for the self-use specimens in question there is no corresponding professional test variant, comparison should be made with the standard specimen type (e.g. nasopharyngeal swabs for antigen test, serum or plasma for antibody test) of the corresponding professional test.
(36): For each self-use specimen type claimed with the device (e.g. nasal, sputum, saliva, whole blood, etc.).
(37): Using whenever possible the original natural matrix of the respective specimen type.
(38): A higher proportion of the samples should be in the weak-positive range close to the cutoff or LoD of the test.
(39): Individuals unaware of the professional diagnostic result prior to self-testing, and performing the entire test procedure from specimen collection and specimen pre-treatment (swab, buffer extraction, etc.) to reading.