<html>

<head>

<meta http-equiv="Content-Type" content="text/html; charset=Windows-1252">

</head>

<body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space; font-family: Calibri, sans-serif; font-size: 14px; color: rgb(0, 0, 0);">

<div>PBL wrote:</div>

<div><br>

</div>

<div><i>“How would you go about writing a requirements specification for this software?”</i></div>

<div><br>

</div>

<div>First, I would not consider what you are asking about to be functional requirements, so it would not be appropriate even to try to express them in a Semantic Model; semantic models are for functional requirements only. Instead, I would place what you are asking about in what I call the “Quality of Service” subset of the nonfunctional requirements. These should then, IMHO, be expressed in an entirely different form.</div>

<div><br>

</div>

<div>Second, I think I understand the direction you are trying to take with this, but I suggest that you are still quite far from arriving at true requirements. Specifically, assuming that a goal of this software is to answer the question “Given a CRT scan, does this patient have myocarditis?”, a number of critical questions need to be answered. For example:</div>

<div><br>

</div>

<div>*) What is the maximum acceptable false positive rate for the software's diagnosis of myocarditis when the patient doesn’t actually have it?</div>

<div>*) What is the maximum acceptable false negative rate for the software's diagnosis of no myocarditis when the patient does actually have it?</div>

<div>*) What is the maximum acceptable rate of “Not sure” diagnoses from the software?</div>

<div>*) What is the maximum acceptable false positive rate for an experienced physician's diagnosis of myocarditis when the patient doesn’t actually have it?</div>

<div>*) What is the maximum acceptable false negative rate for an experienced physician's diagnosis of no myocarditis when the patient does actually have it?</div>

<div>*) What is the maximum acceptable rate of "Not sure" diagnoses for experienced physicians?</div>

<div>*) Is there any rational justification for the software to perform at a different level than experienced physicians?</div>

<div>*) What is the maximum acceptable rate of disagreement between the software’s diagnosis and an experienced physician’s diagnosis of the same CRT scan?</div>

<div>*) When two or more experienced physicians would give conflicting diagnoses for the same CRT scan, what is the software expected to do?</div>
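<div>To pin down what these questions are asking, the rates could be defined along the following lines. This is a minimal Python sketch under stated assumptions: each test result is a (truth, output) pair, the output is one of “myocarditis”, “no myocarditis” or “not sure”, the false positive and false negative rates are conditional on the patient’s true condition, and a “not sure” output counts as neither error but stays in the denominators. The names YES, NO, UNSURE and rates are hypothetical.</div>
<pre>
```python
from collections import Counter

# Hypothetical output labels (assumption, not from any real system).
YES, NO, UNSURE = "myocarditis", "no myocarditis", "not sure"

def rates(results):
    """results: iterable of (truth, output) pairs.

    truth is True if the patient actually has myocarditis;
    output is YES, NO or UNSURE. 'Not sure' outputs count as
    neither a false positive nor a false negative (assumption),
    but they remain in every denominator.
    """
    counts = Counter(results)
    positives = sum(n for (truth, _), n in counts.items() if truth)
    negatives = sum(n for (truth, _), n in counts.items() if not truth)
    if positives == 0 or negatives == 0:
        raise ValueError("need cases both with and without myocarditis")
    return {
        # Said "myocarditis" when the patient does not have it.
        "false_positive_rate": counts[(False, YES)] / negatives,
        # Said "no myocarditis" when the patient does have it.
        "false_negative_rate": counts[(True, NO)] / positives,
        # Declined to diagnose, whatever the true condition.
        "not_sure_rate": (counts[(True, UNSURE)] + counts[(False, UNSURE)])
                         / (positives + negatives),
    }
```
</pre>
<div>The same function applies unchanged to an experienced physician’s diagnoses, which is what makes the software-versus-physician questions directly comparable.</div>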

<div><br>

</div>

<div>The core issue is how well the software diagnoses myocarditis, so the requirements need to be expressed in terms of this “Quality of Service”. Purely for the sake of argument, assume that:</div>

<div><br>

</div>

<div>

<div>*) The maximum acceptable false positive rate for an experienced physician's diagnosis of myocarditis when the patient doesn’t actually have it is 0.5%</div>

<div>*) The maximum acceptable false negative rate for an experienced physician's diagnosis of no myocarditis when the patient does actually have it is 0.05%</div>

</div>

<div>*) The maximum acceptable rate of “Not sure” diagnoses for experienced physicians is 2%</div>

<div><br>

</div>

<div>So we might propose the software requirements to be something like this:</div>

<div><br>

</div>

<div>

<div>*) The maximum acceptable false positive rate for the software's diagnosis of myocarditis when the patient doesn’t actually have it shall be 0.5%</div>

<div>*) The maximum acceptable false negative rate for the software's diagnosis of no myocarditis when the patient does actually have it shall be 0.05%</div>

</div>

<div>*) The maximum acceptable rate of “Not sure” diagnoses for the software shall be 2%</div>

<div>*) Given a single CRT scan on which an odd number (at least three) of experienced physicians have conflicting diagnoses, the software’s diagnosis shall match the experienced physicians’ majority diagnosis at least 95% of the time</div>
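<div>Written down as data, the proposed requirements might look like the sketch below. The thresholds are the purely illustrative figures above, not real clinical values; REQUIREMENTS and check are hypothetical names; and the check compares point estimates only, so on its own it says nothing about statistical confidence. Note that the majority-agreement figure is a minimum, while the others are maxima.</div>
<pre>
```python
# Illustrative thresholds from the proposed requirements above.
# Each entry maps a rate name to (kind, limit); kind is "max" or "min".
REQUIREMENTS = {
    "false_positive_rate": ("max", 0.005),       # 0.5%
    "false_negative_rate": ("max", 0.0005),      # 0.05%
    "not_sure_rate": ("max", 0.02),              # 2%
    "majority_agreement_rate": ("min", 0.95),    # 95%
}

def check(measured):
    """Return {requirement name: True if met, False if not}.

    measured: dict of requirement name -> observed rate (point estimate).
    """
    results = {}
    for name, (kind, limit) in REQUIREMENTS.items():
        value = measured[name]
        results[name] = value <= limit if kind == "max" else value >= limit
    return results
```
</pre>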

<div><br>

</div>

<div>In other words, this is constraining the software to:</div>

<div><br>

</div>

<div>A) Perform at least as well as experienced physicians, and</div>

<div>B) Exhibit a minimum level of agreement with the majority of experienced physicians when those experienced physicians would disagree on the same CRT scan</div>
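<div>For (B), the majority diagnosis of an odd-sized panel and the resulting agreement rate could be computed roughly as follows. This is a minimal sketch with hypothetical function names; it treats labels as arbitrary strings. One wrinkle: with three possible answers (yes, no, not sure), even an odd panel can lack a strict majority (one vote each), and the sketch simply excludes such scans, which is an assumption the requirement as worded does not settle.</div>
<pre>
```python
from collections import Counter

def majority(votes):
    """Return the label chosen by more than half the panel, or None."""
    label, count = Counter(votes).most_common(1)[0]
    return label if count * 2 > len(votes) else None

def majority_agreement_rate(cases):
    """cases: iterable of (panel_votes, software_label) pairs.

    Counts only scans where the panel disagrees (not unanimous) but
    still has a strict majority, matching the requirement's scope.
    Returns None if no scan qualifies.
    """
    agree = total = 0
    for votes, software in cases:
        if len(votes) < 3 or len(votes) % 2 == 0:
            raise ValueError("panel must be an odd number of at least three")
        if len(set(votes)) == 1:
            continue  # unanimous panel: outside this requirement
        m = majority(votes)
        if m is None:
            continue  # no strict majority: excluded (assumption)
        total += 1
        agree += software == m  # bool counts as 0 or 1
    return agree / total if total else None
```
</pre>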

<div><br>

</div>

<div><br>

</div>

<div><i>“How would you go about validating the software against the requirements specification?”</i></div>

<div><br>

</div>

<div>Do you really mean “<u>verifying</u> the software against the requirements specification”? If so, then I would suggest that you start with a sufficiently large set of test case CRT scans, each accompanied by diagnoses from experienced physicians. This should include multiple CRT scans in each of the following categories:</div>

<div><br>

</div>

<div>*) A sufficient number of experienced physicians all agree that the patient does have myocarditis</div>

<div>*) A sufficient number of experienced physicians all agree that the patient does not have myocarditis</div>

<div>*) A sufficient number of experienced physicians all agree that the patient cannot be properly diagnosed from that CRT scan</div>

<div>*) A simple majority of a sufficient number of experienced physicians, but not all of them, agree that the patient does have myocarditis</div>

<div>*) A simple majority of a sufficient number of experienced physicians, but not all of them, agree that the patient does not have myocarditis</div>

<div>*) A simple majority of a sufficient number of experienced physicians, but not all of them, agree that the patient cannot be properly diagnosed from that CRT scan</div>
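<div>Each test scan can be sorted into these six categories mechanically from its panel’s votes, which also makes it easy to confirm that every category is populated. A minimal sketch, assuming the labels “yes”, “no” and “unsure” (the last standing for “cannot be properly diagnosed from that CRT scan”) and hypothetical function names; a scan whose panel has no simple majority fits none of the six categories.</div>
<pre>
```python
from collections import Counter

# Hypothetical labels: has myocarditis / does not / cannot be diagnosed.
LABELS = ("yes", "no", "unsure")

def categorize(votes):
    """Map one scan's panel votes to (agreement, label), or None.

    agreement is "unanimous" or "majority"; label is one of LABELS.
    """
    if any(v not in LABELS for v in votes):
        raise ValueError("unknown label")
    label, count = Counter(votes).most_common(1)[0]
    if count == len(votes):
        return ("unanimous", label)
    if count * 2 > len(votes):
        return ("majority", label)
    return None  # no simple majority: fits none of the six categories

def coverage(panels):
    """Count test scans in each of the six categories (zeros included)."""
    counts = Counter(categorize(v) for v in panels)
    return {(a, l): counts[(a, l)]
            for a in ("unanimous", "majority") for l in LABELS}
```
</pre>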

<div><br>

</div>

<div>Further, not a single one of the test case CRT scans above may also be in the data set that was used to train the DLNN.</div>
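<div>A simple guard for this rule, assuming every scan carries a stable identifier (the IDs and the function name here are hypothetical). Matching identifiers only catches exact reuse; the same image stored under a different ID would need a stricter check.</div>
<pre>
```python
def assert_disjoint(training_ids, test_ids):
    """Raise ValueError if any test case scan was also used for training."""
    overlap = set(training_ids) & set(test_ids)
    if overlap:
        raise ValueError(
            f"{len(overlap)} test scans also in training set: {sorted(overlap)}"
        )
```
</pre>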

<div><br>

</div>

<div>Determining what counts as a “sufficiently large set of CRT scans”, how many CRT scans are needed in each category, and what “a sufficient number of experienced physicians” means necessarily involves statistical work well beyond my pay grade. But I expect that we should be able to obtain statistically significant verification that the software performs at or above each of the required Quality of Service levels.</div>
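<div>One standard way to approach the “sufficiently large” question, offered as a textbook technique rather than anything settled in this thread, is the exact one-sided Clopper-Pearson upper confidence bound on an observed failure rate. The sketch below assumes each scan is an independent pass/fail trial and uses hypothetical function names; scans_needed covers the best case of zero observed failures, where the bound reduces to 1 - alpha^(1/n).</div>
<pre>
```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed in log space for large n."""
    total = 0.0
    for i in range(k + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1)
                    - math.lgamma(n - i + 1)
                    + i * math.log(p) + (n - i) * math.log1p(-p))
        total += math.exp(log_term)
    return total

def upper_bound(failures, n, alpha=0.05):
    """Exact one-sided (1 - alpha) Clopper-Pearson upper bound on a rate."""
    if failures >= n:
        return 1.0
    lo, hi = failures / n, 1.0
    for _ in range(200):  # bisection: binom_cdf decreases as p increases
        mid = (lo + hi) / 2
        if binom_cdf(failures, n, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return hi

def scans_needed(max_rate, alpha=0.05):
    """Smallest n such that zero failures in n scans bounds the rate by max_rate."""
    return math.ceil(math.log(alpha) / math.log1p(-max_rate))
```
</pre>
<div>For example, scans_needed(0.0005) returns 5990: that many myocarditis-positive scans, with no false negatives among them, would support the illustrative 0.05% false negative limit at 95% confidence, while scans_needed(0.005) returns 598 myocarditis-negative scans for the 0.5% false positive limit. Any observed failures push those numbers higher.</div>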

<div><br>

</div>

<div><br>

</div>

<div><br>

</div>

<div>Cheers,</div>

<div><br>

</div>

<div>— steve</div>

<div><br>

</div>

<div><br>

</div>

<div><br>

</div>

<div>-----Original Message-----</div>

<div>From: systemsafety <<a href="mailto:systemsafety-bounces@lists.techfak.uni-bielefeld.de">systemsafety-bounces@lists.techfak.uni-bielefeld.de</a>> on behalf of Peter Bernard Ladkin <<a href="mailto:ladkin@causalis.com">ladkin@causalis.com</a>></div>

<div>Organization: RVS Bielefeld and Causalis</div>

<div>Date: Wednesday, April 28, 2021 at 10:24 AM</div>

<div>To: "<a href="mailto:systemsafety@lists.techfak.uni-bielefeld.de">systemsafety@lists.techfak.uni-bielefeld.de</a>" <<a href="mailto:systemsafety@lists.techfak.uni-bielefeld.de">systemsafety@lists.techfak.uni-bielefeld.de</a>></div>

<div>Subject: Re: [SystemSafety] [External] Re: Post Office Horizon System</div>

<div><br>

</div>

<div>Steve,</div>

<div><br>

</div>

<div>I am certainly not going to suggest that many software functional requirements could be more carefully specified than they are, and, like you, I believe that in many software developments such precise requirements specification can help enormously.</div>

<div><br>

</div>

<div>Suppose you were writing software to look at CRT scans of people's hearts, and identify myocarditis.</div>

<div>There is (a) a software component which maps pixels to anatomical objects + geometry, followed by (b) a software interpretive component which identifies certain kinds of anomalies in the picture overlaid with the anatomy derived from (a).</div>

<div><br>

</div>

<div>There are two related criteria for the success of this software. The main criterion is that the subject really does have myocarditis. The second criterion is that the software judgement agrees with the judgement of an experienced physician that the subject has myocarditis. Most often, it is the second criterion which is used to determine success, since the first can only be determined with invasive and medically undesirable procedures.</div>

<div><br>

</div>

<div>Tasks (a) and (b) are usually undertaken by a DLNN. How would you go about writing a requirements specification for this software? How would you go about validating the software against the requirements specification?</div>

<div><br>

</div>

<div>PBL</div>

<div><br>

</div>

<div>Prof. Peter Bernard Ladkin, Bielefeld, Germany</div>

<div>ClaireTheWhiteRabbit RIP</div>

<div>Tel+msg +49 (0)521 880 7319  www.rvs-bi.de</div>

<div><br>

</div>

<div><br>

</div>

<div><br>

</div>

<div><br>

</div>

<div><br>

</div>

<div><br>

</div>

</body>

</html>