One of the main objectives of the ExaMode project, co-financed by the European Commission under the Horizon 2020 framework, is to provide automatic and semi-automatic methods that improve the efficiency and effectiveness of diagnoses in the pathology domain, with the positive effect of reducing pathologists' workload.
ExaMode focuses on the histopathological diagnosis of tissues with the aim of detecting diseases. Because the worldwide cancer incidence and mortality burden is predicted to increase (by 63% from 2018 to 2040), we decided to focus our attention mainly on four use cases:
Computer-aided diagnosis tools today are mostly based on data-hungry prediction algorithms. In this context data is typically composed of annotated WSIs (Whole Slide Images) for colon biopsy samples. Nevertheless, the annotation process is expensive and time-consuming. An alternative is to automatically annotate WSIs by using the medical reports related to them. ExaMode is working on automatic methods to extract key pathological concepts from the medical reports to (weakly) annotate the WSIs to be used to train prediction algorithms.
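As a rough illustration of weak annotation, the sketch below maps phrases found in a report's free text to concept labels that could then be attached to the associated WSI. The phrase table, the labels, and the function name are hypothetical; the actual ExaMode extraction methods are not described in this document and are likely more sophisticated than keyword matching.

```python
import re
from typing import Set

# Hypothetical phrase-to-concept table; a real system would take its
# labels from the ontology rather than from a hand-written dictionary.
CONCEPT_PATTERNS = {
    "adenocarcinoma": r"\badenocarcinoma\b",
    "dysplasia": r"\bdysplasia\b",
    "hyperplastic polyp": r"\bhyperplastic polyp\b",
}

def weak_labels(report_text: str) -> Set[str]:
    """Return the concept labels whose pattern occurs in the report text."""
    text = report_text.lower()
    return {
        label
        for label, pattern in CONCEPT_PATTERNS.items()
        if re.search(pattern, text)
    }

labels = weak_labels("Tubular adenoma with low grade dysplasia.")
print(labels)
```

Plain keyword matching ignores negation ("no dysplasia" would still match), which is one reason labels obtained this way are only weak annotations.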
Hence, prediction algorithms can be fed with strongly (manually) annotated data as well as with weakly annotated data. The present ontology is useful in both cases because it provides a common ground for identifying key concepts and a consistent terminology for the algorithms.
The ExaMode ontology defines the key concepts and properties needed to model the diagnosis of cases of the considered diseases, the anatomical location where the disease might be found, the procedure employed to obtain the tissue to be analyzed, and the tests conducted on the tissue itself.
Despite the existence of many medical ontologies focusing specifically on cancer, there is no ontology that comprehensively models all the diseases related to the use cases mentioned above together with their anatomical location, topography, and pathology laboratory process. The ExaMode ontology builds upon many existing and widely used ontologies and adds the missing classes and relationships needed to make them work together seamlessly. Hence, the ExaMode ontology defines its own classes and terms only when they are not available in any other well-known public ontology.
In the following paragraphs, we describe in more detail the four diseases covered by this ontology, their current scientific relevance, how they affect patients' well-being, and how AI systems built using ExaMode can help to prevent and diagnose them.
The estimated incidence of colon cancer is expected to increase by up to 75% from 2018 to 2040, for both sexes and all ages. The American Cancer Society (ACS) recommends regular colon cancer screening for people over 45 years. Screening can be done either with a stool-based molecular test or with a visual exam, so at this stage the screening process does not include histopathological examination.
However, as the number of screenings increases, the number of cases that need further investigation or confirmation of initial findings by histopathological analysis will rise. Therefore, the colon cancer-associated workload for histopathologists will increase steadily and significantly in the coming years. Cancer detection in biopsies is not very difficult for pathologists, but given the large number of tested samples, it is very time-consuming and has a substantial impact on the histopathologist's workload.
From the scientific point of view, computer-aided colon cancer diagnosis appears to be very interesting, as suggested by the number of scientific articles available in the largest research publication database. The number of cases is large and increasing, screenings are becoming more popular, and therefore the number of histopathological analyses is also increasing.
Cervical cancer is the fourth most common cancer in women and the eighth most commonly occurring cancer overall. The number of cervical cancer cases is predicted to increase by 27% by 2040, regardless of age.
Nearly all cases of uterine cervix cancer are associated with human papillomavirus (HPV) [An et al., 2005; Kurman et al., 2014]. Although vaccines now exist that protect against high-risk types of HPV and significantly reduce the risk of cervical cancer, these vaccines are not commonly available in low- and middle-income countries, where, according to the WHO, approximately 90% of deaths from cervical cancer occur.
There is a great demand for histopathologists who can provide the diagnosis in these countries, and their current number is not sufficient, especially in remote locations. An algorithm-based software system that could support pathologists' work in such countries would thus be very beneficial.
Even though the screening tests for cervical cancer include mainly the Pap Smear Test (Papanicolaou test) and the colposcopic examination of the cervix, the ultimate diagnosis of ambiguous and suspicious cases requires standard histopathological assessments. Algorithmic solutions can thus also help facilitate the diagnosis of cervical cancer as a final part of the screening procedure.
Given the large number of deaths it causes, lung cancer is perceived as a severe problem. However, some people with early-stage lung cancer can be successfully treated. Thus, it is essential to perform screenings to find cancerous lesions at an early stage, before they have spread and while they are largely treatable. The most common types of screening for lung cancer are regular chest x-rays and low-dose computed tomography (LDCT) scans.
Screening is currently a recommended measure for lowering the risk of dying from cancer, whether lung, colon, or cervical cancer (or others). Therefore, histopathological diagnosis will increasingly become the final examination undertaken to discriminate unequivocally between cancerous and non-cancerous lesions found during screening. As of today, there are no official recommendations for a lung cancer screening program in Europe. However, the American Cancer Society recommends yearly lung cancer screening with LDCT for people at high risk of lung cancer. These recommendations are the result of the National Lung Screening Trial (NLST), performed between 2002 and 2010 in the USA.
There are two main types of lung cancer: about 80-85% of lung cancers are non-small cell lung cancer (NSCLC), and about 10-15% are small cell lung cancer (SCLC). NSCLC is therefore the most common type, and its subtypes include squamous cell (epidermoid) carcinoma (25-30%), adenocarcinoma (40%), and large cell carcinoma (10-15%). Distinguishing lung adenocarcinoma from squamous cell lung cancer is difficult, but it is important for further treatment. For the therapy, additional molecular tests are often needed, which add cost and require more samples to be taken during the biopsy. Therefore, an algorithm that helps to distinguish adenocarcinoma from squamous cell lung cancer could save time, biopsied tissue, and money.
The ExaMode priority list also includes a non-cancerous illness, namely coeliac disease (CD). CD is an immune-mediated disease with a chronic outcome and a genetic predisposition to an intolerance to gluten and its proteins. It is a serious autoimmune, genetic disease in which gluten ingestion leads to chronic inflammation, alterations, and damage in the small intestinal mucosa. It is estimated to affect 1 in 100 people worldwide, and its prevalence has increased significantly over the past 20 years [Lohi et al., 2007]. The increase in the number of new cases is partly due to better diagnostics and screening of individuals at high risk for the disorder [Marsh, 1992]. However, it is estimated that there are far more undiagnosed cases of coeliac disease than diagnosed ones [Fasano et al., 2003].
In general, routine screening for coeliac disease is not carried out. Testing is usually recommended only for people at a higher risk of developing the disease, such as those with a family history of the condition. In adults and children, the diagnosis of coeliac disease relies mainly on the presence of positive coeliac disease-specific autoantibodies and on further diagnostic small intestine biopsies [Fasano et al., 2001]. Intestinal biopsies are always necessary if the antibodies are low or negative, and if there are no signs or symptoms of malabsorption. A second biopsy may be necessary if there is no clinical improvement after shifting to a strict gluten-free diet. Another biopsy is sometimes recommended in the follow-up period. Findings in the biopsy samples are characteristic but not specific, so it might be challenging for an inexperienced pathologist (and for an algorithm) to diagnose coeliac disease correctly.
Considering the increasing number of new cases of coeliac disease and the important role of small intestine biopsies in its diagnosis, the market for product prototypes developed in ExaMode seems very promising.
[Ontology NS Prefix]: <https://w3id.org/examode/ontology#>
The ultimate aim of the analysis performed by a histopathologist is to complete the clinical report. A histopathological clinical report is a document that contains the results of a series of measurements and analyses performed on specific cells or tissues in order to:
The goal of this document is to define an OWL 2 ontology for the ExaMode project, whose overall aim is to build predictive algorithms that help pathologists in the diagnosis of cancer cases. The starting point of ExaMode is the set of medical diagnostic reports associated with WSIs of examined tissues.
The present ontology models the diagnostic reports associated with one WSI or a series of WSIs and enables a structured encoding of the main concepts of a diagnosis. These concepts and their relations can be used to annotate WSIs automatically as well as to reason over diagnostic reports about the four cases considered in the ontology.
In the following sections, we provide the essential information required to understand the reports of each of the four cases. Subsequently, we describe the ontology in its different components.
The majority of colorectal cancers derive from precursor lesions that can be identified through an endoscopic procedure (colonoscopy), leading to the excision of these lesions, known as polyps. Therefore, good endoscopic practice together with an accurate histopathological diagnosis decreases the incidence of colorectal cancer. There are different precursor lesions with different diagnostic and prognostic significance [Zauber et al., 2012]. In this ontology we focus on the most important features and measurements for polyps. Considering all these aspects, in the microscopic analysis of a colon excisional biopsy sample there is a minimum set of data that needs to be provided by the pathologist:
This information is a prognostic factor that guides patient management: polyps with a negative polypectomy margin, low grade histology, and no lymphovascular invasion can be safely treated with endoscopic polypectomy, whereas a positive margin, high grade (poorly differentiated) histology, and lymphovascular invasion are associated with an increased risk of adverse outcomes, and surgical resection is indicated [Butte et al., 2012].
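The management rule stated above can be written as a small decision function. This is only a sketch of the rule as worded in the text: the class and field names are hypothetical, and treating any single adverse feature as sufficient to indicate surgery is an assumption made for the example.

```python
from dataclasses import dataclass

@dataclass
class PolypFindings:
    """Hypothetical container for the three prognostic features named in the text."""
    margin_positive: bool            # polypectomy margin involved by the lesion
    high_grade: bool                 # high grade (poorly differentiated) histology
    lymphovascular_invasion: bool    # lymphovascular invasion present

def suggested_management(findings: PolypFindings) -> str:
    """Apply the rule from the text.

    Assumption: any one adverse feature is enough to indicate surgical resection.
    """
    adverse = (
        findings.margin_positive
        or findings.high_grade
        or findings.lymphovascular_invasion
    )
    return "surgical resection indicated" if adverse else "endoscopic polypectomy"

favourable = PolypFindings(margin_positive=False, high_grade=False, lymphovascular_invasion=False)
print(suggested_management(favourable))
```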
The cervical biopsy (colposcopy) is a procedure performed when previous tests provide evidence of precancerous, abnormal, or neoplastic lesions in the uterine cervix. The removed cervical tissue has to be analyzed by an expert pathologist to determine whether tumor lesions are present. In this case, the pathology report provides not only diagnostic information but is also a prognostic tool for the patient’s treatment. Colposcopy with directed biopsy is currently the “gold standard” for the diagnosis of cervical precancer.
Thus, the aim of the histopathologist is to recognize and identify these precursor lesions, known as Cervical Intraepithelial Neoplasia (CIN), which display a proliferation of atypical basaloid cells [Lax, 2011]. Based on the spread of the proliferation, the WHO classification categorizes this dysplasia into three grades:
CIN1 corresponds to Low-Grade Squamous Intraepithelial Lesion (LSIL), whereas CIN2-3 correspond to High-Grade Squamous Intraepithelial Lesion (HSIL). There is a strong association between these precursor lesions and HPV infection: LSIL is strongly associated with low/intermediate-risk HPV, and HSIL is associated with high-risk HPV. Therefore, the first feature that has to be identified and reported is the presence and the grade of dysplasia, with possible HPV association. In the presence of cervical carcinoma, we identify the main microscopic features and measurements of a uterine cervix colposcopy biopsy, which should be provided in the pathology report:
Immunohistochemistry (p16 and Ki-67 staining) also assists in the histological differential diagnosis between precursors and reactive or metaplastic epithelium. For invasive cervical carcinoma, stage is the strongest prognostic factor [Lax, 2011]. Therefore, the information retrieved from the sample stained with H&E as well as from the sample stained immunohistochemically is important for the diagnosis and the management of a patient.
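The correspondence between CIN grades and squamous intraepithelial lesion categories described above is a simple lookup, sketched here. The function name and the input normalization are illustrative choices, not part of the ontology.

```python
# Mapping stated in the text: CIN1 -> LSIL, CIN2 and CIN3 -> HSIL.
CIN_TO_SIL = {
    "CIN1": "LSIL",
    "CIN2": "HSIL",
    "CIN3": "HSIL",
}

def sil_category(cin_grade: str) -> str:
    """Return the SIL category for a CIN grade such as 'CIN1' or 'cin 2'."""
    key = cin_grade.strip().upper().replace(" ", "")
    if key not in CIN_TO_SIL:
        raise ValueError(f"unknown CIN grade: {cin_grade!r}")
    return CIN_TO_SIL[key]

print(sil_category("CIN 2"))
```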
Lung cancer (LC) has a high mortality rate and is the most common cause of cancer death worldwide, accounting for 19.4% of cancer-related deaths. The average overall survival rate for metastatic lung cancer is very low, whereas early-stage disease has higher survival rates. The treatment of low-stage LC is complete surgical resection, whereas for metastatic LC surgery is often impossible. An accurate diagnosis from lung biopsies makes it possible to choose the most appropriate prognostic and therapeutic management of the patient.
Moreover, a correct WHO classification is very important for metastatic tumors, since distinguishing histological subtypes such as adenocarcinoma and squamous cell carcinoma has therapeutic implications. The identification of new therapeutic targets over the past decade resulted in an urgent need for a classification system for both non-resection specimens (particularly small biopsies) and cytology samples. For this reason, an accurate and specific pathology report is important to establish the diagnosis and the patient’s treatment.
Starting from the analysis of lung biopsies, the microscopic analysis section of the clinical report on a lung cancer biopsy sample must provide the following information, which has prognostic and predictive implications:
The ExaMode priority list also includes a non-cancerous disease, namely coeliac disease (CD).
CD is an immune-mediated disease with a chronic outcome and a genetic predisposition to an intolerance to gluten and its proteins. This intolerance leads to an abnormal immune response, followed by chronic inflammation and alteration of the small intestinal mucosa. The diagnosis of this pathology is based on the description of the histopathological alterations of the small intestine (after duodenal biopsy) by expert pathologists [GIPAD, 2011].
Microscopic analysis of a small intestine biopsy sample for coeliac disease provides information about:
We represent the ontology as a graph in which nodes are classes and edges are typed relationships among the classes. Classes (nodes) represent real-world objects such as a person, a project, a tissue, or an anatomical part. Relationships (edges) describe how classes relate to one another. In ExaMode we keep the creation of new classes to a minimum and re-use existing ontologies as much as possible.
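The graph view described above can be sketched with plain (subject, relationship, object) triples. The class names and the isAboutDisease relationship come from the prose of this document; the other relationship names are hypothetical, and the exact identifiers in the OWL file may differ.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (source class, relationship, target class)

EDGES: List[Triple] = [
    ("ClinicalCaseReport", "isAboutDisease", "Disease"),
    ("ClinicalCaseReport", "hasOutcome", "Outcome"),   # hypothetical relationship name
    ("ClinicalCaseReport", "hasPatient", "Patient"),   # hypothetical relationship name
]

def outgoing(node: str, edges: List[Triple]) -> List[Tuple[str, str]]:
    """Return the (relationship, target) pairs that leave the given class."""
    return [(rel, target) for source, rel, target in edges if source == node]

for rel, target in outgoing("ClinicalCaseReport", EDGES):
    print(f"ClinicalCaseReport --{rel}--> {target}")
```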
The ExaMode ontology describes the medical reports and is organized in five main conceptual areas:
In the following sections we discuss in detail the semantic areas, dividing them by disease when necessary.
The central class of this part of the ontology is Clinical Case Report. All the medical reports modeled through this ontology belong to this class, which can be specialized for specific cases such as the Colon Clinical Case Report. Each clinical case is associated with one Disease, a general class that represents a disease. More specifically, in this ontology a Disease is one of the four diseases considered in ExaMode, that is, Cervical Cancer, Lung Cancer, Colon Carcinoma, and Coeliac Disease. The image above represents a Colon Clinical Case Report, so the class is associated with the Colon Carcinoma disease through the isAboutDisease relationship. Each Clinical Case Report has a unique identifier, obtained from the clinical cases, and a diagnosis, i.e. the text in the "Diagnosis" field found in the reports. Each medical report can also be related to an image file about the report. The block number refers to internal ids related to the reports or the images.
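To make the description concrete, the sketch below turns one colon report into subject, predicate, object statements under the ontology namespace. Only the namespace and the isAboutDisease relationship are taken from this document. The local names ColonClinicalCaseReport and ColonCarcinoma, the hasDiagnosisText property, and the example.org base are assumptions made for illustration, as is pointing isAboutDisease directly at the disease class.

```python
import json
from dataclasses import dataclass
from typing import List, Tuple

EXA = "https://w3id.org/examode/ontology#"  # namespace given in this document
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

@dataclass
class ColonCaseReport:
    report_id: str        # unique identifier of the clinical case
    diagnosis_text: str   # text of the "Diagnosis" field

    def to_triples(self, base: str = "https://example.org/report/") -> List[Tuple[str, str, str]]:
        """Return (subject, predicate, object) strings describing this report."""
        subject = f"<{base}{self.report_id}>"  # placeholder base IRI
        return [
            # Assumed local name for the colon report class.
            (subject, f"<{RDF_TYPE}>", f"<{EXA}ColonClinicalCaseReport>"),
            # isAboutDisease is named in the text; ColonCarcinoma is an assumed local name.
            (subject, f"<{EXA}isAboutDisease>", f"<{EXA}ColonCarcinoma>"),
            # Hypothetical property; json.dumps quotes and escapes the literal.
            (subject, f"<{EXA}hasDiagnosisText>", json.dumps(self.diagnosis_text)),
        ]

report = ColonCaseReport("case-001", "Tubular adenoma with low grade dysplasia")
for s, p, o in report.to_triples():
    print(s, p, o, ".")
```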
Every medical report is associated with a Patient (normally anonymized), and a single patient can have more than one associated medical report. We model minimal patient information: age at the time of the report, gender, and an age onset that classifies patients into three categories: young adult, middle age, and late.
Each medical report is also associated with the Organization that produced it. In ExaMode there are two organizations: the Cannizzaro Hospital (AOEC) in Italy and the Radboud Medical Center in the Netherlands.
The diagnosis is the central area of the ExaMode ontology. Even though the structure of this semantic area differs from disease to disease, certain classes are shared. In particular, each medical report is connected to an Outcome. The Outcome class represents a general outcome that can be positive (PositiveOutcome), negative (NegativeOutcome), without evidence of malignancy (NoMalignancy), or inconclusive (InconclusiveOutcome). An inconclusive outcome can, in turn, be due to insufficient material to reach a conclusion (InsufficientMaterial) or to a specimen that is unsatisfactory for a diagnosis (SpecimenUnsatisfactory).
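The outcome classes listed above form a small hierarchy, which can be sketched as a child-to-parent map with a subsumption check. Modeling InsufficientMaterial and SpecimenUnsatisfactory as subclasses of InconclusiveOutcome is an assumption based on the wording of the text; the OWL file may structure them differently.

```python
from typing import Dict, Optional

# Child -> parent map built from the class names given in the text.
PARENT: Dict[str, Optional[str]] = {
    "Outcome": None,
    "PositiveOutcome": "Outcome",
    "NegativeOutcome": "Outcome",
    "NoMalignancy": "Outcome",
    "InconclusiveOutcome": "Outcome",
    "InsufficientMaterial": "InconclusiveOutcome",     # assumed subclass
    "SpecimenUnsatisfactory": "InconclusiveOutcome",   # assumed subclass
}

def is_a(cls: str, ancestor: str) -> bool:
    """Return True if cls equals ancestor or lies below it in the hierarchy."""
    current: Optional[str] = cls
    while current is not None:
        if current == ancestor:
            return True
        current = PARENT[current]
    return False

print(is_a("InsufficientMaterial", "Outcome"))
```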
The positive outcome for colon cancer is specialized into a taxonomy of states ranging from benign to malignant.
The main task for a pathologist is to detect cancerous polyps (e.g. for population screening). The diagnostic criteria, which are highly objective, are:
The majority of colorectal cancers derive from precursor lesions that can be identified through an endoscopic procedure (colonoscopy), leading to the excision of these lesions, known as polyps. Therefore, good endoscopic practice together with an accurate histopathological diagnosis decreases the incidence of colorectal cancer.
There are different precursor lesions with different diagnostic and prognostic significance. In this ontology, we focus on the most important features and measurements for polyps. Considering all these aspects, in the microscopic analysis of a colon excisional biopsy sample there is a minimum set of data that needs to be provided by the pathologist:
Therefore, when a patient is found positive, the diagnosis can be one among the subclasses of the class Positive Outcome in the figure above. When the outcome is an instance of the class Polyp of Colon, or one of its subclasses, it can be "annotated" with a form of Dysplasia. In this particular case, the ontology contains classes, such as Moderate Colon Dysplasia, that specifically refer to a dysplasia of the colon.
Classes such as Colon Dysplasia or High Grade Dysplasia are part of the Annotation conceptual area. These classes are used as additional information that may or may not be found through the diagnosis.
The figure above represents the structure of the ontology for Uterine Cervix Cancer. In the case of the cervix, the outcome, independently of its type, may or may not be annotated with the presence of Human Papilloma Virus Infection. The possible positive outcomes here include cervical polyp, cervicitis, one of the possible types of Cervical Intraepithelial Neoplasia (CIN), or one of the possible types of cervical carcinoma. We note that a case of cervix cancer does not present possibilities for annotations.
The figure above describes the structure of the diagnosis area for lung cancer. In this case, a positive patient may present Sarcoidosis, Lymphadenitis, or Lung Carcinoma (and its possible subclasses). In the case of Lung Carcinoma, Necrosis may also be present. We note that a case of lung cancer does not present possibilities for annotations.
For the diagnosis of coeliac disease, as described in the image above, the outcome is considerably simpler than for the other three diseases: if the patient is positive, they can have either coeliac disease or Duodenitis (an inflammation that does not necessarily imply coeliac disease). The outcome may be correlated with the information about the Immunohistochemical Test carried out on the patient.
In the figure we also report the annotations that may be associated with a clinical case and that are derived from the information present in the "Diagnosis" field of the clinical reports. This information helps to understand whether the patient presents the disease, and it is specific to each report. We inferred a sub-classification of areas for this conceptual area.
As shown above, the ExaMode ontology has two other main areas, presented here for the case of colon cancer: the Procedure and the Anatomical Location. The former details the surgical procedure performed to collect the tissue; the latter represents an anatomical location, which can be either the area from which the tissue was taken or the area where the disease is located in the patient.
In the case of the colon, the performed procedure can only be a surgical procedure which, in turn, can be a resection, an anastomosis, a hemiectomy, or a form of Endoscopic Biopsy.
The anatomical locations considered are the different areas of the colon. The class Colon, NOS (Not Otherwise Specified) describes a general area of the colon and can be specified further by one of its subclasses. Other locations included in this area that are not in the colon are the Rectum, the Abdomen, and the Ileum.
In the case of the cervix, the possible procedures include the cervical biopsy, a hysterectomy (and its specializations), and different types of surgical procedures, including conization and endocervical curettage.
The locations where the cancer can be found and from which tissue can be taken include the uterus, the cervix epithelium and its different parts, and the cervical mucus.
In the case of lung cancer, the possible procedures include different types of Biopsies. It is also possible to perform different types of immunohistochemical tests, which usually return a true/false result, or a numerical one in the case of the test on the proliferation marker protein Ki-67.
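The two result types mentioned above (true/false for most tests, a number for Ki-67) can be captured in a small typed record. The class name, the strict validation, and the TTF-1 marker used in the usage lines are illustrative assumptions; the text only says that results are usually boolean and numerical for Ki-67.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class ImmunohistochemicalTest:
    """Hypothetical record for one immunohistochemical test result."""
    marker: str
    result: Union[bool, float]

    def __post_init__(self) -> None:
        is_ki67 = self.marker.lower().replace("-", "") == "ki67"
        # bool is checked explicitly because bool is a subclass of int in Python.
        if is_ki67 and isinstance(self.result, bool):
            raise TypeError("Ki-67 is expected to carry a numerical result")
        if not is_ki67 and not isinstance(self.result, bool):
            raise TypeError(f"{self.marker} is expected to carry a true/false result")

ki67 = ImmunohistochemicalTest("Ki-67", 30.0)    # numerical result
other = ImmunohistochemicalTest("TTF-1", True)   # illustrative marker, not named in this document
print(ki67, other)
```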
The locations for the lung cancer include the whole lung, the bronchus and their parts, the mediastinum and the Thoracic lymph nodes.
In the case of coeliac disease, the surgical procedure can be a form of biopsy performed in different locations (a biopsy in the greater curvature, a biopsy in the pyloric antrum, and one in the duodenum). As such, these cases are naturally connected to their locations.
The anatomical locations for coeliac disease include the Duodenum and its different parts, where the disease is located. Since the biopsy used to diagnose the disease can also be performed in the pyloric antrum and the greater curvature, these are included among the locations.
[An et al., 2005] H.J. An, K.R. Kim, I.S. Kim, D.W. Kim, M.H. Park, I.A. Park, K.S. Suh, E.J. Seo, S.H. Sung, J.H. Sohn, H.K. Yoon, E.D. Chang, H.I. Cho, J.Y. Han, S.R. Hong and G.H. Ahn (2005). Prevalence of human papillomavirus DNA in various histological subtypes of cervical adenocarcinoma: a population-based study. Mod Pathol 18(4):528-534.
[Kurman et al., 2014] R.J. Kurman, M.L. Carcangiu, C.S. Herrington and R.H. Young (2014). WHO classification of tumours of the female reproductive organs. IARC press, Lyon.
[Lohi et al., 2007] S. Lohi, K. Mustalahti, K. Kaukinen, K. Laurila, P. Collin, H. Rissanen, et al. Increasing prevalence of coeliac disease over time. Aliment Pharmacol Ther 2007;26(9):1217–25.
[Marsh, 1992] M.N. Marsh. Gluten, major histocompatibility complex, and the small intestine. A molecular and immunobiologic approach to the spectrum of gluten sensitivity (“celiac sprue”). Gastroenterology 1992;102(1):330–54.
[Fasano et al., 2003] A. Fasano, I. Berti, T. Gerarduzzi, T. Not, R.B. Colletti, S. Drago, et al. Prevalence of celiac disease in at-risk and not-at-risk groups in the United States: a large multicenter study. Arch Intern Med 2003;163(3):286–92.
[Fasano et al., 2001] A. Fasano, C. Catassi. Current approaches to diagnosis and treatment of celiac disease: an evolving spectrum. Gastroenterology 2001;120(3):636–51.
[Zauber et al., 2012] A.G. Zauber, S.J. Winawer, M.J. O’Brien, et al. Colonoscopic polypectomy and long-term prevention of colorectal-cancer deaths. N Engl J Med 2012;366:687-96.
[Butte et al., 2012] J.M. Butte, P. Tang, M. Gonen, et al. Rate of residual disease after complete endoscopic resection of malignant colonic polyp. Dis Colon Rectum 2012;55:122-7.
[Lax, 2011] S. Lax. Histopathology of cervical precursor lesions and cancer. Acta Dermatovenerol Alp Pannonica Adriat. 2011 Sep;20(3):125-33.
[Travis et al., 2011] W.D. Travis, E. Brambilla, M. Noguchi, et al. International Association for the Study of Lung Cancer/American Thoracic Society/European Respiratory Society international multidisciplinary classification of lung adenocarcinoma. J Thorac Oncol 2011;6:244-285.
[GIPAD, 2011] Coeliac Disease: The Histology Report (2011) On behalf of the “Gruppo Italiano Patologi Apparato Digerente (GIPAD)” and of the “Società Italiana di Anatomia Patologica e Citopatologia Diagnostica”/International Academy of Pathology, Italian division (SIAPEC/IAP)
The authors would like to thank Silvio Peroni for developing LODE, a Live OWL Documentation Environment, which is used for representing the Cross Referencing Section of this document and Daniel Garijo for developing Widoco, the program used to create the template used in this documentation.