The ExaMode ontology was developed in the context of the ExaMode project co-financed by the European Commission under the Horizon 2020 framework, which aims to provide automatic and semi-automatic methods to improve the efficiency and the effectiveness of diagnoses in the pathology domain, with the positive effect of reducing the pathologists' workload.
The ExaMode project focuses on histopathological diagnosis of tissues with the aim of detecting diseases. We take into account the future cancer incidence and mortality burden worldwide which is predicted to be increasing (by 63% from 2018 until 2040); hence, we focus on four largely diffused and studied diseases:
Computer-aided diagnosis tools today are mostly based on data-hungry prediction algorithms. In this context data is typically composed of annotated WSIs (Whole Slide Images) for colon biopsy samples. Nevertheless, the annotation process is expensive and time-consuming. An alternative is to automatically annotate WSIs by using the medical reports related to them. ExaMode is working on automatic methods to extract key pathological concepts from the medical reports to (weakly) annotate the WSIs to be used to train prediction algorithms.
Hence, prediction algorithms can be fed by strongly (manually) annotated data as well as by weakly annotated data. The present ontology is useful in both cases because it provides a common ground for identifying key concepts and a consistent terminology for the algorithms.
The ExaMode ontology models the diagnosis process using WSIs in histopathology, defining the key concepts and properties to model diagnoses, the anatomical location where the disease might be located, the procedure employed to obtain the tissue to be analyzed, and the tests conducted on the tissue itself. Compared to previous efforts [Serra et al., 2019; Gurcan et al., 2017], the ExaMode ontology focuses on diagnosing histopathology exams, defining components related to the annotation process of WSIs. The ontology is multilingual since components are labeled in three different languages: English, Italian, and Dutch. It comprises five semantic areas grouping components related to the same aspect of the diagnostic process: clinical case reports (i.e., general aspects), diagnosis results, other tests performed, interventions or surgical procedures employed to retrieve the specimen, and the anatomical location of the specimen. This classification provides an ontological template that can be used to model any disease in the histopathology domain. We modeled four largely diffused and studied diseases: colon cancer, uterine cervix cancer, lung cancer, and celiac disease. Thanks to the modularity of this approach, we can easily expand our ontology to include more use cases.
The ontology design followed a bottom-up approach starting from anonymized clinical reports about the four considered diseases provided by Azienda Ospedaliera per l’Emergenza Cannizzaro (AOEC) in Italy and the Radboud University Medical Center (RUMC) in The Netherlands. We analyzed these textual records and worked together with the pathologists and physicians, employing a co-design methodology to accurately identify the classes and relations to be included in the ontology. The ExaMode ontology is designed to meet the OBO principles to ease interoperability with other biomedical ontologies. Hence, we maximized the reuse of concepts defined in already available and well-known biomedical ontologies and vocabularies, thus limiting the creation of new classes and relations to a minimum.
In the following paragraphs, we better describe the four diseases covered by this ontology, their current scientific relevance, how they impact the well-being of the patients, and how the AI systems built using ExaMode can help prevent and help to diagnose them.
The estimated incidence of colon cancer is expected to increase by up to 75% between today and 2040, for both sexes and all ages. The American Cancer Society (ACS) recommends regular screening for colon cancer for people over 45 years. The screening can be done either with stool-based molecular tests or with visual exams, so at this stage the screening process does not include histopathological examination.
However, with the increased number of screenings, the number of cases that need further investigation or confirmation of initial findings by histopathological analysis will rise. Therefore, the colon cancer-associated workload for histopathologists will constantly and significantly increase in the next years. Cancer detection in biopsies is not very difficult for pathologists, but given the large number of tested samples, it is very time-consuming and has a substantial impact on the histopathologist workload.
From the scientific point of view, computer-aided colon cancer diagnosis appears to be very interesting, as suggested by the number of scientific articles available in the largest research publication database. The number of cases is large and increasing, screenings are becoming more popular, and therefore the number of histopathological analyses is also increasing.
Cervical cancer is the fourth most common cancer in women, and the eighth most commonly occurring cancer overall. The estimated number of cervical cancer cases is predicted to increase by 27% until 2040, regardless of age.
Nearly all cases of uterine cervix cancer are associated with human papillomavirus (HPV) [An et al., 2005; Kurman et al., 2014]. Although there are currently vaccines that can protect against high-cancer-risk types of HPV, significantly reducing the risk of cervical cancer, these vaccines are not commonly available in low- and middle-income countries, where, according to the WHO, approximately 90% of deaths from cervical cancer occur.
There is a great demand for histopathologists who can provide the diagnosis in these countries, and their current number is not sufficient, especially in remote locations. An algorithm-based software system that could support pathologists' work in such countries would thus be very beneficial.
Even though the screening tests for cervical cancer include mainly the Pap Smear Test (Papanicolaou test) and the colposcopic examination of the cervix, the ultimate diagnosis of ambiguous and suspicious cases requires standard histopathological assessments. Algorithmic solutions can thus also help facilitate the diagnosis of cervical cancer as a final part of the screening procedure.
The average survival rate for metastatic lung cancer is very low, whereas early stages have higher survival rates. The treatment of low-stage lung cancer is complete surgical resection. Instead, for metastatic lung cancer, the surgical option is often impossible. An accurate diagnosis from lung biopsies targets the most correct prognostic and therapeutic management for the patient.
There are two main types of lung cancer: about 80-85% of lung cancers are non-small cell lung cancer (NSCLC), and about 10-15% are small cell lung cancer (SCLC). Therefore, the most common type of lung cancer is NSCLC, and its subtypes include squamous cell (epidermoid) carcinoma (25-30%), adenocarcinoma (40%), and large cell carcinoma (10-15%). The diagnosis between lung adenocarcinoma and squamous cell lung cancer is difficult, but it is important for further treatment. For the therapy, additional molecular tests are often needed, which cost money and require more samples to be taken during the biopsy. Therefore, an algorithm helping to distinguish between adenocarcinoma and squamous cell lung cancer could save time, biopsied tissues, and money.
The ExaMode priority list also includes a non-cancerous illness, namely celiac disease (CD). CD is an immune-mediated disease with a chronic outcome and a genetic predisposition to an intolerance to gluten and its proteins. It is a serious autoimmune, genetic disease where gluten ingestion leads to chronic inflammation, alterations, and damage in the small intestinal mucosa. It is estimated to affect 1 in 100 people worldwide and its prevalence has significantly increased over the past 20 years [Lohi et al., 2007]. The increase in the number of new cases is partly due to better diagnostics and screening of individuals at high risk for the disorder [Marsh, 1992]. However, it is estimated that there are far more undiagnosed cases of celiac disease than diagnosed ones [Fasano et al., 2003].
In general, routine screening for celiac disease is not carried out. Testing is usually only recommended for people at a higher risk of developing this disease, such as those with a family history of the condition. In adults and children, the diagnosis of celiac disease relies mainly on the presence of positive celiac disease-specific autoantibodies and further diagnostic small intestine biopsies [Fasano et al., 2001]. Intestinal biopsies are always necessary if the antibodies are low or negative, and if there are no signs or symptoms of malabsorption. A second biopsy may be necessary if there is no clinical improvement after shifting to a strict gluten-free diet. Another biopsy is sometimes recommended in the follow-up period. Findings in the biopsy samples are characteristic but not specific, so it might be challenging for an inexperienced pathologist (and for an algorithm) to diagnose celiac disease correctly.
Considering an increasing number of new cases of celiac disease and an important role of small intestine biopsies in the diagnosis, the market size for the exploitation of product prototypes developed in ExaMode seems very promising.
|[Ontology NS Prefix]||<https://w3id.org/examode/ontology/>|
The ultimate aim of the analysis performed by a histopathologist is to complete the clinical report. A histopathological clinical report is a document that contains the results of a series of measurements and analyses performed on specific cells or tissues in order to:
The goal of this document is to define an OWL 2 ontology for the ExaMode project, whose overall aim is to build predictive algorithms to help pathologists in the diagnosis of cancer cases. The starting point of ExaMode is the set of medical diagnostic reports associated with WSIs of examined tissues.
The present ontology models the diagnostic reports associated with a WSI (or a series of WSIs) and enables a structured encoding of the main concepts of a diagnosis. These concepts and their relations can be used to automatically annotate WSIs as well as to perform reasoning over diagnostic reports about the four cases considered in the ontology.
In the following sections we provide essential information required to understand the reports of each of the four cases. Subsequently, we describe the ontology in its different components.
Good endoscopic practice, together with an accurate histopathological diagnosis, decreases the incidence of colorectal cancer. There are different precursor lesions with different diagnostic and prognostic significance [Zauber et al., 2012]. The main task for a pathologist is to detect cancerous polyps (e.g., for population screening) and to identify the answers to these main diagnostic criteria:
This information is a prognostic factor leading to the decision about the patient's management. For example, polyps with a negative polypectomy margin, low-grade histology, and no lymphovascular invasion can be safely treated with endoscopic polypectomy. On the other hand, positive margin, high grade (poorly differentiated) histology, and lymphovascular invasion are associated with an increased risk of adverse outcomes, and surgical resection is indicated [Butte et al., 2012].
The cervical biopsy (colposcopy) is a procedure done when previous tests provide evidence of precancerous/abnormal or neoplastic lesions in the uterine cervix. The cervical tissue removed has to be analyzed by an expert pathologist to identify if the tumor lesions are present or not. If present, the pathology report provides diagnostic information and works as a prognostic tool for the patient’s treatment. Colposcopy with directed biopsy is currently one of the “gold standard” practices for diagnosing cervical pre-cancer.
Thus, the histopathologist aims to recognize and identify these precursor lesions, known as Cervical Intraepithelial Neoplasia (CIN), which displays the proliferation of atypical basaloid cells [Lax, 2011]. Based on proliferation spread, WHO classification categorizes this dysplasia into three grades:
CIN1 corresponds to Low-Grade Squamous Intraepithelial Lesion (LSIL), whereas CIN2-3 correspond to High-Grade Squamous Intraepithelial Lesion (HSIL). A strong association between these precursor lesions and HPV infection has been reported: LSIL is strongly associated with low/intermediate-risk HPV, and HSIL is associated with high-risk HPV. Therefore, the first feature that has to be identified and reported is the presence and the grade of dysplasia, with possible HPV association. In the presence of cervical carcinoma, we identify the main microscopic features and measurements of the uterine cervix colposcopy biopsy that should be provided in the pathology report:
Also, the immunohistochemistry (p16 and Ki-67 staining) assists in the histological differential diagnosis of precursors to reactive and metaplastic epithelium. For invasive cervical carcinoma, stage is the strongest prognostic factor [Lax, 2011].
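The correspondence between CIN grades and squamous intraepithelial lesion categories stated above (CIN1 to LSIL, CIN2-3 to HSIL) can be written as a small lookup. The following is a minimal sketch for illustration only: the function name `sil_for_cin` and the input normalization are assumptions and are not part of the ontology.

```python
# Mapping taken from the text: CIN1 -> LSIL, CIN2 and CIN3 -> HSIL.
CIN_TO_SIL = {
    "CIN1": "LSIL",
    "CIN2": "HSIL",
    "CIN3": "HSIL",
}

SIL_NAMES = {
    "LSIL": "Low-Grade Squamous Intraepithelial Lesion",
    "HSIL": "High-Grade Squamous Intraepithelial Lesion",
}


def sil_for_cin(grade: str) -> str:
    """Return the SIL category for a CIN grade (hypothetical helper).

    The input is normalized so that "cin 2" and "CIN2" are treated alike.
    """
    key = grade.strip().upper().replace(" ", "")
    if key not in CIN_TO_SIL:
        raise ValueError(f"unknown CIN grade: {grade!r}")
    return CIN_TO_SIL[key]
```

For example, `sil_for_cin("CIN1")` yields `"LSIL"`, and `SIL_NAMES[sil_for_cin("cin 3")]` yields the full HSIL name; any other input raises a `ValueError` rather than guessing a grade.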
Lung cancer (LC) has a high mortality rate and is the most common cause of cancer death worldwide, accounting for 19.4% of cancer-related deaths [Travis et al., 2011]. The average overall survival rate for metastatic lung cancer is very low, whereas early stage has higher survival rates. The treatment of low-stage LC is complete surgical resection. Instead, for metastatic LC the surgical option is often impossible. An accurate diagnosis from lung biopsies targets the most correct prognostic and therapeutic management of the patients.
Moreover, a correct WHO classification is very important for metastatic tumors, since distinguishing histological subtypes such as adenocarcinoma and squamous cell carcinoma has therapeutic implications. The identification of new therapeutic targets over the past decade resulted in an urgent need for a classification system for both non-resection specimens (particularly small biopsies) and cytology samples. For this reason, an accurate and specific pathology report is important to establish the diagnosis and the patient’s treatment.
Starting from the analysis of lung biopsies, the microscopic analysis section of the clinical report on a lung cancer biopsy sample must provide the following information, with prognostic and predictive implications:
The ExaMode priority list also includes a non-cancerous disease, namely celiac disease (CD).
Celiac disease is an immune-mediated disease with chronic outcome and genetic predisposition to an intolerance to gluten and its proteins. This intolerance leads to an abnormal immune response, followed by chronic inflammation and alteration of the small intestinal mucosa. The diagnosis of this pathology is based on the description of the histopathological alterations of the small intestine (after duodenal biopsy) by expert pathologists [GIPAD, 2011].
Microscopic analysis of a small intestine biopsy sample for celiac disease provides information about:
We represent the ontology as a graph where nodes are classes and edges are typed relationships among the classes. Classes (nodes) represent real-world objects such as a person, a project, a tissue, or an anatomical part. Relationships (edges) describe how classes interact with one another. In ExaMode we limit the creation of new classes to a minimum and we reuse existing ontologies as much as possible.
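The graph view described above can be illustrated with a small adjacency structure in which each edge carries its relationship type. This is a minimal sketch, not the ontology's actual serialization: the class names are written as simplified identifiers, and apart from `isAboutDisease` (which is named in this document) the edge labels `hasPatient` and `producedBy` are placeholders chosen for illustration.

```python
from collections import defaultdict

# Each edge is (source class, relationship type, target class).
# Only "isAboutDisease" appears in the documentation; the other two
# relationship names are hypothetical placeholders.
EDGES = [
    ("ClinicalCaseReport", "isAboutDisease", "Disease"),
    ("ClinicalCaseReport", "hasPatient", "Patient"),
    ("ClinicalCaseReport", "producedBy", "Organization"),
]


def build_graph(edges):
    """Group typed edges by their source class."""
    graph = defaultdict(list)
    for source, relation, target in edges:
        graph[source].append((relation, target))
    return dict(graph)


def neighbours(graph, node, relation=None):
    """Return the classes reachable from `node`, optionally by one relation."""
    return [
        target
        for rel, target in graph.get(node, [])
        if relation is None or rel == relation
    ]


GRAPH = build_graph(EDGES)
```

Calling `neighbours(GRAPH, "ClinicalCaseReport", "isAboutDisease")` returns the single class linked by that relationship, while omitting the relation returns every class connected to the report.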
The ExaMode ontology is organized into five semantic areas, each concerning a different aspect of the histopathology process:
In the following sections we discuss in detail the semantic areas, dividing them by disease when necessary.
The central class of this part of the ontology is Clinical Case Report. All the medical reports modeled through this ontology are part of this class, which can be specialized for specific cases such as the Colon Clinical Case Report. Each clinical case is associated with one Disease, a general class that represents one disease. More specifically, in this ontology a Disease is one of the four diseases considered in ExaMode, that is, Cervical Cancer, Lung Cancer, Colon Carcinoma, and Celiac Disease. The image above represents a case of Colon Clinical Case Report; thus, the class is associated with the Colon Carcinoma disease through the isAboutDisease relationship. Each Clinical Case Report has a unique identifier, obtained from the clinical cases, and a diagnosis, i.e., the text in the "Diagnosis" field found in the reports. Each medical report can also be related to an image file about the report. The block number refers to internal ids related to the reports or the images.
All medical reports are associated with a Patient (anonymized), since a single patient can have more than one associated medical report. We model minimal patient information: age at the time of the report, gender and an age onset to classify the patients into three categories: young adult, middle age and late.
Each medical report is also associated with the Organization that produced it. In ExaMode there are two organizations: the Cannizzaro Hospital (AOEC) in Italy and the Radboud University Medical Center (RUMC) in the Netherlands.
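To make the structure of this semantic area concrete, the sketch below represents one clinical case report, its patient, and its organization as subject-predicate-object triples. It assumes the namespace given in this document; every local name other than `isAboutDisease` (for example `hasDiagnosisText`, `hasPatient`, `producedBy`) and all the instance values are hypothetical and serve only as an illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

EXA = "https://w3id.org/examode/ontology/"  # namespace stated in this document

Triple = Tuple[str, str, str]


@dataclass
class Patient:
    patient_id: str
    age: int          # age at the time of the report
    gender: str
    age_onset: str    # "young adult", "middle age" or "late"


@dataclass
class ClinicalCaseReport:
    report_id: str
    diagnosis_text: str   # text of the "Diagnosis" field
    disease: str          # e.g. "ColonCarcinoma"
    patient: Patient
    organization: str     # e.g. "AOEC" or "RUMC"
    image_file: Optional[str] = None


def report_to_triples(report: ClinicalCaseReport) -> List[Triple]:
    """Flatten a report into triples; predicate names are placeholders."""
    subject = EXA + "report/" + report.report_id
    patient = EXA + "patient/" + report.patient.patient_id
    triples = [
        (subject, EXA + "isAboutDisease", EXA + report.disease),
        (subject, EXA + "hasDiagnosisText", report.diagnosis_text),
        (subject, EXA + "hasPatient", patient),
        (subject, EXA + "producedBy", EXA + report.organization),
        (patient, EXA + "hasAge", str(report.patient.age)),
        (patient, EXA + "hasGender", report.patient.gender),
        (patient, EXA + "hasAgeOnset", report.patient.age_onset),
    ]
    if report.image_file is not None:
        triples.append((subject, EXA + "hasImage", report.image_file))
    return triples


# Invented example values, for illustration only.
EXAMPLE = ClinicalCaseReport(
    report_id="case-001",
    diagnosis_text="(free text of the Diagnosis field)",
    disease="ColonCarcinoma",
    patient=Patient("p-001", 62, "female", "middle age"),
    organization="AOEC",
)
```

Keeping the patient as a separate node reflects the statement that a single patient can have more than one associated medical report: several reports can point to the same patient identifier.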
The diagnosis is the central area of the ExaMode ontology. Even though the structure of this semantic area differs from disease to disease, certain classes are shared. In particular, each medical report is connected to an Outcome. The Outcome class represents a general outcome that can be positive (PositiveOutcome), negative (NegativeOutcome), without evidence of malignancy (NoMalignancy), or inconclusive (InconclusiveOutcome); the latter, in turn, can be due to insufficient material (InsufficientMaterial) to reach a conclusion or to a specimen that is unsatisfactory for a diagnosis (SpecimenUnsatisfactory). Positive outcomes describe the type of cancer or disease that has been diagnosed in the examined specimen, and they can be associated with additional information for some types of diseases. For instance, in the colon cancer use case, if the specimen has been diagnosed as "Polyp of Colon" or any of its subclasses, we might have additional information about the degree of dysplasia the polyp presents. We refer to this additional data as "Annotations to the Case" and we model it as subclasses of the general-purpose class called "Finding".
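The outcome classes listed above form a small hierarchy, which can be sketched as a child-to-parent table with a subclass check. This is an illustrative reading of the paragraph, not an export of the ontology: placing the four general outcomes directly under Outcome, and the two inconclusive cases under InconclusiveOutcome, is an assumption based on the wording above, and the helper names are hypothetical.

```python
# Child class -> parent class, following the description in the text.
SUBCLASS_OF = {
    "PositiveOutcome": "Outcome",
    "NegativeOutcome": "Outcome",
    "NoMalignancy": "Outcome",
    "InconclusiveOutcome": "Outcome",
    "InsufficientMaterial": "InconclusiveOutcome",
    "SpecimenUnsatisfactory": "InconclusiveOutcome",
}


def ancestors(cls):
    """Return the chain of superclasses of `cls`, nearest first."""
    chain = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        chain.append(cls)
    return chain


def is_subclass_of(cls, parent):
    """True if `cls` equals `parent` or has `parent` among its ancestors."""
    return cls == parent or parent in ancestors(cls)
```

With this table, `ancestors("InsufficientMaterial")` walks up through InconclusiveOutcome to Outcome, so a report whose outcome is InsufficientMaterial can also be retrieved by a query for inconclusive outcomes in general.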
The figure above represents the structure of the ontology for Uterine Cervix Cancer. For cervical cancer, independently of its type, the diagnosis could detect the presence of koilocytes, or the specimen could test positive for HPV. We note that a case of cervix cancer does not present possibilities for annotations.
The figure above describes the structure of the diagnosis area for lung cancer. In this case, the patient, when positive, may present Sarcoidosis, Lymphadenitis, or Lung Carcinoma (and its possible subclasses). In the case of Lung Carcinoma or any of its subclasses, there may be additional information regarding the presence of necrosis or metastasis.
For the diagnosis of celiac disease, as described in the image above, the outcome is simpler than that of the other three diseases: if the patient is positive, s/he can have either celiac disease or Duodenitis (an inflammation which does not necessarily imply celiac disease). The outcome may be correlated with the information about the Immunohistochemical Test carried out on the patient.
In this case, a positive outcome can be enriched with information about the presence of immune cells such as granulocytes or lymphocytes, the presence of intestinal abnormalities such as edema or intestinal fibrosis, or data about the villi status, such as their atrophy degree or length. All components modeling such information are grouped in the "Annotations to the Case" area.
As shown above, the ExaMode ontology comprises three other semantic areas, here presented for the colon cancer case: Procedure, Anatomical Location, and Test. The Procedure area details the surgical procedure performed to collect the tissue; the Anatomical Location area represents an anatomical location, which can be either the area from which the tissue was taken or the area where the disease is located in the patient. The Test area comprises the other tests performed.
In the case of the colon, the performed procedure can only be a surgical procedure which, in turn, can be a resection, an anastomosis, a hemiectomy or a form of Endoscopic Biopsy.
The anatomical locations considered are the different areas of the colon. The class Colon, NOS (Not Otherwise Specified) describes a general area of the colon, and can be better specified by one of its subclasses. Other locations included in this area that are not in the colon are the Rectum, the Abdomen, and the Ileum.
In this case, the possible procedures include the cervical biopsy, a hysterectomy (and its specifications), and different types of surgical procedures, including conization and endocervical curettage.
The locations where the cancer can be located and from which the tissue can be taken include the uterus, the cervix epithelium and its different parts, and the cervical mucus.
In the case of the lung cancer, the possible procedures include different types of Biopsies. It is also possible to perform different types of immunohistochemical tests, that can usually return a true/false result or a numerical one, in the case of the test on the proliferation marker protein Ki-67.
The locations for the lung cancer include the whole lung, the bronchus and their parts, the mediastinum and the Thoracic lymph nodes.
In the case of the celiac disease the surgical procedure can be a form of biopsy, performed in different locations (a biopsy in the greater curvature, a biopsy in the pyloric antrum and one in the duodenum). As such, these cases are naturally connected to their locations.
The anatomical locations for celiac disease include the Duodenum and its different parts, where the disease is located. Since the biopsy used to diagnose the disease can also be performed in the pyloric antrum and the greater curvature, these are included among the locations.
Thesauri, classification schemes, subject heading lists, taxonomies, 'folksonomies', and other types of controlled vocabulary are all examples of concept schemes. Concept schemes are also embedded in glossaries and terminologies.
has characteristics: symmetric, transitive
has characteristics: functional
has characteristics: functional
Acronyms, abbreviations, spelling variants, and irregular plural/singular forms may be included among the alternative labels for a concept. Mis-spelled terms are normally included as hidden labels (see skos:hiddenLabel).
[Serra et al., 2019] Serra, L.M., Duncan, W.D., Diehl, A.D.: An ontology for representing hematologic malignancies: the cancer cell ontology. BMC Bioinform. 20-S(5), 231–236 (2019).
[Gurcan et al., 2017] Gurcan, M.N., Tomaszewski, J., Overton, J.A., Doyle, S., Ruttenberg, A., Smith, B.: Developing the quantitative histopathology image ontology (qhio): a case study using the hot spot detection problem. Journal of biomedical informatics 66, 129–135 (2017).
[An et al., 2005] H.J. An, K.R. Kim, I.S. Kim, D.W. Kim, M.H. Park, I.A. Park, K.S. Suh, E.J. Seo, S.H. Sung, J.H. Sohn, H.K. Yoon, E.D. Chang, H.I. Cho, J.Y. Han, S.R. Hong and G.H. Ahn (2005). Prevalence of human papillomavirus DNA in various histological subtypes of cervical adenocarcinoma: a population-based study. Mod Pathol 18(4):528-534.
[Kurman et al., 2014] R.J. Kurman, M.L. Carcangiu, C.S. Herrington and R.H. Young (2014). WHO classification of tumours of the female reproductive organs. IARC press, Lyon.
[Lohi et al., 2007] S. Lohi, K. Mustalahti, K. Kaukinen, K. Laurila, P. Collin, H. Rissanen, et al. Increasing prevalence of celiac disease over time. Aliment Pharmacol Ther 2007;26(9):1217–25.
[Marsh, 1992] M.N. Marsh. Gluten, major histocompatibility complex, and the small intestine. A molecular and immunobiologic approach to the spectrum of gluten sensitivity (“celiac sprue”). Gastroenterology 1992;102(1):330–54.
[Fasano et al., 2003] A. Fasano, I. Berti, T. Gerarduzzi, T. Not, R.B. Colletti, S. Drago, et al. Prevalence of celiac disease in at-risk and not-at-risk groups in the United States: a large multicenter study. Arch Intern Med 2003;163(3):286–92.
[Fasano et al., 2001] A. Fasano, C. Catassi. Current approaches to diagnosis and treatment of celiac disease: an evolving spectrum. Gastroenterology 2001;120(3):636–51.
[Zauber et al., 2012] A.G. Zauber, S.J. Winawer, M.J. O’Brien, et al. Colonoscopic polypectomy and long-term prevention of colorectal-cancer deaths. N Engl J Med 2012;366:687-96.
[Butte et al., 2012] J.M. Butte, P. Tang, M. Gonen, et al. Rate of residual disease after complete endoscopic resection of malignant colonic polyp. Dis Colon Rectum 2012;55:122-7.
[Lax, 2011] S. Lax. Histopathology of cervical precursor lesions and cancer. Acta Dermatovenerol Alp Pannonica Adriat. 2011 Sep;20(3):125-33.
[Travis et al., 2011] W.D. Travis, E. Brambilla and M. Noguchi et al. International Association for the Study of Lung cancer/American Thoracic Society/European Respiratory Society international multidisciplinary classification of lung adenocarcinoma. J Thorac Oncol 2011, 6:244-285.
[GIPAD, 2011] Celiac Disease: The Histology Report (2011). On behalf of the “Gruppo Italiano Patologi Apparato Digerente (GIPAD)” and of the “Società Italiana di Anatomia Patologica e Citopatologia Diagnostica”/International Academy of Pathology, Italian division (SIAPEC/IAP).
[Duraiyan et al., 2012] Duraiyan, J., Govindarajan, R., Kaliyappan, K., Palanisamy, M.: Applications of immunohistochemistry. Journal of pharmacy & bioallied sciences 4(Suppl 2), 307 (2012).
The authors would like to thank Silvio Peroni for developing LODE, a Live OWL Documentation Environment, which is used for representing the Cross Referencing Section of this document and Daniel Garijo for developing Widoco, the program used to create the template used in this documentation.