https://doi.org/10.31449/inf.v47i7.4739 Informatica 47 (2023) 11–22 11

Derivation of Optimized Threshold of Semantic Alignment Metrics for Interpretation of Interoperability and Reusability of Cross-enterprise Vehicle Service Interface Models

Sangita De 1,2, Přemek Brada 2, Juergen Mottok 3
1 Volkswagen AG, Ingolstadt, Germany
2 Department of Computer Science and Engineering, University of West Bohemia, Pilsen, Czech Republic
3 Faculty of Electrical Engineering and Information Technology, Ostbayerische Technische Hochschule (OTH), Regensburg, Germany
E-mail: sangita.de@outlook.de, sangita.de@cariad.technology, brada@kiv.zcu.cz, juergen.mottok@oth-regensburg.de

Keywords: interface, automotive services, semantic, ontologies, metamodels, framework, optimized threshold, metrics

Received: March 13, 2023

Over the past decade, cars have gradually turned into real cyber-physical systems. The collaboration of services between the service-oriented, cross-enterprise vehicle application frameworks has increased to generate novel, smart and complicated vehicle services. Consequently, from an interoperability perspective, the semantic mapping of vehicle service components' interface ontological models has emerged as a significant research interest in the automotive application domain, a domain that manipulates several cross-enterprise synergy knowledge application frameworks. Also, several semantic quality metrics have been defined over time for the vehicle service interface ontological models. The empirically evaluated values of these metrics can be used to assess progress in cross-enterprise interoperability between the service and the clients' API ontological models in the vehicle domain. However, despite the potential benefits of semantic alignment quality metrics, the effective use of these metrics for vehicle service interface ontologies has proven elusive.
Such metrics can be used successfully for quantification, but they mostly fail to provide adequate annotations for subsequent decision-making on semantic interoperability and reusability. In fact, the absence of an effective and meaningful threshold for the semantic similarity measure between various vehicle service interface ontological models motivates this research to propose a novel design approach to optimized threshold derivation for the semantic similarity metrics. This threshold is then applied to a set of defined semantic alignment metrics for vehicle service frameworks. This paper uses a real-world vehicle domain industrial case study to illustrate the design approach. Through the considered case study, this research highlights the significance of optimized semantic alignment metric thresholds in determining the degree of cross-enterprise semantic interoperability and reusability between heterogeneous vehicle service frameworks' interface ontological metamodels.

Povzetek (Slovene summary): In the automotive domain, the need for semantic mapping of the ontological models of vehicle component interfaces has grown. Existing semantic quality metrics for vehicle components are improved by introducing optimized thresholds for determining semantic interoperability.

1 Introduction
Applications in the automotive domain are implemented as multiple distributed service components, and those service components call each other's Application Program Interfaces (APIs) for the complete application to function. From a modeling perspective, to ensure semantic interoperability and meaningful data exchange between heterogeneous vehicle services' API models, it is essential to link the framework-specific vehicle services' API data at the semantic level using a shared domain vocabulary. A domain-specific shared vocabulary motivates the use of ontologies [5]. Ontologies represent vehicle service Software Component (SWC) interface metamodels' specifications schematically.
A critical characteristic of automotive service API ontologies is that they change over time, mainly in structure and semantics. Changes in vehicle applications are often caused by changes in requirements within the domain over time. Transition in the API ontologies can be due to one of the following scenarios [1]:
▪ Changes in the domain w.r.t time, cost, and requirements.
▪ Changes in the conceptualization, which result in changes in the construct or the structure of the ontology schemas.
▪ Changes in the explicit specifications, including changes in properties, attributes, knowledge representation language, and the service interface version specification.
In the transition scenarios mentioned above, it is essential to preserve the quality of the service API ontological models w.r.t semantic representation of the domain concepts. Therefore, metrics are defined and evaluated to empirically measure the semantic alignment quality between the various interoperating vehicle application frameworks' interface ontological models [5]. However, the quantified results of semantic quality metrics are insufficient to ensure better interoperability and reusability decision-making. Therefore, domain experts must determine meaningful thresholds for each semantic alignment metric [1]. However, these metric thresholds should not be based solely on domain expert assumptions but also on analysis of the metric distribution datasets. Furthermore, the automotive industry is evolving continuously, so the automotive application domain is subject to frequent changes w.r.t concepts and requirements in the context of API models. Therefore, the derived metric thresholds are not intended to be universally valid for all the semantic similarity metrics in all application contexts within the domain [3].
Nevertheless, for semantic interoperability, the derivation of the thresholds provides adequate annotations on the variability of the semantic similarity metrics between vehicle application frameworks. Furthermore, it helps focus on a reasonable percentage of semantically similar application framework-specific interface models [1].

1.1 Contribution
The semantic quality metrics can be used successfully to quantify interoperability; however, from a cost, effort, and time optimization perspective, it is also essential to elevate the reuse of vehicle service API ontological models in case of semantic synergies based on derived thresholds [2]. Based on the static analysis of semantic similarity metric distributions for various vehicle application frameworks' interface ontological metamodels, thresholds for the metrics are derived using a proposed methodology. However, in the automotive domain, currently no sound methods and metrics support the interpretation of the semantic similarity and particularity values to determine whether two interoperating vehicle application framework services' interfaces are semantically similar [3]. Interpretations in such cases are frequently based on an implicit threshold or an arbitrary value determined by domain experts based on application context [3]. With several requirements in mind to avoid the problems associated with earlier threshold approaches in other knowledge domains, this research proposes a methodology to determine the thresholds for a defined set of semantic alignment quality metrics (including semantic cohesion and coupling metrics). These semantic alignment metrics are defined for vehicle application frameworks' interface ontological models [6][1]. The proposed methodology adheres to the following points [1]:
▪ The methodology is not recommended to be driven or assumed only by domain expert opinion but also by static analysis of metric distribution datasets.
▪ The methodology should respect the metric scale and the distribution ranges. The methodology should also be flexible against deviations in metric values, service interface versions, and the automotive cooperative application complexity.
▪ In the frequently changing vehicle application domain, the methodology to derive and optimize the threshold value for semantic quality metrics should be robust enough to be repeatable, applicable, transparent, and pragmatic when applied to a wide range of semantic quality metrics [1][16].
This research considers a typical real-world vehicle domain case study to examine the derived threshold's stability. The proposed methodology is further applied to a set of defined, pre-evaluated semantic alignment quality metric distribution datasets [2] for the vehicle application frameworks' interface models that are part of the case study. The semantic alignment quality metrics defined in this paper are pre-evaluated manually in terms of percentages in literature work [2]. OWL2 (Web Ontology Language version 2.0) is used as a metamodeling language to describe the vehicle services' interface ontological metamodels [1].

Table 1: Summary table of related research works.
Column headings: References of related work used in this subsection; Thresholds derived from expert experiences; Thresholds derived from metric distribution analysis; Thresholds derived from error model analysis; Thresholds for semantic quality metrics in automotive domain; Thresholds as a measure to indicate degree of interoperability & reusability.
[10], [5]: ✓
[16], [4], [6], [7], [9], [6], [1], [8]: ✓ ✓ ✓
[11], [12]: ✓
Author's contributions to the state of the art (current research work, [2], [5]): ✓ ✓ ✓ ✓

1.2 State of the art
This subsection briefly overviews previous research attempts to define metric thresholds, as illustrated in Table 1 [1].
Discussions.
Various research works from different knowledge domains have defined metric thresholds based on researchers' experiences. For example, for the McCabe metric, the threshold value was defined as 10 [10], and for the NPATH metric, the threshold value was defined as 200 [5]. However, the threshold values derived from experience lack adequate scientific evidence to be reproduced or generalized. This research proposes a threshold derivation methodology that combines vehicle domain experts' experiences with real-world scientific assumptions and evidence, making the method robust and flexible enough to be refactored and generalized. Research works also propose methodologies to derive optimum thresholds based on the analysis of quality metric distribution data. For example, Erni and French et al. [16][4] proposed a multi-metrics methodology and the use of mean (µ) and standard deviation (σ) to derive a threshold T based on object-oriented data [4]. The threshold T was evaluated as T = µ + σ or T = µ − σ, where the high or the low metric values indicated potential problems, respectively. The disadvantage of this methodology is the assumption of normally distributed metrics, or data normality. Assuming only normally distributed metrics ignores open-world assumptions such as the structural evolution of objects over time, which might result in deviations in metric values and limits the usage of this methodology. In contrast, the methodology described in this work addresses the open-world assumption w.r.t semantic alignment quality metrics for vehicle service API models. The research work [7] defines absolute and relative thresholds for filtering metric data sets of values [6]. Statistics-based thresholds are derived from statistical analysis of metric values from a population sample.
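As a minimal illustration of the mean and standard deviation rule discussed above, the following Python sketch computes the two candidate thresholds T = µ − σ and T = µ + σ for a list of metric values. The function name and the sample values are hypothetical, and the use of the population standard deviation is an assumption made here for illustration only.

```python
from statistics import mean, pstdev
from typing import Sequence, Tuple


def mean_sigma_thresholds(values: Sequence[float]) -> Tuple[float, float]:
    """Return (mu - sigma, mu + sigma) for a list of metric values.

    Assumption: sigma is the population standard deviation; the rule
    itself presumes roughly normally distributed metric values.
    """
    if len(values) == 0:
        raise ValueError("at least one metric value is required")
    mu = mean(values)
    sigma = pstdev(values)
    return mu - sigma, mu + sigma


# Hypothetical metric values, not taken from any cited study.
low, high = mean_sigma_thresholds([2, 4, 4, 4, 5, 5, 7, 9])
print(low, high)
```

Values above the upper bound or below the lower bound would then be flagged, depending on whether high or low metric values indicate a potential problem.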
Much research has also proposed power laws as the preferred methodology for representing metric distributions in graphs to analyze relationships between classes and objects in an object-oriented system [9][6]. Similarly, Baxter et al. [8] concluded that the values of some analyzed software metric datasets follow a power law distribution. The study proposed that in-degree subclasses follow a power law distribution and out-degree fields do not. However, all these research works fall short of concluding how to use these complex distributions and their coefficients to establish baseline values to judge systems. In contrast, this research work is focused on defining thresholds with direct applicability to differentiate framework-specific API models, judge semantic quality, and pinpoint problems. The thresholds for quality metrics can also be derived using error model analysis. Shatnawi et al. [11] proposed using the Receiver-Operating Characteristic (ROC) methodology to explore thresholds to predict the bugs in different error categories. However, there are two significant drawbacks to their derived results. Firstly, the method to derive the threshold is not monotonic. Secondly, for every different release of Eclipse, different thresholds were derived, which implies weak stability of the derived threshold. The studies of Benlarbi et al. [12] show no empirical evidence for a defined threshold model that can be used to predict faults or errors. However, these results apply only to the specific error prediction model that the authors have used. In contrast to these works, the monotonic methodology proposed in this paper for the semantic similarity metrics' threshold derivation ensures stability and flexibility against changes in vehicle service interface versions, SWCs' interface concepts, and the automotive cooperative system's complexity and size.
2 Design approach to optimum similarity threshold
This section provides an overview of the proposed design methodology to derive an optimized threshold for semantic alignment metrics. An automotive domain real-world industrial case study has also been used to demonstrate the proposed methodology [5][2].

2.1 Methodology to derive optimum similarity threshold
The design methodology to derive the optimum threshold is based on the static analysis of semantic alignment quality metric distribution datasets and is composed of three fundamental steps [3][17]:
1. In the first step, define two groups of vehicle service API ontological metamodel sources: inter-group and intra-group. Then, within each intra-group, define pairs of API ontological metamodel sources. In each intra-group, a framework-specific vehicle service API ontology is paired and semantically compared with a platform-agnostic, vehicle domain-specific mediator interface ontological metamodel. The mediator ontology is a framework-independent, generic vehicle service API ontology source; hence, it is a more abstract ontological metamodel than the framework-specific API ontological metamodel source specifications. This implies that the ontology source pairs within each intra-group share few semantic commonalities between their API traits. However, due to their more concrete specifications, the framework-specific API ontological sources from different intra-groups share more synergies in their API semantic concepts or traits (also called inter-group similarities) than the semantic similarity within each intra-group.
2. In the second step, for the above-defined intra-group sources, compute the semantic similarities between each pair of API ontological metamodel sources (i.e., the intra-group similarities) using the defined semantic similarity metrics. Then, aggregate all the metrics' lowest probability semantic similarity results to obtain an IN distribution representing less similar API semantic traits. Additionally, compute the inter-group semantic similarities between each combination of an API ontology source from the first intra-group and an API ontology source from the second intra-group using the same set of semantic similarity metrics. Also, aggregate all the highest probability semantic similarity results to obtain an IS distribution representing more similar API semantic traits [3].
3. In the third step, compare the IS and IN similarity distributions. If the IS and IN distributions have no overlap between their acceptable data ranges (min, max), define the threshold τopt using any value between τSL (the lowest value of IS) and τNH (the highest value of IN). Otherwise, if the similarity distributions IS and IN overlap, there are some false negative (FND) and some false positive (FPD) data values in the distributions, and the following sub-steps apply [3][17]:
a. Compute the FND proportion in the IS distribution for all samples of the similarity threshold between τSL and τNH. In this step, consider every value below the similarity threshold, τSL, as an FND.
b. Compute the FPD proportion in the IN distribution for all samples of the similarity threshold between τSL and τNH. In this step, consider every value above the similarity threshold, τNH, as an FPD.
c. Compute the average of the FPD and FND proportions obtained in steps 3a and 3b. The optimum similarity threshold value, τopt, for each semantic alignment quality metric is taken at the point in the acceptable distribution range where the average of the FPD and FND proportions is observed to be minimum.
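The three steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the sampling step and the tie-breaking rule are hypothetical; for a candidate threshold τ, IS values below τ are counted as FNDs and IN values above τ as FPDs; and in the no-overlap case the midpoint is returned as just one admissible choice among the values between τNH and τSL.

```python
from typing import Sequence


def fnd_proportion(is_values: Sequence[float], tau: float) -> float:
    """Share of IS values that fall below the candidate threshold tau."""
    return sum(v < tau for v in is_values) / len(is_values)


def fpd_proportion(in_values: Sequence[float], tau: float) -> float:
    """Share of IN values that lie above the candidate threshold tau."""
    return sum(v > tau for v in in_values) / len(in_values)


def derive_threshold(is_values: Sequence[float],
                     in_values: Sequence[float],
                     step: float = 0.5) -> float:
    """Derive tau_opt from the IS and IN distributions (step 3)."""
    if not is_values or not in_values or step <= 0:
        raise ValueError("non-empty distributions and a positive step are required")

    tau_sl = min(is_values)  # lowest value of the IS distribution
    tau_nh = max(in_values)  # highest value of the IN distribution

    if tau_nh < tau_sl:
        # No overlap: any value in between is acceptable (midpoint assumed here).
        return (tau_nh + tau_sl) / 2

    # Overlap: sample candidate thresholds from tau_sl up to tau_nh.
    n_steps = int((tau_nh - tau_sl) / step)
    candidates = [tau_sl + i * step for i in range(n_steps + 1)]
    if candidates[-1] < tau_nh:
        candidates.append(tau_nh)

    def average_error(tau: float) -> float:
        # Step 3c: average of the FND and FPD proportions at this candidate.
        return (fnd_proportion(is_values, tau) + fpd_proportion(in_values, tau)) / 2

    # min() keeps the first (lowest) candidate when several are tied.
    return min(candidates, key=average_error)
```

For example, with the hypothetical distributions IS = [30, 35, 40, 45] and IN = [10, 20, 32, 33] and a step of 1.0, the candidates 30, 31, 32 and 33 yield average proportions of 0.25, 0.375, 0.25 and 0.125, so the sketch returns 33.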
Additionally, in the case where there is an overlap observed between the IS and IN metric distributions, there cannot be any FNDs below τSL, but there will be some FPDs above τNH as the distributions overlap. However, the threshold values derived by the given methodology are subject to dynamic changes based on the changes in the values of the semantic alignment quality metrics, which depend on the interface specifications of the considered case study sources [1][16]. The workflow model for the above-proposed methodology on threshold derivation is illustrated in Figure 1 [3].

2.2 Comparison of semantic similarity metric distributions for threshold derivation
The design methodology for the optimum threshold is based on the semantic similarity metric distributions IS and IN. Figure 2 illustrates a conceptual, ideal use case for the semantic similarity metric distribution ranges and the corresponding threshold derivation, where the minimum value of the IS distribution, τSL, is greater than τNH, the maximum value of the IN distribution [3]. In this ideal case, the highest value of all IN distributions is τNH, the value above which the two compared ontology sources are similar, and the lowest value of all IS distributions is τSL, the value under which the two compared ontology sources are non-similar [3]. A semantic similarity metric value greater than τSL implies that the service API semantic traits are similar for a given inter-group ontology source pair and that the source pairs are interoperable. Similarly, a semantic similarity value lower than τNH means that for a given intra-group ontology source pair, the API semantic traits are non-similar.
A semantic similarity metric value within the acceptable data interval between τSL and τNH means that the API semantic traits for the source pairs (inter or intra) are nearly similar. Thus, in this case, any value in the acceptable dataset interval can be selected as the threshold, or expert opinion might be required to interpret the optimum threshold in special use cases. Figure 3 illustrates a conceptual, non-ideal use case where the IS and IN distributions overlap, meaning that there are some FPDs (that is, the metric values for the intra-group pairs of sources from the IN distribution that are non-similar but have a similarity value greater than τSL) and FNDs (that is, the metric values for the inter-group source pairs from the IS distribution that are similar but have a similarity value lower than τNH) [3]. In this case, a semantic similarity metric value lower than τSL means that the vehicle service API semantic traits for the inter-group source pairs compared are non-similar. Additionally, in this case a similarity value greater than τNH means that the service API semantic traits for the intra-group source pair compared are non-similar. Also, as implied by Figure 3, if τSL < τNH, then as τopt gets closer to τSL in the distribution's acceptable dataset range, there will be more FNDs and fewer FPDs, and as τopt gets closer to τNH in the distribution's acceptable dataset range, there will be more FPDs and fewer FNDs [3].
Figure 1: Workflow of optimum semantic similarity threshold derivation for metrics.
Figure 2: Ideal case for semantic similarity threshold.
Figure 3: Non-ideal case for semantic similarity threshold derivation.
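For the ideal case of Figure 2, the interpretation of a single metric value relative to τSL and τNH can be sketched as below. The function name and the returned labels are hypothetical, and treating a value exactly equal to τSL or τNH as "nearly similar" is an assumption of this sketch.

```python
def interpret_similarity(value: float, tau_sl: float, tau_nh: float) -> str:
    """Interpret a metric value in the ideal (non-overlapping) case.

    tau_sl: lowest value of the IS distribution.
    tau_nh: highest value of the IN distribution.
    """
    if tau_nh >= tau_sl:
        raise ValueError("IS and IN overlap; the ideal-case reading does not apply")
    if value > tau_sl:
        return "similar"
    if value < tau_nh:
        return "non-similar"
    # Value lies in the interval between tau_nh and tau_sl (bounds included).
    return "nearly similar"


# Hypothetical bounds on a scale of 100.
for v in (70, 50, 30):
    print(v, interpret_similarity(v, tau_sl=60, tau_nh=40))
```

In the overlapping case of Figure 3, such a direct reading is not available, which is why the sketch raises an error instead of returning a label.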
2.3 An industrial case study
The typical automotive industrial case study used to demonstrate the proposed design approach to optimized threshold derivation for semantic similarity metrics is Keyless Vehicle Entry, as illustrated in Figure 4. In this case study, the owner of a car wants to give someone access to the vehicle just by using a mobile phone while being geographically located far away from the car [5][2]. This case study involves service collaborations from third-party, cross-knowledge domain platforms such as Robotics, Telematics, Infotainment, Cloud, etc. To simplify the illustration of the given case study, we consider the three most used cross-enterprise vehicle application component frameworks that act as service collaborators to realize this complex case study, namely AUTOSAR Adaptive, Franca (from Genivi), and ROS2. To explore the cross-enterprise semantic alignment, the service components' interface models of the three frameworks mentioned above are modelled as ontological metamodel schemas for semantic mapping [13][15], as illustrated in Figure 5. To bridge the semantic gap between the heterogeneous vehicle service API ontological metamodel sources of the given case study, a vehicle domain-specific, platform-agnostic mediator API ontological metamodel, namely DM, is used in the current scope [13][5][15]. The degree of semantic alignment based on TBox axioms (asserted and inferred) between the application framework-specific interface ontological metamodels' concepts, relations (e.g., is-a), and properties is empirically evaluated using metrics.
Figure 4: An industrial automotive domain case study.
Figure 5: Abstract representation of case study using conceptual vehicle interface ontological metamodels.
Additionally, the degree of semantic similarity based on ABox axioms (asserted and inferred) between the interface ontological metamodels' class instances is also empirically evaluated at the instance or knowledgebase level using semantic similarity metrics [2][5][13]. Regarding the considered case study, each of the framework-specific interface ontological metamodels of the AUTOSAR Adaptive, Franca, and ROS2 application frameworks is first paired with the mediator interface ontology, DM, as intra-group source pairs. Later, the above framework-specific interface ontological metamodels are paired with each other as inter-group source pairs [2][5][3]. However, this case study also includes service contributions from service-providing SWCs of the Android and MuleSoft (for Amazon Web Services) application frameworks, which are not considered in the current scope.

3 Application of methodology to semantic similarity metrics for determination of optimum threshold
Based on an earlier literature work on the evaluation of semantic similarity metrics [2], this section briefly defines and provides an overview of a set of three semantic similarity quality metrics, namely SSS, IRR, and CIC, used for the evaluation of semantic alignment depth between the interoperating vehicle application frameworks' service API ontological metamodels [13][14][17]. The proposed methodology on threshold derivation is also applied to these three pre-evaluated semantic similarity quality metrics. The application of the methodology to derive a threshold can be dynamically adapted based on changes in application framework interface model concepts and platform specifics [1][2]. As mentioned in subsection 2.3, the framework-specific interface ontological metamodels that are part of the case study are paired as inter and intra-group source pairs.
That is, for the intra-group source pairs, Source 1 includes the AUTOSAR Adaptive framework-specific SWC interface ontology and the mediator ontology, DM; Source 2 includes the Franca framework-specific SWC interface ontology and the mediator ontology, DM; and Source 3 includes the ROS2 framework-specific nodal interface ontology and the mediator ontology, DM [2]. Similarly, the framework-specific service API ontologies within the scope of the case study are then paired with each other as inter-group source pairs to identify the possible semantic similarities between each source pair [3]. That is, inter-group Source 1 includes the AUTOSAR Adaptive framework-specific SWC interface ontology and the Franca framework-specific SWC interface ontology; Source 2 includes the Franca framework-specific SWC interface ontology and the ROS2 framework-specific nodal interface ontology; and Source 3 includes the AUTOSAR Adaptive framework-specific SWC interface ontology and the ROS2 framework-specific nodal interface ontology [3][2]. The greater the number of inter and intra-group pairs of API ontological metamodel sources used for semantic similarity metric evaluation and the corresponding threshold derivation, the greater the stability of the derived threshold. Consequently, such a derived threshold is more reliable as a benchmark for defining the semantic interoperability and reusability of the vehicle service API ontological metamodel sources [3][1].
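The pairing scheme described above can be illustrated with a short Python sketch. The function and constant names are hypothetical; only the framework names and the mediator name DM come from the case study. Note that `itertools.combinations` enumerates the inter-group pairs in a different order than the Source 1 to Source 3 numbering used in the text.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

FRAMEWORKS = ["AUTOSAR Adaptive", "Franca", "ROS2"]
MEDIATOR = "DM"  # domain-specific, platform-agnostic mediator ontology


def intra_group_pairs(frameworks: Sequence[str], mediator: str) -> List[Tuple[str, str]]:
    """Pair every framework-specific interface ontology with the mediator."""
    return [(framework, mediator) for framework in frameworks]


def inter_group_pairs(frameworks: Sequence[str]) -> List[Tuple[str, str]]:
    """Pair every framework-specific interface ontology with every other one."""
    return list(combinations(frameworks, 2))


print(intra_group_pairs(FRAMEWORKS, MEDIATOR))
print(inter_group_pairs(FRAMEWORKS))
```

With n frameworks this yields n intra-group pairs and n(n − 1)/2 inter-group pairs, so adding a framework to the case study enlarges both the IN and the IS distribution.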
3.1 Derivation of optimized threshold for semantic similarity schema (SSS) metric
Definition: For a given automotive SWC API ontology schema, say, Oi, the Semantic Similarity Schema (SSS) metric is the percentage of schema classes in Oi that are semantically equivalent (is-a relationships) to the classes of another SWC API ontology schema (EQV), relative to the total number (TRC) of schema classes in Oi. TRC may include inheritance classes (IHCL), non-inheritance classes (INHCL), as well as EQV in the given schema Oi [2][5][14].
SSS = EQV / TRC (1)
where TRC = IHCL + INHCL + EQV in (1).
With the proposed methodology on threshold derivation, the threshold for the SSS metric can be derived based on the evaluated metric values for the different vehicle service API ontology source pairs of the given case study. For the intra-group source pairs with the mediator ontology, DM, the evaluated values of the metric are represented in the respective IN distribution, following step 2 of the proposed methodology. As observed from Figure 6, for the intra-group sources, the maximum value of the SSS metric in the IN distribution, τNH, is 26 on a scale of 100. Similarly, for the inter-group source pairs, the evaluated values of the SSS metric are represented in the respective IS distribution, as seen in Figure 7. In Figure 7, it can also be observed that for the inter-group sources, the minimum value of the SSS metric in the IS distribution is τSL = 27. The SSS metric can be used as a cohesion interface semantic similarity metric [2][5].
Figure 6: Overview of IN distribution for SSS metric.
As illustrated in Figure 6 and Figure 7, there is no overlap between the IS and the IN distributions for the SSS metric dataset distribution ranges (min, max).
Therefore, as specified in step 3 of the proposed methodology, the optimum threshold, τopt, for the SSS metric can be considered as any value between τSL and τNH. Hence, based on the static analysis of the distribution acceptable dataset ranges, the center value between τSL and τNH is selected as the optimum threshold; that is, τopt for the SSS metric for all the given inter and intra-group source pairs is selected as 26.5 [3].
Figure 7: Overview of IS distribution for SSS metric.

3.2 Derivation of optimized threshold for instance relationship richness metric
Definition: For a given automotive SWC API ontology schema, say, Oi, the Instance Relationship Richness (IRR) metric represents the depth of the knowledgebase [14] and is the percentage of sameAs instances (ISA) of schema classes in Oi relative to the total number of individuals of the given schema classes (TRInst) existing in Oi [2]. TRInst may include sameAs and differentFrom individuals (IDF) of the ontological schema classes. IRR is a cohesion semantic similarity metric [5].
IRR = ISA / TRInst (2)
where TRInst = ISA + IDF in (2).
As with the SSS metric, for the intra-group sources (Source 1, Source 2, and Source 3), the evaluated values of the IRR metric for the ontology sources of the given case study are represented in the respective IN distribution with τNH = 33.33 on a scale of 100, as seen in Figure 8. Also, for the inter-group sources, the evaluated values of the IRR metric are represented for the source pairs in the respective IS distribution, following step 2 of the proposed methodology [3][5]. As seen in Figure 9, the minimum value of the IS distribution for the IRR metric is τSL = 35 on a scale of 100. As seen in Figure 8 and Figure 9, the metric values for the source pairs (inter and intra-group) for the IRR metric have an overlap between the maximum and the minimum values of the distributions IS and IN.
Figure legend:
IN-Distribution -> Intra-group:
❑ Source 1 Ontology -> AR Adaptive interface + DM interface
❑ Source 2 Ontology -> Franca interface + DM interface
❑ Source 3 Ontology -> ROS2 interface + DM interface
DM -> Domain-specific Mediator Ontology
IS-Distribution -> Inter-group:
❑ Source 1 Ontology -> AR Adaptive interface + Franca interface
❑ Source 2 Ontology -> Franca interface + ROS2 interface
❑ Source 3 Ontology -> ROS2 interface + AR interface
Figure 8: Overview of IN distribution and instance semantic similarity relation examples for IRR metric.
Due to the overlap in the IRR metric distributions IS and IN, the optimum threshold, τopt, for the IRR metric can be derived at the point in the acceptable dataset range of the distributions that minimizes the average of the FND and FPD proportions [3][17].
Figure 9: Overview of IS distribution and instance relationship examples for IRR metric.
Therefore, as specified in steps 3a and 3b of the proposed methodology and as illustrated in Figure 10, for each possible value within the acceptable distribution dataset range of the IRR metric, the sum of the FPD and FND proportions is calculated [3]. The FPD proportion, Fp, is calculated for all the IRR metric values in the respective IN distribution for the intra-group sources and is expressed as an absolute floating-point number.
Figure 10: Overview of overlap between IS and IN distributions of IRR metric.
Similarly, the FND proportion, Fn, is calculated for all the IRR metric values in the IS distribution for the inter-group sources and is expressed as an absolute floating-point number, as illustrated in Table 2. The absolute (abs) value of τopt for the IRR metric can also be derived as seen in equation (3).
τopt (abs) = round((float(Fn) + float(Fp)) / 2) (3)

Table 2: Calculation of Fn and Fp from FND and FPD proportions of IS and IN distributions.
Fn (absolute floating value)    Fp (absolute floating value)    τopt = round((Fn + Fp) / 2)
0.35                            0.375                           0.36
0.36                            0.365                           0.363
0.365                           0.37                            0.37
0.355                           0.355                           0.355

It is also important to understand here that if the selected τopt for the IRR metric gets closer to τSL, then there are more FNDs and fewer FPDs, and as τopt gets closer to τNH, there are more FPDs and fewer FNDs [3]. Therefore, for the IRR metric, based on the least average sum of Fn and Fp, τopt is selected as 0.363, an absolute value for the threshold representing 36.3 % on a scale of 100, as also shown in the second row of Table 2 above.

3.3 Derivation of optimized threshold for class instance connectivity metric
Definition: For a given automotive SWC API ontology schema, say, Oi, the Class Instance Connectivity (CIC) metric is the percentage of sameAs individuals of semantically equivalent schema classes (EQC) existing in Oi relative to the total number of semantically equivalent schema classes (TRQ) existing in Oi. The CIC metric fundamentally indicates the fraction of schema class equivalence relationships that are being utilized at the individual level of the schema [2][5][14].
CIC = EQC / TRQ (4)
where EQC ∝ (sameAs INDIVIDUAL (Oi)) in (4).
Figure 11: Overview of IS and IN distribution and class instance connectivity relationship examples for CIC metric.
Like the SSS and IRR metrics, the threshold for the CIC metric can be derived based on the above-specified methodology for the different service API ontological metamodel sources of the given case study [5][2].
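The three ratio metrics of equations (1), (2) and (4) can be sketched in Python as below, each expressed on a scale of 100. The function names are hypothetical, and the counts used in the example calls are illustrative values chosen for this sketch; they are not taken from the case study ontologies.

```python
def ratio_percent(numerator: int, denominator: int) -> float:
    """Return numerator / denominator on a scale of 100."""
    if denominator <= 0:
        raise ValueError("the denominator must be positive")
    return 100.0 * numerator / denominator


def sss(eqv: int, ihcl: int, inhcl: int) -> float:
    """Equation (1): SSS = EQV / TRC with TRC = IHCL + INHCL + EQV."""
    return ratio_percent(eqv, ihcl + inhcl + eqv)


def irr(isa: int, idf: int) -> float:
    """Equation (2): IRR = ISA / TRInst with TRInst = ISA + IDF."""
    return ratio_percent(isa, isa + idf)


def cic(eqc: int, trq: int) -> float:
    """Equation (4): CIC = EQC / TRQ."""
    return ratio_percent(eqc, trq)


# Illustrative counts only.
print(sss(eqv=2, ihcl=3, inhcl=3))
print(irr(isa=1, idf=2))
print(cic(eqc=5, trq=8))
```

Evaluating these functions for every intra-group and inter-group source pair would produce the values that populate the IN and IS distributions of each metric.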
For the intra-group source pairs [3], the evaluated values of the CIC metric for the source pairs with the mediator ontology, DM, were represented in the respective IN distribution, as seen in Figure 11. Also, for the inter-group sources [3], the evaluated values of the CIC metric were represented for the source pairs of the given case study w.r.t. each other in the IS distribution, as illustrated in Figure 11.

Figure 11 shows that for the intra-group sources, the maximum value of the CIC metric in the IN distribution is τNH = 62.5. Similarly, it can be observed that for the inter-group sources [3], the minimum value of the CIC metric in the IS distribution is τSL = 66.67 on a scale of 100. Like the IRR and SSS metrics, the CIC metric can be used to measure interface semantic similarity in semantic cohesion and coupling scopes. Hence, as illustrated in Figure 11, for the CIC metric there is no overlap between the IS and IN distribution data ranges (min, max). Therefore, following step 3 of the proposed methodology [3][17], the optimum threshold, τopt, for the CIC metric can be considered as any value between τSL and τNH, an ideal case for threshold derivation, as also illustrated in Figure 2. Hence, based on the static analysis of the acceptable dataset ranges of the IS and IN distributions, the centre value of the range between τSL and τNH is selected as the optimum threshold. The τopt for the CIC metric for all the given sources (inter- and intra-group) is selected as 65.

3.4 Results and discussions

To better understand the effect of the derived optimum semantic similarity threshold on reusability and interoperability between the vehicle service API ontological metamodel sources, this subsection provides a comparative analysis of the results.
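As a recap of the derivation rule applied in Sections 3.2 and 3.3, the following Python sketch covers both the disjoint and the overlapping case. It is a sketch under stated assumptions, not the authors' implementation: it assumes that IS holds the values of similar pairs and IN those of non-similar pairs, that a value at or above the threshold is read as "similar", and that only observed metric values are tried as candidates in the overlapping case. The function name `optimum_threshold` and all sample values are hypothetical, except τSL = 66.67 and τNH = 62.5, which come from the CIC discussion.

```python
from typing import Sequence


def optimum_threshold(similar: Sequence[float], non_similar: Sequence[float]) -> float:
    """Sketch of the threshold rule for one metric.

    similar     -- metric values of the IS distribution
    non_similar -- metric values of the IN distribution
    """
    tau_sl = min(similar)      # lowest value observed in IS (tau_SL)
    tau_nh = max(non_similar)  # highest value observed in IN (tau_NH)

    if tau_nh < tau_sl:
        # No overlap: any value between tau_NH and tau_SL separates the
        # two distributions; the centre of that gap is taken here.
        return (tau_sl + tau_nh) / 2

    def mean_error(t: float) -> float:
        # Assumption: a value >= t is read as "similar".
        fp = sum(v >= t for v in non_similar) / len(non_similar)  # FPD proportion
        fn = sum(v < t for v in similar) / len(similar)           # FND proportion
        return (fp + fn) / 2

    # Overlap: candidates are the observed values inside [tau_SL, tau_NH]
    # (an assumption; a finer grid of candidates could be used instead).
    candidates = sorted({v for v in (*similar, *non_similar) if tau_sl <= v <= tau_nh})
    return min(candidates, key=mean_error)


# Disjoint case: only 66.67 and 62.5 come from the text; the rest is made up.
cic_threshold = optimum_threshold([66.67, 75.0, 80.0], [40.0, 50.0, 62.5])
print(round(cic_threshold))  # 65

# Overlapping case with purely hypothetical values.
overlap_threshold = optimum_threshold(
    [0.30, 0.38, 0.42, 0.50, 0.60],
    [0.10, 0.20, 0.28, 0.33, 0.35, 0.40],
)
print(overlap_threshold)  # 0.38
```

In the overlapping example, the candidate 0.38 gives the smallest mean of the two error proportions (1/6 false positives and 1/5 false negatives), which is why it is returned.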
For the SSS metric, based on manual static analysis of the metric value distributions, the derived optimum threshold is τopt = 26.5, which means that the respective metric values for all the inter-group sources, namely Source 1, Source 2 and Source 3 of the given case study, are above the derived τopt value [1][3]. The derived threshold value for the SSS metric implies that all the given inter-group sources are semantically similar at the ontology schema level. Moreover, where there are semantic synergies in their service interface concepts, properties and attributes, the interface ontological metamodel source pairs can be reused [16] in place of each other in processes like semantic integration for developing complex automotive applications. This would reduce cost and effort and avoid redundancy of data.

For the intra-group source pairs, based on the derived optimum threshold value for the SSS metric, it can be considered that, except for Source 1, the other two sources, namely Source 2 and Source 3, are not semantically interoperable with each other at the ontology schema level.

Based on manual static analysis of the metric distributions, namely IS and IN, the optimum threshold, τopt, for the IRR metric is derived as 36.3, which means that the respective metric value for the intra-group source, namely Source 3, is above the derived threshold value, τopt, and for Source 1 and Source 2 the metric values are below the derived threshold value [1][3][16]. This further implies that the interface ontological metamodel source pair within the intra-group Source 3 is semantically similar, whereas the interface ontological metamodel source pairs within the intra-group Sources 1 and 2 are not semantically similar to each other.
Due to an overlap in the IS and IN distribution data ranges (min, max) of the IRR metric, for the inter-group sources, namely Source 2 and Source 3, the metric values are above the derived threshold value, τopt, whereas for Source 1 the metric value is below the derived threshold value. This further implies that the interface ontological metamodel source pairs within the inter-group Sources 2 and 3 are semantically similar and interoperable with each other and can be reused for each other in processes like semantic integration for developing automotive applications [16][14][13].

The optimum threshold, τopt, for the CIC metric, based on static analysis of the metric distributions, is derived as 65, which means that the respective metric values for all the inter-group sources, namely Source 1, Source 2, and Source 3, are above the derived threshold value, τopt. This further implies that all the interface ontological metamodel source pairs within each of the given inter-group source pairs have strong semantic similarity between their conceptual classes at the schematic level and semantic similarity between their individuals at the instance level when compared with each other [3]. No overlap was observed between the IS and IN metric distribution data ranges (min, max) for the CIC metric. Hence, for all the given intra-group sources, the metric values are below the derived threshold, τopt, which implies that all the interface ontological metamodel source pairs within each of the given intra-group sources have poor semantic similarity between their conceptual classes and individuals at the ontology schema and instance levels when compared with each other.

Limitations. The threshold determination for a semantic similarity measure is limited by the number of semantic similarity metrics, vehicle application frameworks, and corresponding annotations that are available to study the threshold stability.
In fact, there can be several possible variations in the quantity, granularity, and reliability between different semantic similarity metric distributions. Consequently, it becomes difficult to determine a stable and universal threshold for all the semantic similarity quality metrics within the vehicle application domain [3]. Therefore, although the thresholds that are derived in this research work are application agnostic, in an ever-evolving vehicle domain it is preferable to recompute the threshold on a new set of semantic similarity metrics based on the application context and the annotations that are available [1][6]. Also, with the proposed methodology, the appropriate choice of the "IS" and "IN" source pair groups and the analysis of the corresponding distributions, which is crucial in the threshold determination process, requires some degree of domain expertise. In the absence of a proper degree of knowledge in the vehicle domain, it is difficult to interpret meaningful information using the threshold. Hence, in such cases, the thresholds are not accurate for semantic similarity measures [3][1].

4 Conclusion

The concept of semantic interoperability in automotive software engineering is notably elusive. This research proposes a methodology for determining a threshold for interpreting meaningful information based on the dataset values in the semantic alignment metric distributions. To study the stability of the derived optimized threshold, the proposed methodology was applied to a set of pre-evaluated interface semantic alignment cohesion and coupling metrics that are defined for the vehicle application frameworks' interface ontological metamodels [2][5]. The interface semantic alignment metrics considered in this research work have been empirically evaluated in earlier literature [2] to measure the semantic alignment quality between vehicle application framework interface ontological metamodels.
This work mainly focuses on the derivation of the threshold for the given set of semantic similarity metrics applicable to ontological metamodels at the instance and schema levels. This work derives a semantic similarity threshold for metrics based on comparing distributions of semantic similarity values of pairs of similar service interface ontological metamodels and non-similar service interface ontological metamodels. A real-world vehicle domain case study was considered in the research scope to demonstrate the proposed methodology on threshold derivation [2][5]. The case study also reveals the impact of the derived threshold on the vehicle services' semantic interoperability and reusability.

Because the SWC requirements of vehicle domain applications evolve frequently, with corresponding changes in interface ontological metamodel concepts, semantic alignment metric evaluation methods and metric distributions, it can be claimed that although the proposed methodology on threshold derivation for semantic similarity measures is monotonic, the derived threshold values are not universal and are subject to change in the future. In fact, in such cases, before using the proposed methodology, every service user should first check whether the original metric value distribution of the given semantic alignment metrics is still relevant in their own application's context [3].

This work derives the threshold for all the given sets of semantic similarity metrics as a percentage on a scale of 100. The values of the derived thresholds vary in each semantic similarity metric context. With this work, we also propose an optimum threshold derivation even in worst-case scenarios, by minimizing the proportions of false-positive and false-negative similarity matches when there is an overlap between the similar and non-similar distributions for a given interface semantic similarity metric.
From a time, cost, and effort estimation perspective, the results of this work can help automotive software engineers evaluate and interpret the semantic interoperability and reusability of vehicle services' interfaces in a more meaningful way. As a future work avenue, it would be interesting to apply the proposed methodology on threshold derivation to suites of semantic similarity metrics proposed by other vehicle domain experts, or to metrics for other types of software models used in the vehicle domain for semantic similarity measures, for example, ecore models.

References

[1] T. L. Alves, C. Ypma and J. Visser, "Deriving metric thresholds from benchmark data," 2010 IEEE International Conference on Software Maintenance, Timisoara, Romania, 2010, pp. 1-10, doi: 10.1109/ICSM.2010.5609747.
[2] S. De, J. Mottok, P. Brada (2023), "Evaluation of Semantic Interoperability of Automotive Service API Models Based on Metamodels Similarity Metrics Using a Semi-automated Approach," In: Miraz, M.H., Southall, G., Ali, M., Ware, A. (eds) Emerging Technologies in Computing. iCETiC 2022. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 463. Springer, Cham, https://doi.org/10.1007/978-3-031-25161-0_2.
[3] C. Bettembourg, C. Diot, O. Dameron, "Optimal Threshold Determination for Interpreting Semantic Similarity and Particularity: Application to the Comparison of Gene Sets and Metabolic Pathways Using GO and ChEBI," PLOS ONE 10(7): e0133579, 2015, https://doi.org/10.1371/journal.pone.0133579.
[4] V. A. French, "Establishing software metric thresholds," International Workshop on Software Measurement (IWSM'99), 1999.
[5] S. De, P. Brada, J. Mottok & M. Niklas (2021), "The Empirical Evaluation of Semantic Alignment Quality Metrics for Vehicle Domain Component Frameworks Interface Ontologies," In: The International FLAIRS Conference Proceedings, 34.
https://doi.org/10.32473/flairs.v34i1.128512.
[6] K. A. Ferreira, M. Bigonha, R. D. Bigonha, L. F. Mendes & H. C. Almeida (2012), "Identifying thresholds for object-oriented software metrics," In: The Journal of Systems and Software, 85, 244-257.
[7] M. Lanza, R. Marinescu (2006), Object-Oriented Metrics in Practice: Using Software Metrics to Characterize, Evaluate, and Improve the Design of Object-Oriented Systems. Springer-Verlag, Germany.
[8] G. Baxter, M. Frean, J. Noble, M. Rickerby, H. Smith, M. Visser, H. Melton, E. Tempero (2006), "Understanding the shape of java software," In: OOPSLA'06, OR, Portland, USA.
[9] R. Wheeldon & S. Counsell (2003), "Power law distributions in class relationships," In: Proceedings Third IEEE International Workshop on Source Code Analysis and Manipulation, pp. 45-54.
[10] T. J. McCabe, "A Complexity Measure," in IEEE Transactions on Software Engineering, vol. SE-2, no. 4, pp. 308-320, Dec. 1976, doi: 10.1109/TSE.1976.233837.
[11] R. Shatnawi, W. Li, J. Swain, and T. Newman, "Finding software metrics threshold values using ROC curves," In: Journal of Software Maintenance and Evolution: Research and Practice, 2009.
[12] S. Benlarbi, K. E. Emam, N. Goel, and S. Rai, "Thresholds for object-oriented measures," in ISSRE '00: Proc. of the 11th International Symposium on Software Reliability Engineering. IEEE, 2000, p. 24.
[13] L. Yonglin, Z. Zhi, L. Qun, "An ontological metamodeling framework for semantic simulation model engineering," Journal of Systems Engineering and Electronics, vol. 31, no. 3, pp. 527-538 (2020).
[14] S. Tartir, I. B. Arpinar, and A. Sheth (2010), "Ontological Evaluation and Validation," In: book: Theory and Applications of Ontology: Computer Applications: 115-130.
[15] T. Poulain, N. Cullot, K. Yétongnon, "Ontology mapping specification in description logics for cooperative systems," In: Journal des sciences pour l'ingénieur (JSPI), vol. 7, pp. 64-71. hal-00722532f (2006).
[16] K. Erni and C.
Lewerentz, "Applying design-metrics to object-oriented frameworks," In: METRICS '96: Proceedings of the 3rd International Symposium on Software Metrics. Washington, DC, USA: IEEE Computer Society, 1996, p. 64.
[17] J. L. Sevilla, V. Segura, A. Podhorski, E. Guruceaga, J. M. Mato, L. A. Martinez-Cruz, "Correlation between gene expression and GO semantic similarity," In: IEEE/ACM Trans Computational Biology and Bioinformatics. 2005; 2(4):330–8. doi: 10.1109/TCBB.2005.50 PMID: 17044170.