This is the help document for SemGen, software for semantics-based composition and decomposition of biosimulation models.
We welcome feedback from the user community in order to improve SemGen's utility.
Please send feedback and bug reports to:
Maxwell Neal: maxneal[at]gmail[dot]com
modular modeling framework. This is why I and my colleagues at the University of Washington created the SemSim model description format, which provides a modular framework to model composition and decomposition. Soon thereafter I created SemGen as part of my dissertation work because we needed a software tool to help automate the creation, annotation, composition and decomposition of SemSim models.
In a nutshell, SemSim is a semantics-based approach to modular modeling. If you're interested in the gory details of the SemSim approach, you can find them in my dissertation, and the publications of the Semantics of Biological Processes group. In my experience, modelers do not create models with the intention of integrating them with models developed by outside researchers. Furthermore, to do so requires an understanding of how the model will be repurposed once disseminated among the modeling community. As it is not feasible to create a model that anticipates these diverse purposes, it becomes difficult to specify how a model should interface with others in a broad sense. Therefore, to automate model integration, a computer needs to recognize how two models should interface with each other. For biological modeling, this task requires identifying where the models describe the same biology. In other words, the computer needs to know where the models are semantically equivalent, because those are the biologically-meaningful points of coupling between the models - i.e. the interface. The SemSim format provides the rich semantics needed to capture a model's biological contents in a machine-readable way, and SemGen provides the tools for adding these semantic annotations to SemSim models. Semantically enriched models offer more opportunities to automate model composition and decomposition, and SemGen was designed to take advantage of these opportunities.
BioModels, the CellML model repository, and the physiome.org model repository) and convert them into interoperable SemSim models. Using SemGen, modelers will be able to automate, as much as possible, the modular composition and decomposition of SemSim models without the need for manual coding. Integrated and extracted SemSim models will in turn be added to a model repository for dissemination and reuse. Eventually there will be a number of encoding tools within SemGen that will translate SemSim models into a variety of simulation languages so that modelers can generate executable simulation code in their language of choice.
SemSim also differs from the CellML approach to model sharing and modularity in that SemSim models do not delineate their internal components into a single set of specific sub-components. Instead, by leveraging the rich semantic annotations in SemSim models, modelers can use SemGen's Extractor tool to decompose a SemSim model in a number of different ways. This enables modelers to "carve out" the exact parts of a model they want to extract, without being constrained by a pre-coordinated decomposition.
Java Runtime Environment version 1.7 (64-bit) or higher to execute.
To check your Java version, go to a command prompt and enter:
java -version
JSim version 2.05
Double-click the "SemGen.exe" file in the main SemGen directory.
Double-click the "SemGen.app" file in the main SemGen directory.
Double-click the "SemGen.jar" file in the main SemGen directory.
Once you have a model loaded into the Annotator, you can specify composite, singular and human-readable annotations for the model codewords. Composite annotations follow a specific annotation grammar: they consist of a physical property (e.g. molar amount), and either the physical entity (e.g. glucose) or physical process (e.g. glucokinase activity) that possesses the property. See the section on composite annotations below and Gennari et al. 2010 for more details. Singular annotations consist of one concept from a reference ontology, and should only be used when the composite annotation approach is insufficient to define the model codeword ("Heart Rate", from the NCI Thesaurus, for example). Human-readable definitions are plain-text, natural-language descriptions of model terms and do not include references to ontological concepts.
The image below shows the main Annotator interface for a hemodynamics model in SemGen v1.0. The codeword "VLV" has been annotated with composite and human-readable annotations.
When you edit a composite annotation for a model codeword, the Annotator provides an interface for rapid searching and retrieval of reference ontology concepts via the BioPortal webservice.
Example: Suppose you are annotating a beta cell glycolysis model that includes a codeword representing glucose concentration in the cytosol of the cell.
A detailed composite annotation would be:
OPB:Chemical concentration <propertyOf> CHEBI:glucose <part_of> FMA:Portion of cytosol <part_of> FMA:Beta cell
In this case we use the term Chemical concentration from the OPB for the physical property part of the annotation, and we compose the physical entity part by linking three concepts - one from ChEBI and two from the FMA. This example illustrates the post-coordinated nature of the SemSim approach to annotation and how it provides high expressivity for annotating model terms.
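The structure of a composite annotation can be sketched as a small data structure. This is only an illustration of the grammar described here, not SemGen's internal representation: the class and field names (Term, CompositeAnnotation, entity_chain) are hypothetical, and the sketch assumes that the physical entity is a chain of terms linked by part_of.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Term:
    """One reference ontology concept (hypothetical helper class)."""
    ontology: str  # e.g. "OPB", "CHEBI", "FMA"
    label: str

    def __str__(self) -> str:
        return f"{self.ontology}:{self.label}"


@dataclass(frozen=True)
class CompositeAnnotation:
    """A physical property plus the physical entity that possesses it."""
    physical_property: Term
    # Most specific term first; consecutive terms are linked by part_of.
    entity_chain: Tuple[Term, ...]

    def render(self) -> str:
        entity = " <part_of> ".join(str(t) for t in self.entity_chain)
        return f"{self.physical_property} <propertyOf> {entity}"


detailed = CompositeAnnotation(
    physical_property=Term("OPB", "Chemical concentration"),
    entity_chain=(
        Term("CHEBI", "glucose"),
        Term("FMA", "Portion of cytosol"),
        Term("FMA", "Beta cell"),
    ),
)
print(detailed.render())
```

Printing the rendered annotation reproduces the composite annotation string given for the beta cell example.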
The above example represents a very detailed composite annotation; however, such detail may not be necessary to disambiguate concepts in a given model. For example, there may not be any other portions of glucose within the model apart from that in the cytosol. In this case, one could use only the first two terms in the composite annotation and still disambiguate the model codeword from the rest of the model's contents:
OPB:Chemical concentration <propertyOf> CHEBI:glucose
Although this annotation approach does not fully capture the biophysical meaning of the model codeword, SemGen is more likely to find semantic overlap between models if they use this shallower annotation style. This is mainly because the SemGen Merger tool currently only recognizes semantic equivalencies; it does not identify semantically similar terms in models that a user wants to integrate. Therefore, if a user wants to integrate our example glycolysis model with a TCA cycle model based on cardiac myocyte metabolism, the shallower approach would likely identify more semantic equivalencies than the more detailed approach.
Nonetheless, we recommend using the more detailed approach, given that future versions of SemGen will include a "Merging Wizard" that will identify and rank codewords that are semantically similar, not just semantically identical.
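The trade-off between detailed and shallow annotations can be illustrated with a minimal sketch of exact-equivalence matching. It assumes that an annotation is a tuple of ontology terms and that two codewords match only when their tuples are identical. The codeword names, the "FMA:Cardiac myocyte" label and the shared_annotations function are hypothetical and are not part of the SemGen Merger.

```python
def shared_annotations(model_a, model_b):
    """Map each annotation found in both models to (codeword_a, codeword_b)."""
    index_b = {annotation: codeword for codeword, annotation in model_b.items()}
    return {
        annotation: (codeword, index_b[annotation])
        for codeword, annotation in model_a.items()
        if annotation in index_b
    }


def shallow(model):
    """Keep only the physical property and the first entity term."""
    return {codeword: annotation[:2] for codeword, annotation in model.items()}


# Hypothetical codewords annotated in the detailed style.
glycolysis = {
    "Glc": ("OPB:Chemical concentration", "CHEBI:glucose",
            "FMA:Portion of cytosol", "FMA:Beta cell"),
}
tca_cycle = {
    "glucose_c": ("OPB:Chemical concentration", "CHEBI:glucose",
                  "FMA:Portion of cytosol", "FMA:Cardiac myocyte"),
}

detailed_matches = shared_annotations(glycolysis, tca_cycle)
shallow_matches = shared_annotations(shallow(glycolysis), shallow(tca_cycle))
print(len(detailed_matches), len(shallow_matches))
```

With the detailed annotations the two codewords differ in the cell type term, so an exact-equivalence test finds no overlap; with the shallow annotations the same two codewords are identified as a match.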
The five extraction methods are represented by the boxes along the left side of the interface: extraction by physical process, extraction by physical entity, extraction by sub-model, extraction by codeword, and extraction by cluster. They can be used independently or in concert.
Extraction by physical process (SemGen 1.3 and higher): This semantics-based extraction method allows a user to specify which physical processes they want to preserve in the extracted model. In the "Processes" box, select those physical processes in the model that you want to preserve, and SemGen will retain the codewords annotated against those physical processes in the extracted code. The "include participants" option adds all codewords annotated against the physical entities that participate in the process. In the extracted model, all these preserved codewords are retained along with the input codewords needed for their computation.
Extraction by physical entity: This semantics-based extraction method allows a user to specify which physical entities they want to preserve in the extracted model. In the "Physical entities to extract" box, select those physical entities in the model that you want to preserve, and SemGen will preserve the codewords annotated against those physical entities in the extracted code.
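The two semantics-based extraction methods can be sketched as simple filters over codeword annotations. This is a minimal illustration under stated assumptions, not SemGen's implementation: the codewords, entity and process names, and both functions are hypothetical, and the sketch leaves out the step that also retains the input codewords needed for computation.

```python
# Hypothetical annotations: codeword -> physical entity or physical process.
entity_annotations = {
    "V_LV": "left ventricle",
    "P_LV": "left ventricle",
    "V_Ao": "aorta",
}
process_annotations = {"Q_AoV": "aortic valve flow"}

# Hypothetical participants: physical process -> participating entities.
process_participants = {"aortic valve flow": {"left ventricle", "aorta"}}


def extract_by_entity(selected_entities):
    """Codewords annotated against any of the selected physical entities."""
    return {cw for cw, entity in entity_annotations.items()
            if entity in selected_entities}


def extract_by_process(selected_processes, include_participants=False):
    """Codewords annotated against the selected physical processes.

    With include_participants=True, codewords annotated against the
    entities that participate in those processes are added as well.
    """
    kept = {cw for cw, process in process_annotations.items()
            if process in selected_processes}
    if include_participants:
        participants = set()
        for process in selected_processes:
            participants |= process_participants.get(process, set())
        kept |= extract_by_entity(participants)
    return kept


print(sorted(extract_by_entity({"left ventricle"})))
print(sorted(extract_by_process({"aortic valve flow"}, include_participants=True)))
```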
Extraction by sub-model (SemGen 1.3 and higher): SemSim sub-models represent groups of codewords. This extraction method allows a user to specify which sub-models they want to extract. In the "Sub-models" box, select those sub-models in the model that you want to retain, and SemGen will preserve the codewords associated with those sub-models in the extracted code.
Extraction by codeword: This extraction method is included so that users have a maximal level of customization over their extracted model. SemGen preserves any codeword selected in this list in the extracted sub-model, along with the direct inputs needed for its computation. Select the "Include full dependency chain" option to preserve the complete computational dependency map needed to compute a selected codeword. Instead of turning the direct inputs to a selected codeword's computation into user-defined parameters, this option retains the full computational network needed to compute the codeword.
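The difference between keeping only the direct inputs and keeping the full dependency chain can be sketched as a graph traversal. The dependency map, the codeword names and the function below are hypothetical. The sketch assumes that any input not computed inside the extracted model becomes a user-defined parameter.

```python
# Hypothetical dependency map: codeword -> codewords it is computed from.
inputs = {
    "P_LV": {"V_LV", "E_LV"},
    "V_LV": {"Q_in", "Q_out"},
    "E_LV": {"t"},
    "Q_in": set(),
    "Q_out": set(),
    "t": set(),
}


def extract_by_codeword(selected, full_chain=False):
    """Return (computed, user_inputs) for the extracted model.

    computed:    codewords that keep their original computation
    user_inputs: direct inputs that become user-defined parameters
    """
    computed = set(selected)
    if full_chain:
        # Walk the dependency graph so every upstream codeword is kept.
        stack = list(selected)
        while stack:
            for dependency in inputs.get(stack.pop(), set()):
                if dependency not in computed:
                    computed.add(dependency)
                    stack.append(dependency)
    user_inputs = set()
    for codeword in computed:
        user_inputs |= inputs.get(codeword, set()) - computed
    return computed, user_inputs


print(extract_by_codeword({"P_LV"}))
print(extract_by_codeword({"P_LV"}, full_chain=True))
```

In the first call, the direct inputs of the selected codeword are turned into user-defined parameters. In the second, the whole upstream network is retained and no such parameters are introduced.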
Extraction by computational network cluster: This extraction method uses a network clustering algorithm to identify the modular portions of a SemSim model's computational network. By identifying the more closely-related computational clusters within a model, this extraction method provides a tool for users who might be unsure about how best to decompose a model into modular components. Clicking the "Cluster" button in the "Clusters to extract" box will open the cluster identification tool. Raising the slider bar on the cluster identifier iteratively removes edges in the computational network graph of the SemSim model and, in turn, delineates higher and higher numbers of computational modules within the model. Sometimes these computational modules will reflect the semantic architecture of the models, and so SemGen also includes a list of physical entities associated with each identified module in a separate window off to the left.
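The effect of the slider can be sketched by removing edges from a small graph and counting the connected components that remain. This help text does not name the clustering algorithm, so the order in which edges are removed is simply assumed here; the node names, the edge ranking and both functions are hypothetical.

```python
def count_modules(nodes, edges):
    """Count connected components using a simple union-find."""
    parent = {node: node for node in nodes}

    def find(node):
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for a, b in edges:
        parent[find(a)] = find(b)
    return len({find(node) for node in nodes})


# Hypothetical codewords forming a chain of three loosely coupled groups.
nodes = ["heart1", "heart2", "pulm1", "pulm2", "sys1", "sys2"]

# Assumed removal order: edges bridging two groups are removed first.
edges_by_removal_order = [
    ("heart2", "pulm1"),
    ("pulm2", "sys1"),
    ("heart1", "heart2"),
    ("pulm1", "pulm2"),
    ("sys1", "sys2"),
]


def modules_at_slider(level):
    """Number of modules after removing the first `level` edges."""
    return count_modules(nodes, edges_by_removal_order[level:])


print([modules_at_slider(level) for level in range(len(edges_by_removal_order) + 1)])
```

Each step of the slider removes one more edge, so the number of modules never decreases as the slider is raised.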
The screenshot below shows a physical entity-based extraction of the left ventricle dynamics from its parent cardiovascular model. Green coloring in the computational network map indicates codewords that are dependent variables in the source model, but will be converted into user-defined inputs in the extracted code.
The screenshot below shows the cluster identification interface. In this example, SemGen decomposed the cardiovascular dynamics model mentioned above into three clusters that delineate the heart, pulmonary circulation and systemic circulation components.
The screenshot below shows an integration between a cardiovascular dynamics model and a baroreceptor model. The semantic mappings in the center represent the potential interface points between the models. These are the resolution points that SemGen automatically identifies: for each biological concept listed, the user chooses which computational representation of the concept they wish to preserve in the merged model. The user can also introduce manual mappings by selecting individual codewords from the bottom panels and pressing the "Add manual mapping" button. The mapping is then added as a resolution point in the center panel.
This tool is completely automated: select the SemSim model you want to encode and where you want to save the encoded model, and the Coder will translate the SemSim model into the format you select.
Copyright (c) 2010-2016 Maxwell Neal, University of Washington.
All rights reserved.
Redistribution and use in source and binary forms, with or without modification,
for COMMERCIAL PURPOSES IS PROHIBITED without prior written permission from the author.
Redistribution and use in source and binary forms, with or without modification, are
permitted for non-commercial purposes (such as for research, personal use, or educational use),
provided that redistribution in any form includes this entire notice in all copies of the
software, derivative works, and supporting documentation.
The name of the author or the University of Washington may not be used to endorse or
promote software or services derived from this software without prior written permission
from the author and/or the University.
THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED WARRANTIES, INCLUDING,
BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL MAXWELL NEAL, UNIVERSITY OF WASHINGTON, OR ANY
CONTRIBUTORS TO THIS SOFTWARE BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE
GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED
AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
OF THE POSSIBILITY OF SUCH DAMAGE.