
SemGen 3.0 help


SemGen is a Java-based program and requires the Java Runtime Environment (JRE) version 1.7 (64-bit) or higher.
To check your Java version, open a command prompt and enter:
java -version
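If you need to run this check from a script, the sketch below parses the version string that `java -version` prints. It assumes the usual output format, in which the first line contains the version in double quotes (for example `java version "1.8.0_292"` or `openjdk version "17.0.2"`). The function names are hypothetical and not part of SemGen, and the check covers only the version number, not whether the runtime is 64-bit.

```python
import re


def java_feature_release(version_output):
    """Return the Java feature release (e.g. 7, 8, 17) from `java -version` output.

    Returns None if no quoted version string is found.
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first = int(match.group(1))
    if first == 1 and match.group(2) is not None:
        # Legacy scheme used through Java 8: "1.7.0_80" means release 7.
        return int(match.group(2))
    # Scheme used from Java 9 on: "17.0.2" means release 17.
    return first


def meets_semgen_minimum(version_output, minimum=7):
    """True if the reported Java release is at least `minimum` (1.7 by default)."""
    release = java_feature_release(version_output)
    return release is not None and release >= minimum
```

Because `java -version` writes to standard error, a caller would pass in the `stderr` of the process, for example `subprocess.run(["java", "-version"], capture_output=True, text=True).stderr`.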

Annotator requirements

For SemGen version 1.3 and below, JSim must be installed on your system to convert existing MML, SBML and CellML models into the SemSim format. We recommend using JSim version 2.05 with SemGen version 1.3, and JSim version 1.6.93 with earlier SemGen releases. A separate JSim installation is not required for SemGen versions above 1.3.

JSim version 2.05

JSim version 1.6.93

When you first attempt to annotate an MML model, you may be prompted to locate the 'jsbatch' command-line program that comes with the JSim software. SemGen needs this tool to convert existing models into the SemSim format.

Typical locations of the jsbatch program:
    C:\...[JSim home directory]\win32\bin\jsbatch.bat
    /...[JSim home directory]/macos/bin/jsbatch
    /...[JSim home directory]/linux_i386/bin/jsbatch
Once SemGen prompts you to locate jsbatch, find it within the JSim home directory. SemGen will remember its location and you will not need to locate the file in the future.
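SemGen asks for this location through a file dialog. The hypothetical helper below only illustrates how a script could check the typical locations listed above, assuming the JSim home directory is already known. It treats any platform other than Windows and macOS as Linux, which is a simplifying assumption.

```python
import sys
from pathlib import Path

# Relative locations of jsbatch inside a JSim home directory (from the list above).
JSBATCH_RELATIVE_PATHS = {
    "windows": Path("win32", "bin", "jsbatch.bat"),
    "macos": Path("macos", "bin", "jsbatch"),
    "linux": Path("linux_i386", "bin", "jsbatch"),
}


def platform_key(platform_name):
    """Map a sys.platform value (e.g. "win32", "darwin", "linux") to a table key."""
    if platform_name.startswith("win"):
        return "windows"
    if platform_name == "darwin":
        return "macos"
    return "linux"


def find_jsbatch(jsim_home, platform_name=sys.platform):
    """Return the path to jsbatch under jsim_home, or None if it is not there."""
    candidate = Path(jsim_home) / JSBATCH_RELATIVE_PATHS[platform_key(platform_name)]
    return candidate if candidate.is_file() else None
```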

Getting Started

    Double-click the "SemGen.exe" file in the main SemGen directory.

    Double-click the "" file in the main SemGen directory.

    Double-click the "SemGen.jar" file in the main SemGen directory.


With the Annotator tool, you can convert mathematical models into the SemSim format and annotate the model's codewords using concepts from online reference ontologies. Currently the Annotator can convert MML, SBML, and CellML models into the SemSim format, as long as the JSim simulation engine can compile them. The Semantics of Biological Processes group maintains a protocol for annotating a model which can help guide the annotation process.

Once you have a model loaded into the Annotator, you can specify composite, singular and human-readable annotations for the model codewords. Composite annotations follow a specific annotation grammar: they are composed of a physical property (e.g. molar amount) and either the physical entity (e.g. glucose) or the physical process (e.g. glucokinase activity) that possesses the property. See the section on composite annotations below and Gennari et al. 2010 for more details. Singular annotations consist of one concept from a reference ontology and should only be used when the composite annotation approach is insufficient to define the model codeword ("Heart Rate", from the NCI Thesaurus, for example). Human-readable definitions are plain-text, natural-language descriptions of model terms and do not include references to ontological concepts.

The image below shows the main Annotator interface for a hemodynamics model in SemGen v1.0. The codeword "VLV" has been annotated with composite and human-readable annotations.

Composite annotations

Each composite annotation consists of a physical property term connected to a physical entity or physical process term. The physical entity term can itself be a composite of ontology terms. We recommend using only terms from the Ontology of Physics for Biology (OPB) for the physical property annotation components. For the physical entity annotations we recommend using robust, thorough, and widely accepted online reference ontologies like the Foundational Model of Anatomy (FMA), Chemical Entities of Biological Interest (ChEBI), and Gene Ontology cellular components (GO-cc). For physical process annotations, we recommend creating custom terms and defining them by identifying their thermodynamic sources, sinks and mediators from the physical entities in the model.

When you edit a composite annotation for a model codeword, the Annotator provides an interface for rapid searching and retrieval of reference ontology concepts via the BioPortal webservice. 

Example: Suppose you are annotating a beta cell glycolysis model that includes a codeword representing glucose concentration in the cytosol of the cell.

A detailed composite annotation would be:

OPB:Chemical concentration <propertyOf> CHEBI:glucose <part_of> FMA:Portion of cytosol <part_of> FMA:Beta cell

In this case we use the term Chemical concentration from the OPB for the physical property part of the annotation, and we compose the physical entity part by linking three concepts: one from ChEBI and two from the FMA. The full annotation therefore combines four concepts drawn from three ontologies. This example illustrates the post-coordinated nature of the SemSim approach to annotation and how it provides high expressivity for annotating model terms.
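To make the grammar concrete, here is a minimal Python sketch of an entity-based composite annotation that reproduces the beta cell example above. The class and field names are hypothetical, not SemGen's internal data model, and the sketch leaves out annotations on physical processes.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class CompositeAnnotation:
    """A physical property attached to a physical entity built from ontology terms.

    entity_chain runs from the most specific term to the most general one;
    consecutive terms are linked by <part_of>.
    """

    physical_property: str
    entity_chain: Tuple[str, ...]

    def concept_count(self):
        """Number of reference-ontology concepts used (property plus entity terms)."""
        return 1 + len(self.entity_chain)

    def render(self):
        """Format the annotation in the notation used in this help page."""
        entity = " <part_of> ".join(self.entity_chain)
        return f"{self.physical_property} <propertyOf> {entity}"


# The beta cell glucose example.
CYTOSOLIC_GLUCOSE = CompositeAnnotation(
    "OPB:Chemical concentration",
    ("CHEBI:glucose", "FMA:Portion of cytosol", "FMA:Beta cell"),
)
```

Here `CYTOSOLIC_GLUCOSE.render()` reproduces the annotation string shown above, and `concept_count()` returns 4: one property term plus three entity terms.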

The above example represents a very detailed composite annotation; however, such detail may not be necessary to disambiguate concepts in a given model. For example, there may be no other portions of glucose in the model apart from that in the cytosol. In this case, one could use only the first two terms of the composite annotation and still distinguish the model codeword from the rest of the model's contents:

OPB:Chemical concentration <propertyOf> CHEBI:glucose

Although this annotation approach does not fully capture the biophysical meaning of the model codeword, SemGen is more likely to find semantic overlap between models if they use this shallower annotation style. This is mainly because the SemGen Merger tool currently recognizes only semantic equivalencies; it does not identify semantically similar terms in models that a user wants to integrate. Therefore, if a user wants to integrate our example glycolysis model with a TCA cycle model based on cardiac myocyte metabolism, the shallower approach would likely identify more semantic equivalencies than the more detailed approach, because detailed annotations that name different cell types (beta cell versus cardiac myocyte) can never be identical.
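The effect of exact-match comparison can be sketched in a few lines of Python. This is an illustration only, assuming each annotation is written as a tuple of ontology terms; the pyruvate codewords and the FMA label for the cardiac cell are hypothetical.

```python
def find_equivalences(model_a, model_b):
    """Pair codewords from two models whose annotations are exactly identical.

    Each model maps codeword -> annotation, written as a tuple
    (physical property, entity term 1, entity term 2, ...).
    Annotations that are similar but not identical are never paired.
    """
    codewords_by_annotation = {}
    for codeword, annotation in model_b.items():
        codewords_by_annotation.setdefault(annotation, []).append(codeword)
    return [
        (codeword_a, codeword_b)
        for codeword_a, annotation in model_a.items()
        for codeword_b in codewords_by_annotation.get(annotation, [])
    ]


# Hypothetical pyruvate codewords in a beta cell glycolysis model (A)
# and a cardiac myocyte TCA cycle model (B).
DETAILED_A = {"pyr": ("OPB:Chemical concentration", "CHEBI:pyruvate",
                      "FMA:Portion of cytosol", "FMA:Beta cell")}
DETAILED_B = {"Pyr": ("OPB:Chemical concentration", "CHEBI:pyruvate",
                      "FMA:Portion of cytosol", "FMA:Cardiac muscle cell")}
SHALLOW_A = {"pyr": ("OPB:Chemical concentration", "CHEBI:pyruvate")}
SHALLOW_B = {"Pyr": ("OPB:Chemical concentration", "CHEBI:pyruvate")}
```

With these inputs, `find_equivalences(DETAILED_A, DETAILED_B)` returns an empty list because the cell types differ, while `find_equivalences(SHALLOW_A, SHALLOW_B)` returns `[("pyr", "Pyr")]`.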

Nonetheless, we recommend using the more detailed approach, given that future versions of SemGen will include a "Merging Wizard" that will identify and rank codewords that are semantically similar, not just semantically identical.


The Extractor tool provides five methods for decomposing SemSim models into sub-models. This decomposition process is useful if you want to "carve out" a smaller portion of a given model in order to remove extraneous model features. 

The five extraction methods are represented by the boxes along the left side of the interface: extraction by physical process, extraction by physical entity, extraction by sub-model, extraction by codeword, and extraction by computational network cluster. They can be used independently or in concert.

Extraction by physical process (SemGen 1.3 and higher): This semantics-based extraction method allows a user to specify which physical processes they want to preserve in the extracted model. In the "Processes" box, select the physical processes you want to preserve, and SemGen will retain the codewords annotated against those processes in the extracted code. The "include participants" option also retains all codewords annotated against the physical entities that participate in those processes. In the extracted model, all these preserved codewords are retained along with the input codewords needed for their computation.
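The selection step behind the "include participants" option can be sketched as follows. This is a hypothetical illustration, not SemGen's code: it assumes each codeword is annotated against a single entity or process name, and it leaves out the step that adds computational inputs.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class AnnotatedModel:
    # codeword -> the physical entity or physical process it is annotated against
    annotations: Dict[str, str]
    # physical process -> participating entities (sources, sinks, mediators)
    participants: Dict[str, List[str]] = field(default_factory=dict)


def select_by_process(model, processes, include_participants=False):
    """Return the codewords to preserve when extracting the given processes.

    Computational inputs are not added here; they would be handled in a
    separate dependency step.
    """
    targets: Set[str] = set(processes)
    if include_participants:
        for process in processes:
            targets.update(model.participants.get(process, []))
    return {cw for cw, term in model.annotations.items() if term in targets}


# Hypothetical glycolysis fragment.
GLYCOLYSIS = AnnotatedModel(
    annotations={"v_GK": "glucokinase activity", "Glc": "glucose",
                 "G6P": "glucose 6-phosphate", "ATP": "ATP", "Lac": "lactate"},
    participants={"glucokinase activity": ["glucose", "glucose 6-phosphate", "ATP"]},
)
```

Selecting only "glucokinase activity" keeps `v_GK`; adding participants also keeps `Glc`, `G6P` and `ATP`, while `Lac` is dropped in both cases.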

Extraction by physical entity: This semantics-based extraction method allows a user to specify which physical entities they want to preserve in the extracted model. In the "Physical entities to extract" box, select the physical entities in the model that you want to preserve, and SemGen will preserve the codewords annotated against those physical entities in the extracted code.

Extraction by sub-model (SemGen 1.3 and higher): SemSim sub-models represent groups of codewords. This extraction method allows a user to specify which sub-models they want to extract. In the "Sub-models" box, select the sub-models you want to retain, and SemGen will preserve the codewords associated with those sub-models in the extracted code.

Extraction by codeword: This extraction method gives users maximal control over the contents of their extracted model. SemGen preserves any codeword selected in this list in the extracted sub-model, along with the direct inputs needed for its computation; by default, those direct inputs become user-defined parameters. Select the "Include full dependency chain" option to instead retain the complete computational network needed to compute each selected codeword.
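The difference between the default behavior and "Include full dependency chain" amounts to one step versus a full upstream traversal of the dependency graph. The sketch below is a hypothetical illustration of that idea, not SemGen's implementation; the left ventricle codewords and their dependencies are invented.

```python
def extract_by_codeword(dependencies, selected, full_chain=False):
    """Return (computed, inputs) for a codeword-based extraction.

    dependencies maps each codeword to the codewords its equation reads.
    computed: codewords whose equations are kept in the extracted model.
    inputs: codewords that become user-defined parameters.
    """
    computed = set(selected)
    if not full_chain:
        # Keep only the selected equations; their direct inputs become parameters.
        direct = {dep for cw in selected for dep in dependencies.get(cw, ())}
        return computed, direct - computed
    # Walk the whole upstream network so nothing has to become a parameter.
    stack = list(selected)
    while stack:
        for dep in dependencies.get(stack.pop(), ()):
            if dep not in computed:
                computed.add(dep)
                stack.append(dep)
    return computed, set()


# Hypothetical left ventricle fragment: each codeword lists the codewords it reads.
LV_DEPENDENCIES = {
    "PLV": ["VLV", "ELV"],
    "VLV": ["QMI", "QAO"],
    "ELV": ["t"],
    "QMI": [],
    "QAO": [],
    "t": [],
}
```

Extracting `PLV` alone keeps only its equation and turns `VLV` and `ELV` into parameters; with `full_chain=True` every upstream codeword keeps its equation and no parameters are introduced. The set of already-retained codewords also prevents infinite loops when the network contains cycles.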

Extraction by computational network cluster: This extraction method uses a network clustering algorithm to identify the modular portions of a SemSim model's computational network. By identifying the more closely related computational clusters within a model, this extraction method helps users who are unsure how best to decompose a model into modular components. Clicking the "Cluster" button in the "Clusters to extract" box opens the cluster identification tool. As you raise the slider bar, the tool iteratively removes edges from the computational network graph of the SemSim model, which in turn delineates a growing number of computational modules within the model. These computational modules sometimes reflect the semantic architecture of the model, so SemGen also lists the physical entities associated with each identified module in a separate window on the left.
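This help page does not name the clustering algorithm SemGen uses. As an illustration only, the sketch below implements one well-known method that matches the "iteratively remove edges" description: Girvan-Newman community detection, which repeatedly deletes the edge with the highest betweenness (the edge that lies on the most shortest paths). It treats the computational network as undirected and unweighted, which is an assumption, and the requested number of clusters stands in for the slider position.

```python
from collections import deque


def graph_from_edges(edges):
    """Build an undirected adjacency map {node: set(neighbours)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def edge_betweenness(adj):
    """Brandes-style edge betweenness for an unweighted, undirected graph."""
    scores = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for source in adj:
        # Breadth-first search that counts shortest paths from source.
        sigma = dict.fromkeys(adj, 0)
        sigma[source] = 1
        dist = dict.fromkeys(adj, -1)
        dist[source] = 0
        preds = {v: [] for v in adj}
        order, queue = [], deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Push path credit back from the farthest nodes toward source.
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                credit = sigma[v] / sigma[w] * (1.0 + delta[w])
                scores[frozenset((v, w))] += credit
                delta[v] += credit
    return scores


def connected_components(adj):
    """Return the connected components as a list of node sets."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        component, stack = set(), [start]
        while stack:
            v = stack.pop()
            component.add(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        components.append(component)
    return components


def split_into_clusters(adj, n_clusters):
    """Remove the highest-betweenness edge until n_clusters components remain."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}  # work on a copy
    while len(connected_components(adj)) < n_clusters:
        scores = edge_betweenness(adj)
        if not scores:
            break  # no edges left to remove
        u, v = tuple(max(scores, key=scores.get))
        adj[u].discard(v)
        adj[v].discard(u)
    return connected_components(adj)


# Two triangles joined by one bridge edge ("c"-"d").
EXAMPLE = graph_from_edges([("a", "b"), ("b", "c"), ("a", "c"),
                            ("c", "d"), ("d", "e"), ("e", "f"), ("d", "f")])
```

In the toy graph every shortest path between the two triangles crosses the bridge, so it has the highest betweenness and is removed first, and `split_into_clusters(EXAMPLE, 2)` returns the two triangles as separate clusters.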

The screenshot below shows a physical entity-based extraction of the left ventricle dynamics from its parent cardiovascular model. Green coloring in the computational network map indicates codewords that are dependent variables in the source model, but will be converted into user-defined inputs in the extracted code.


The screenshot below shows the cluster identification interface. In this example, SemGen decomposed the cardiovascular dynamics model mentioned above into three clusters that delineate the heart, pulmonary circulation and systemic circulation components.


The Merger tool helps automate the integration of two SemSim models. The Merger identifies the interface between two models by comparing the biological meaning of the models' codewords as expressed by their composite and singular annotations. If the two models share the same biological concept, the codewords representing this concept are mapped to each other and the user must decide which computational representation of the concept they want to preserve in the integrated model. 

The screenshot below shows an integration between a cardiovascular dynamics and a baroreceptor model. The semantic mappings in the center represent the potential interface points between the models. These are the resolution points that SemGen automatically identifies - for each biological concept listed, the user chooses which computational representation of the concept they wish to preserve in the merged model. The user can also introduce manual mappings by selecting individual codewords from the bottom panels and pressing the "Add manual mapping" button. The mapping is then added as a resolution point in the center panel.
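The hypothetical sketch below shows what resolving the mappings could look like once the user has decided each resolution point. It assumes each model is reduced to a dependency map (codeword to the codewords it reads) and that the resolution points, whether identified automatically or added as manual mappings, are already known. Prefixing codewords with "A." and "B." to avoid name clashes is an assumption of this sketch, not documented SemGen behavior, and the codewords are invented.

```python
def merge_models(model_a, model_b, resolutions):
    """Return a merged dependency map after resolving each mapping.

    model_a and model_b map codeword -> list of input codewords.
    resolutions is a list of (codeword_a, codeword_b, keep) with keep "a" or "b":
    the kept codeword's computation survives, the other is removed, and every
    reference to the removed codeword is redirected to the kept one.
    """
    merged = {}
    for prefix, model in (("A.", model_a), ("B.", model_b)):
        for codeword, inputs in model.items():
            merged[prefix + codeword] = [prefix + name for name in inputs]
    redirect = {}
    for codeword_a, codeword_b, keep in resolutions:
        kept, dropped = "A." + codeword_a, "B." + codeword_b
        if keep == "b":
            kept, dropped = dropped, kept
        redirect[dropped] = kept
        del merged[dropped]  # discard the computation the user did not choose
    return {cw: [redirect.get(name, name) for name in inputs]
            for cw, inputs in merged.items()}


# Hypothetical cardiovascular (A) and baroreceptor (B) fragments.
CARDIOVASCULAR = {"P_art": ["V_art", "HR"], "V_art": [], "HR": []}
BARORECEPTOR = {"HR": ["P_art_in"], "P_art_in": []}
# Keep the cardiovascular model's arterial pressure and the baroreceptor's heart rate.
RESOLUTIONS = [("P_art", "P_art_in", "a"), ("HR", "HR", "b")]
```

In the merged result, the cardiovascular pressure equation reads the baroreceptor's heart rate and the baroreceptor's heart rate equation reads the cardiovascular pressure, which closes the feedback loop between the two models.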


The Coder tool translates an existing SemSim model into executable simulation code. Eventually, users will be able to translate SemSim models into a variety of simulation languages, but currently the Coder only supports translation into the CellML and MML (JSim) formats.

This tool is completely automated: select the SemSim model you want to encode and where you want to save the encoded model, and the Coder will translate the SemSim model into the format you select.