
SemGen help

This is the help document for SemGen, software for semantics-based composition and decomposition of biosimulation models.
We welcome feedback from the user community in order to improve SemGen's utility.
Please send feedback and bug reports to:

Maxwell Neal: maxneal[at]gmail[dot]com

SemGen origins

A number of years ago I got interested in creating integrated models of human physiology for use as medical decision support tools. At the time, I was new to the field of simulation and modeling. I quickly learned how difficult and time-consuming it was to integrate two models created by authors in different research labs. I was discouraged by the difficulties in obtaining code for published models, and when the code was not available, I was discouraged by the tedious and error-prone tasks associated with re-coding the model myself from its source publication. I wanted to be able to search for and download two models from a physiological model repository and have a computer automate their integration. Later, after I had amassed a collection of large models, I found that many of my modeling tasks required extracting a subcomponent from a larger model. Therefore, many of my modeling tasks during this time, whether integrations or decompositions, would have been accelerated by a modular modeling framework. This is why my colleagues at the University of Washington and I created the SemSim model description format, which provides a modular framework for model composition and decomposition. Soon thereafter I created SemGen as part of my dissertation work because we needed a software tool to help automate the creation, annotation, composition and decomposition of SemSim models.

In a nutshell, SemSim is a semantics-based approach to modular modeling. If you're interested in the gory details of the SemSim approach, you can find them in my dissertation, and the publications of the Semantics of Biological Processes group. In my experience, modelers do not create models with the intention of integrating them with models developed by outside researchers. Furthermore, to do so requires an understanding of how the model will be repurposed once disseminated among the modeling community. As it is not feasible to create a model that anticipates these diverse purposes, it becomes difficult to specify how a model should interface with others in a broad sense. Therefore, to automate model integration, a computer needs to recognize how two models should interface with each other. For biological modeling, this task requires identifying where the models describe the same biology. In other words, the computer needs to know where the models are semantically equivalent, because those are the biologically-meaningful points of coupling between the models - i.e. the interface. The SemSim format provides the rich semantics needed to capture a model's biological contents in a machine-readable way, and SemGen provides the tools for adding these semantic annotations to SemSim models. Semantically enriched models offer more opportunities to automate model composition and decomposition, and SemGen was designed to take advantage of these opportunities.

The SemSim vision of modular modeling

Ultimately, the SemSim vision is to allow a modeler to download models from any of the major declarative repositories (BioModels, the CellML model repository, and the model repository) and convert them into interoperable SemSim models. Using SemGen, modelers will be able to automate, as much as possible, the modular composition and decomposition of SemSim models without the need for manual coding. Integrated and extracted SemSim models will in turn be added to a model repository for dissemination and reuse. Eventually there will be a number of encoding tools within SemGen that will translate SemSim models into a variety of simulation languages so that modelers can generate executable simulation code in their language of choice.

Differences between SemSim and other declarative modeling languages

Unlike the Systems Biology Markup Language (SBML), the SemSim approach to modularity and interoperability scales across biological levels of organization and research domains. Whereas SBML models carry an underlying assumption that they represent a set of chemical reactions, the SemSim framework is intended to be multi-scale and multi-domain. To realize this vision, SemSim leverages the wealth of semantic information contained in standardized reference ontologies, and together, these ontologies provide annotation concepts across modeling scales and domains. Thus, SemSim provides an annotation framework that is more expressive and explicit than other declarative modeling formats. Although the SBML standard allows for some semantic annotation against reference ontologies, it cannot currently capture the full biological meaning of model codewords in a machine-readable way. The same is true with the CellML modeling language.

SemSim also differs from the CellML approach to model sharing and modularity in that SemSim models do not delineate their internal components into a single set of specific sub-components. Instead, by leveraging the rich semantic annotations in SemSim models, modelers can use SemGen's Extractor tool to decompose a SemSim model in a number of different ways. This enables modelers to "carve out" the exact parts of a model they want to extract, without being constrained by a pre-coordinated decomposition.

A work in progress

Both SemSim and SemGen are works in progress. My colleagues and I hope that the broader biosimulation community will find these technologies useful, and we look forward to improving SemGen to meet users' needs. We welcome constructive feedback from anyone interested in applying this semantics-based modular modeling approach. Thanks for your interest!

-Maxwell Neal

SemGen 3.0 help


SemGen is a Java-based program and requires the Java Runtime Environment (JRE) version 1.7 (64-bit) or higher to run.
To check your Java version, open a command prompt and enter:
java -version
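
The version check can also be scripted. The following is a minimal Python sketch and is not part of SemGen; all function names are hypothetical. It assumes the usual output format, in which the version number appears in double quotes (for example 1.7.0_80 under the older numbering scheme or 11.0.2 under the newer one), that "java -version" writes its report to standard error, and that 64-bit HotSpot and OpenJDK builds include the text "64-Bit" in that report.

```python
import re
import subprocess


def parse_java_version(version_output):
    """Return (major, minor) from the first quoted version string found."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("no quoted version string found")
    return int(match.group(1)), int(match.group(2) or 0)


def meets_minimum(version_output, minimum=(1, 7)):
    """True if the reported version is at least 1.7 (the SemGen requirement)."""
    return parse_java_version(version_output) >= minimum


def is_64bit(version_output):
    """Assumption: 64-bit JVMs report '64-Bit' in their version banner."""
    return "64-Bit" in version_output


def installed_java_output():
    """Run 'java -version' and return its report (printed on stderr)."""
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    return result.stderr
```

For example, meets_minimum(installed_java_output()) reports whether the Java installation on the current machine satisfies the version requirement.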

Annotator requirements

For SemGen version 1.3 and below, JSim must be installed on your system to convert existing MML, SBML and CellML models into the SemSim format. We recommend JSim version 2.05 for SemGen version 1.3, and JSim version 1.6.93 for prior SemGen releases. An independent JSim installation is not required for SemGen versions above 1.3.

JSim version 2.05

JSim version 1.6.93

When you first attempt to annotate an MML model, you may be prompted to find the 'jsbatch' command line program that comes with the JSim software. This tool is necessary for converting existing models into the SemSim format.

Typical locations of the jsbatch program:
    C:\...[JSim home directory]\win32\bin\jsbatch.bat
    /...[JSim home directory]/macos/bin/jsbatch
    /...[JSim home directory]/linux_i386/bin/jsbatch
Once SemGen prompts you to locate jsbatch, find it within the JSim home directory. SemGen will remember its location and you will not need to locate the file in the future.

Getting Started

    Double-click the "SemGen.exe" file in the main SemGen directory.

    Double-click the "" file in the main SemGen directory.

    Double-click the "SemGen.jar" file in the main SemGen directory.


With the Annotator tool, you can convert mathematical models into the SemSim format and annotate the model's codewords using concepts from online reference ontologies. Currently the Annotator can convert MML, SBML, and CellML models into the SemSim format, as long as the JSim simulation engine can compile them. The Semantics of Biological Processes group maintains a protocol for annotating a model which can help guide the annotation process.

Once you have a model loaded into the Annotator, you can specify composite, singular and human-readable annotations for the model codewords. Composite annotations follow a specific annotation grammar: they are composed of a physical property (e.g. molar amount), and either the physical entity (e.g. glucose) or physical process (e.g. glucokinase activity) that possesses the property. See the section on composite annotations below and Gennari et al. 2010 for more details. Singular annotations consist of one concept from a reference ontology, and should only be used when the composite annotation approach is insufficient to define the model codeword ("Heart Rate", from the NCI Thesaurus, for example). Human-readable definitions are plain-text, natural language descriptions of model terms and do not include references to ontological concepts.

The image below shows the main Annotator interface for a hemodynamics model in SemGen v1.0. The codeword "VLV" has been annotated with composite and human-readable annotations.

Composite annotations

Each composite annotation consists of a physical property term connected to a physical entity or physical process term. The physical entity term can itself also be a composite of ontology terms. We recommend using only terms from the Ontology of Physics for Biology (OPB) for the physical property annotation components. For the physical entity annotations we recommend using robust, thorough, and widely accepted online reference ontologies like the Foundational Model of Anatomy (FMA), Chemical Entities of Biological Interest (ChEBI), and Gene Ontology cellular components (GO-cc). For physical process annotations, we recommend creating custom terms and defining them by identifying their thermodynamic sources, sinks and mediators from the physical entities in the model.

When you edit a composite annotation for a model codeword, the Annotator provides an interface for rapid searching and retrieval of reference ontology concepts via the BioPortal webservice. 

Example: Suppose you are annotating a beta cell glycolysis model that includes a codeword representing glucose concentration in the cytosol of the cell.

A detailed composite annotation would be:

OPB:Chemical concentration <propertyOf> CHEBI:glucose <part_of> FMA:Portion of cytosol <part_of> FMA:Beta cell

In this case we use the term Chemical concentration from the OPB for the physical property part of the annotation, and we compose the physical entity part by linking three concepts: one from ChEBI and two from the FMA. The full annotation therefore links four concepts in total. This example illustrates the post-coordinated nature of the SemSim approach to annotation and how it provides high expressivity for annotating model terms.

The above example represents a very detailed composite annotation. However, such detail may not be necessary to disambiguate concepts in a given model. For example, there may not be any other portions of glucose within the model apart from that in the cytosol. In this case, one could use only the first two terms in the composite annotation and still disambiguate the model codeword from the rest of the model's contents:

OPB:Chemical concentration <propertyOf> CHEBI:glucose

Although this annotation approach does not fully capture the biophysical meaning of the model codeword, SemGen is more likely to find semantic overlap between models if they use this shallower annotation style. This is mainly because the SemGen Merger tool currently only recognizes semantic equivalencies; it does not identify semantically similar terms in models that a user wants to integrate. Therefore, if a user wants to integrate our example glycolysis model with a TCA cycle model based on cardiac myocyte metabolism, the shallower approach would likely identify more semantic equivalencies than the more detailed approach. 

Nonetheless, we recommend using the more detailed approach, given that future versions of SemGen will include a "Merging Wizard" that will identify and rank codewords that are semantically similar, not just semantically identical.
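
The trade-off between detailed and shallow annotations can be illustrated with a small Python sketch. This is not SemGen's internal representation: the CompositeAnnotation class, the helper functions and the "FMA:Cardiac myocyte" label are assumptions made for illustration. A composite annotation is modeled as a physical property plus a chain of physical entity terms linked by part_of, and two annotations count as equivalent only when they match exactly.

```python
from typing import NamedTuple, Tuple


class CompositeAnnotation(NamedTuple):
    physical_property: str         # OPB term for the physical property
    entity_chain: Tuple[str, ...]  # entity terms linked by part_of, most specific first


def equivalent(a, b):
    """Exact-match comparison: every term must be identical."""
    return a == b


def shallow(annotation, depth=1):
    """Keep only the first `depth` entity terms of the chain."""
    return CompositeAnnotation(annotation.physical_property,
                               annotation.entity_chain[:depth])


# Glucose concentration in the beta cell glycolysis model (from the example above).
beta_cell_glucose = CompositeAnnotation(
    "OPB:Chemical concentration",
    ("CHEBI:glucose", "FMA:Portion of cytosol", "FMA:Beta cell"),
)

# Hypothetical counterpart in a cardiac myocyte TCA cycle model.
myocyte_glucose = CompositeAnnotation(
    "OPB:Chemical concentration",
    ("CHEBI:glucose", "FMA:Portion of cytosol", "FMA:Cardiac myocyte"),
)

detailed_match = equivalent(beta_cell_glucose, myocyte_glucose)
shallow_match = equivalent(shallow(beta_cell_glucose), shallow(myocyte_glucose))
```

Here detailed_match is False because the cell types differ, while shallow_match is True because both shallow annotations reduce to OPB:Chemical concentration of CHEBI:glucose.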


The Extractor tool provides five methods for decomposing SemSim models into sub-models. This decomposition process is useful if you want to "carve out" a smaller portion of a given model in order to remove extraneous model features.

The five extraction methods are represented by the boxes along the left side of the interface: extraction by physical process, extraction by physical entity, extraction by sub-model, extraction by codeword, and extraction by cluster. They can be used independently or in concert.

Extraction by physical process (SemGen 1.3 and higher): This semantics-based extraction method allows a user to specify which physical processes they want to preserve in the extracted model. In the "Processes" box, select those physical processes in the model you want to preserve and SemGen will retain the codewords annotated with those physical processes in the extracted code. The "include participants" option adds all codewords annotated against the physical entities that participate in the process. In the extracted model, all these preserved codewords are retained along with the input codewords needed for their computation.

Extraction by physical entity: This semantics-based extraction method allows a user to specify which physical entities they want to preserve in the extracted model. In the "Physical entities to extract" box, select those physical entities in the model that you want to preserve, and SemGen will preserve the codewords annotated against those physical entities in the extracted code.
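
As a rough illustration of entity-based selection, the Python sketch below picks out the codewords whose annotations mention any of the selected physical entities. It is not SemGen code: the function name, the codewords and the FMA labels are hypothetical, and each annotation is simplified to a list of entity terms.

```python
def codewords_for_entities(annotations, selected_entities):
    """Return the codewords annotated against any selected physical entity."""
    selected = set(selected_entities)
    return sorted(
        codeword
        for codeword, entities in annotations.items()
        if selected & set(entities)
    )


# Hypothetical codewords and entity labels for a small cardiovascular model.
annotations = {
    "VLV": ["FMA:Left ventricle"],
    "PLV": ["FMA:Left ventricle"],
    "Vpa": ["FMA:Pulmonary artery"],
}

left_ventricle_codewords = codewords_for_entities(annotations, ["FMA:Left ventricle"])
```

In this sketch, left_ventricle_codewords contains PLV and VLV but not Vpa.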

Extraction by sub-model (SemGen 1.3 and higher): SemSim sub-models represent groups of codewords. This extraction method allows a user to specify which sub-models they want to extract. In the "Sub-models" box, select those sub-models in the model that you want to retain, and SemGen will preserve the codewords associated with those sub-models in the extracted code.

Extraction by codeword:  This extraction method is included so that users have a maximal level of customization over their extracted model. SemGen preserves any codeword selected in this list in the extracted sub-model, along with the direct inputs needed for its computation. Select the "Include full dependency chain" option to preserve the complete computational dependency map needed to compute a selected codeword. Instead of turning the direct inputs to a selected codeword's computation into user-defined parameters, this option retains the full computational network needed to compute the codeword.
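
The difference between the default behavior and the "Include full dependency chain" option can be sketched in Python as follows. This is an illustration under stated assumptions rather than SemGen's implementation: the function name and codewords are hypothetical, and the model is reduced to a dictionary that maps each codeword to the codewords its equation reads.

```python
def extract_by_codeword(dependencies, selected, full_chain=False):
    """Return (computed, user_inputs) for an extraction.

    computed    -- codewords that keep their equations in the extracted model
    user_inputs -- codewords kept only as user-defined parameters
    """
    computed = set(selected)
    if full_chain:
        # Follow every dependency transitively and keep all of the equations.
        stack = list(selected)
        while stack:
            for dep in dependencies.get(stack.pop(), []):
                if dep not in computed:
                    computed.add(dep)
                    stack.append(dep)
        return computed, set()
    # Default: keep only the direct inputs, turned into user-defined parameters.
    direct_inputs = {dep for cw in selected for dep in dependencies.get(cw, [])}
    return computed, direct_inputs - computed


# Hypothetical dependency map: flow Q depends on pressure P and resistance R,
# and P in turn depends on volume V and compliance C.
dependencies = {"Q": ["P", "R"], "P": ["V", "C"], "V": [], "C": [], "R": []}

default_result = extract_by_codeword(dependencies, ["Q"])
full_result = extract_by_codeword(dependencies, ["Q"], full_chain=True)
```

With the default setting, Q keeps its equation while P and R become user-defined inputs. With full_chain=True, the equation for P is retained as well, which pulls V and C into the extracted model.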

Extraction by computational network cluster: This extraction method uses a network clustering algorithm to identify the modular portions of a SemSim model's computational network. By identifying the more closely-related computational clusters within a model, this extraction method provides a tool for users who might be unsure about how best to decompose a model into modular components. Clicking the "Cluster" button in the "Clusters to extract" box will open the cluster identification tool. Raising the slider bar on the cluster identifier iteratively removes edges in the computational network graph of the SemSim model and in turn, delineates higher and higher numbers of computational modules within the model. Sometimes these computational modules will reflect the semantic architecture of the models, and so SemGen also includes a list of physical entities associated with each identified module in a separate window off to the left.
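
The idea of delineating modules by iteratively removing edges can be sketched in Python. The text above does not name the clustering algorithm, so this sketch assumes one common choice: repeatedly removing the edge with the highest edge betweenness (the Girvan-Newman approach). The function names, the edges_to_remove parameter (standing in for the slider position) and the example graph are all hypothetical.

```python
from collections import deque


def edge_betweenness(adj):
    """Edge betweenness for an undirected, unweighted graph (Brandes-style)."""
    score = {}
    for source in adj:
        dist = {source: 0}
        sigma = {v: 0 for v in adj}   # number of shortest paths from source
        sigma[source] = 1
        preds = {v: [] for v in adj}  # predecessors on shortest paths
        order = []
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):     # accumulate from the farthest nodes back
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                edge = frozenset((v, w))
                score[edge] = score.get(edge, 0.0) + share
                delta[v] += share
    return score


def components(adj):
    """Connected components of the graph, as a list of sets."""
    seen, parts = set(), []
    for start in adj:
        if start in seen:
            continue
        part, stack = set(), [start]
        while stack:
            v = stack.pop()
            if v not in part:
                part.add(v)
                stack.extend(adj[v] - part)
        seen |= part
        parts.append(part)
    return parts


def cluster(edges, edges_to_remove):
    """Remove the highest-betweenness edge `edges_to_remove` times, then split."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    for _ in range(edges_to_remove):
        scores = edge_betweenness(adj)
        if not scores:
            break
        a, b = max(scores, key=scores.get)
        adj[a].discard(b)
        adj[b].discard(a)
    return components(adj)


# Hypothetical network: two tightly connected groups joined by one edge (c, d).
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("c", "d"),
         ("d", "e"), ("e", "f"), ("d", "f")]
modules = cluster(edges, edges_to_remove=1)
```

Removing a single edge separates the example network into the modules {a, b, c} and {d, e, f}, because the connecting edge (c, d) lies on every shortest path between the two groups. Larger values of edges_to_remove split the network into progressively more modules.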

The screenshot below shows a physical entity-based extraction of the left ventricle dynamics from its parent cardiovascular model. Green coloring in the computational network map indicates codewords that are dependent variables in the source model, but will be converted into user-defined inputs in the extracted code.


The screenshot below shows the cluster identification interface. In this example, SemGen decomposed the cardiovascular dynamics model mentioned above into three clusters that delineate the heart, pulmonary circulation and systemic circulation components.


The Merger tool helps automate the integration of two SemSim models. The Merger identifies the interface between two models by comparing the biological meaning of the models' codewords as expressed by their composite and singular annotations. If the two models share the same biological concept, the codewords representing this concept are mapped to each other and the user must decide which computational representation of the concept they want to preserve in the integrated model.

The screenshot below shows an integration between a cardiovascular dynamics and a baroreceptor model. The semantic mappings in the center represent the potential interface points between the models. These are the resolution points that SemGen automatically identifies - for each biological concept listed, the user chooses which computational representation of the concept they wish to preserve in the merged model. The user can also introduce manual mappings by selecting individual codewords from the bottom panels and pressing the "Add manual mapping" button. The mapping is then added as a resolution point in the center panel.
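
The mapping and resolution steps described above can be sketched in Python. This is an illustration, not the Merger's actual code: the function names, codewords and annotation labels are hypothetical, and each model is simplified to a dictionary from codeword to annotation. Codewords whose annotations are identical become resolution points, and for each point the user keeps one model's computational representation.

```python
def find_resolution_points(model_a, model_b):
    """Pair codewords from the two models that carry identical annotations."""
    by_annotation = {}
    for codeword, annotation in model_b.items():
        by_annotation.setdefault(annotation, []).append(codeword)
    points = []
    for cw_a, annotation in model_a.items():
        for cw_b in by_annotation.get(annotation, []):
            points.append((annotation, cw_a, cw_b))
    return points


def apply_choices(model_a, model_b, points, choices):
    """Drop the representation the user did not choose at each resolution point.

    choices maps (cw_a, cw_b) to "a" or "b", naming the model whose codeword
    is preserved in the merged model.
    """
    dropped_a = {a for _, a, b in points if choices[(a, b)] == "b"}
    dropped_b = {b for _, a, b in points if choices[(a, b)] == "a"}
    return sorted(set(model_a) - dropped_a), sorted(set(model_b) - dropped_b)


# Hypothetical annotations for a cardiovascular model and a baroreceptor model.
aortic_pressure = ("OPB:Fluid pressure", "FMA:Aorta")
heart_rate = ("NCIT:Heart Rate",)
cardiovascular = {
    "Paorta": aortic_pressure,
    "HR": heart_rate,
    "VLV": ("OPB:Fluid volume", "FMA:Left ventricle"),
}
baroreceptor = {
    "P_art": aortic_pressure,
    "heart_rate": heart_rate,
    "firing_rate": ("custom:Baroreceptor firing rate",),
}

points = find_resolution_points(cardiovascular, baroreceptor)
# Keep aortic pressure from the cardiovascular model and heart rate from the
# baroreceptor model.
choices = {("Paorta", "P_art"): "a", ("HR", "heart_rate"): "b"}
kept_a, kept_b = apply_choices(cardiovascular, baroreceptor, points, choices)
```

A manual mapping fits the same structure: appending a further (annotation, cw_a, cw_b) entry to points, together with a matching entry in choices, treats it as an additional resolution point.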


The Coder tool translates an existing SemSim model into executable simulation code. Eventually, users will be able to translate SemSim models into a variety of simulation languages, but currently the Coder only supports translation into the CellML and MML (JSim) formats.

This tool is completely automated: select the SemSim model you want to encode and where you want to save the encoded model, and the Coder will translate the SemSim model into the format you select.

License Info

Copyright (c) 2010-2016 Maxwell Neal, University of Washington.
All rights reserved.
Redistribution and use in source and binary forms, with or without modification,
for COMMERCIAL PURPOSES IS PROHIBITED without prior written permission from the author.
Redistribution and use in source and binary forms, with or without modification, are
permitted for non-commercial purposes (such as for research, personal use, or educational use),
 provided that redistribution in any form includes this entire notice in all copies of the
 software, derivative works, and supporting documentation.
The name of the author or the University of Washington may not be used to endorse or
 promote software or services derived from this software without prior written permission
  from the author and/or the University.