Baker NA, Barr JL, Bonheyo GT, Joslyn CA, Krishnaswami K, Oxley ME, Quadrel R, Sego LH, Tardiff MF, Wynne AS. Research towards a systematic signature discovery process. IEEE Intelligence and Security Informatics Signature Discovery Workshop, 2013.
In its most general form, a signature is a unique or distinguishing measurement, pattern, or collection of data that identifies a phenomenon (object, action, or behavior) of interest. The discovery of signatures is an important aspect of a wide range of disciplines from basic science to national security for the rapid and efficient detection and/or prediction of phenomena. Current practice in signature discovery is typically accomplished by asking domain experts to characterize and/or model individual phenomena to identify what might compose a useful signature. What is lacking is an approach that can be applied across a broad spectrum of domains and information sources to efficiently and robustly construct candidate signatures, validate their reliability, measure their quality, and overcome the challenge of detection — all in the face of dynamic conditions, measurement obfuscation, and noisy data environments. Our research has focused on the identification of common elements of signature discovery across application domains and the synthesis of those elements into a systematic process for more robust and efficient signature development. In this way, a systematic signature discovery process lays the groundwork for leveraging knowledge obtained from signatures to a particular domain or problem area, and, more generally, to problems outside that domain. This paper presents the initial results of this research by discussing a mathematical framework for representing signatures and placing that framework in the context of a systematic signature discovery process. Additionally, the basic steps of this process are described with details about the methods available to support the different stages of signature discovery, development, and deployment.
- Reprint: PDF
A short correspondence discussing the importance of, and potential mechanisms for, data sharing in nanotechnology.
Thomas DG, Gaheen S, Harper SL, Fritts M, Klaessig F, Hahn-Dantona E, Paik DS, Pan S, Stafford GA, Freund ET, Klemm JD, Baker NA. ISA-TAB-Nano: A Specification for Sharing Nanomaterial Research Data in Spreadsheet-based Format. BMC Biotechnology, 13, 2, 2013.
[Background and motivation] The high-throughput genomics communities have been successfully using standardized spreadsheet-based formats to capture and share data within labs and among public repositories. The nanomedicine community has yet to adopt similar standards to share the diverse and multi-dimensional types of data (including metadata) pertaining to the description and characterization of nanomaterials. Owing to the lack of standardization in representing and sharing nanomaterial data, most of the data currently shared via publications and data resources are incomplete, poorly integrated, and not suitable for meaningful interpretation and re-use. Specifically, in their current state, the data cannot be effectively utilized for the development of predictive models that will inform the rational design of nanomaterials. [Results] We have developed a specification called ISA-TAB-Nano, which comprises four spreadsheet-based file formats for representing and integrating various types of nanomaterial data. Three file formats (Investigation, Study, and Assay files) have been adapted from the established ISA-TAB specification, while the Material file format was developed de novo to more readily describe the complexity of nanomaterials and associated small molecules. In this paper, we discuss the main features of each file format and how to use them for sharing nanomaterial descriptions and assay metadata. [Conclusion] The ISA-TAB-Nano file formats provide a general and flexible framework to record and integrate nanomaterial descriptions, assay data (metadata and endpoint measurements) and protocol information. Like ISA-TAB, ISA-TAB-Nano supports the use of ontology terms to promote standardized descriptions and to facilitate search and integration of the data. The ISA-TAB-Nano specification has been submitted as an ASTM work item to obtain community feedback and to provide a nanotechnology data-sharing standard for public development and adoption.
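As a rough illustration of why tab-delimited spreadsheet formats are easy to integrate, the sketch below reads two small tab-separated tables and joins them on a shared column. The tables are inline toy data, not real ISA-TAB-Nano files; the column names ("Source Name", "Sample Name", "Measurement Value") and all values are assumptions chosen for the example, and the helper functions are hypothetical rather than part of any ISA-TAB tooling.

```python
import csv
import io

# Toy tab-delimited tables. Column names and values are illustrative
# assumptions, not taken from the ISA-TAB-Nano specification.
STUDY_TSV = (
    "Source Name\tSample Name\n"
    "NP-batch-1\tsample-1\n"
    "NP-batch-1\tsample-2\n"
)
ASSAY_TSV = (
    "Sample Name\tMeasurement Value\n"
    "sample-1\t52.1\n"
    "sample-2\t49.8\n"
)


def read_tab_table(text):
    """Parse tab-delimited text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def join_on(left_rows, right_rows, key):
    """Inner-join two row lists on a shared column name."""
    index = {}
    for row in right_rows:
        index.setdefault(row[key], []).append(row)
    return [
        {**left, **right}
        for left in left_rows
        for right in index.get(left[key], [])
    ]


# Link each study sample to its assay measurement through the shared column.
merged = join_on(read_tab_table(STUDY_TSV), read_tab_table(ASSAY_TSV), "Sample Name")
for row in merged:
    print(row)
```

The join key plays the role that shared identifiers play when separate files describe the same samples; real files would carry many more columns, including ontology-term annotations.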
Thomas DG, Chun J, Chen Z, Wei G, Baker NA. Parameterization of a Geometric Flow Implicit Solvation Model. Journal of Computational Chemistry, 34, 687-95, 2013.
Implicit solvent models are popular for their high computational efficiency and simplicity over explicit solvent models and are extensively used for computing molecular solvation properties. The accuracy of implicit solvent models depends on the geometric description of the solute-solvent interface and the solvent dielectric profile that is defined near the surface of the solute molecule. Typically, it is assumed that the dielectric profile is spatially homogeneous in the bulk solvent medium and varies sharply across the solute-solvent interface. However, the specific form of this profile is often described by ad hoc geometric models rather than physical solute-solvent interactions. Hence, it is of significant interest to improve the accuracy of these implicit solvent models by more realistically defining the solute-solvent boundary within a continuum setting. Recently, a differential geometry-based geometric flow solvation model was developed, in which the polar and nonpolar free energies are coupled through a characteristic function that describes a smooth dielectric interface profile across the solvent–solute boundary in a thermodynamically self-consistent fashion. The main parameters of the model are the solute/solvent dielectric coefficients, solvent pressure on the solute, microscopic surface tension, solvent density, and molecular force-field parameters. In this work, we investigate how changes in the pressure, surface tension, solute dielectric coefficient, and choice of different force-field charge and radii parameters affect the prediction accuracy for hydration free energies of 17 small organic molecules based on the geometric flow solvation model. The results of our study provide insights on the parameterization, accuracy, and predictive power of this new implicit solvent model.
Thomas DG, Chikkagoudar S, Chappell AR, Baker NA. Annotating the structure and components of a nanoparticle formulation using computable string expressions. IEEE International Conference on Bioinformatics and Biomedicine (BIBM 2012), in press.
Nanoparticle formulations that are being developed and tested for various medical applications are typically multi-component systems that vary in their structure, chemical composition, and function. It is difficult to compare and understand the differences between the structural and chemical descriptions of hundreds and thousands of nanoparticle formulations found in text documents. We have developed a string nomenclature to create computable string expressions that identify and enumerate the different high-level types of material parts of a nanoparticle formulation and represent the spatial order of their connectivity to each other. The string expressions are intended to be used as IDs, along with terms that describe a nanoparticle formulation and its material parts, in data sharing documents and nanomaterial research databases. The strings can be parsed and represented as a directed acyclic graph. The nodes of the graph can be used to display the string ID, name and other text descriptions of the nanoparticle formulation or its material part, while the edges represent the connectivity between the material parts with respect to the whole nanoparticle formulation. The different patterns in the string expressions can be searched for and used to compare the structure and chemical components of different nanoparticle formulations. The proposed string nomenclature is extensible and can be applied along with ontology terms to annotate the complete description of nanoparticle formulations.
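The idea of parsing a string expression into a directed acyclic graph can be sketched as follows. The syntax used here is a toy invented for illustration and is not the nomenclature defined in the paper: `>` is assumed to mean "is connected to the next part, moving outward", `;` is assumed to separate chains, and part IDs such as `C1`, `S1`, and `L1` are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter


def parse_expression(expr):
    """Parse a toy expression into {part: set of parts it connects outward to}.

    Assumed syntax (hypothetical): '>' links a part to the next one outward,
    ';' separates independent chains, e.g. "C1>S1>L1;S1>L2".
    """
    graph = {}
    for chain in expr.split(";"):
        parts = [p.strip() for p in chain.split(">") if p.strip()]
        for part in parts:
            graph.setdefault(part, set())
        for inner, outer in zip(parts, parts[1:]):
            graph[inner].add(outer)
    return graph


def outward_order(graph):
    """Return parts ordered from innermost to outermost.

    Raises graphlib.CycleError if the connectivity is not acyclic.
    """
    # TopologicalSorter expects node -> predecessors, so invert the edges.
    predecessors = {node: set() for node in graph}
    for inner, outers in graph.items():
        for outer in outers:
            predecessors[outer].add(inner)
    return list(TopologicalSorter(predecessors).static_order())


def has_connection(graph, inner, outer):
    """Check whether a direct inner -> outer connection is present."""
    return outer in graph.get(inner, set())


graph = parse_expression("C1>S1>L1;S1>L2")
print(outward_order(graph))
print(has_connection(graph, "S1", "L2"))
```

Storing the expression as an adjacency mapping makes simple pattern queries (such as "is a given part attached directly to the shell?") a dictionary lookup, and the topological sort doubles as a check that the parsed connectivity is acyclic.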
Jacob F, Wynne A, Liu Y, Baker N, Gray J. Domain-specific languages for composing signature discovery workflows. The 12th Workshop on Domain-Specific Modeling, 2012.
Domain-agnostic signature discovery entails investigation across multiple scientific disciplines. The breadth and cross-disciplinary nature of this work require that existing executable applications be integrated with new capabilities into workflows, representing a wide range of user tasks. An algorithm may be written in multiple programming languages for various hardware platforms, and so workflow composition requires integrating executables from any number of remote hosts. This raises the engineering issue of how to generate web service wrappers for these heterogeneous executables and to compose them into a scientific workflow environment (e.g., Taverna). In this position paper, we summarize our work on two simple Domain-Specific Languages (DSLs) that automate these processes. Our Service Description Language (SDL) describes key elements of a signature discovery service and automatically generates its implementation code. The Workflow Description Language (WDL) describes the pipeline of services and generates deployable artifacts for the Taverna workflow management system. We demonstrate our approach with a real-world workflow composed of services wrapping remote executables.
- Preprint: PDF
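The general idea of deriving invocation code from a declarative service description can be illustrated with a small sketch. This is not SDL or WDL syntax and it does not target Taverna: the `ServiceDescription` fields, the host and executable names, and the `--name=value` argument convention are all assumptions made for the example.

```python
import shlex
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceDescription:
    """Hypothetical declarative description of a wrapped executable."""
    name: str
    host: str
    executable: str
    inputs: tuple


def render_command(service, values):
    """Generate the command line for one service from its description."""
    missing = [n for n in service.inputs if n not in values]
    if missing:
        raise ValueError(f"{service.name}: missing inputs {missing}")
    args = [service.executable] + [f"--{n}={values[n]}" for n in service.inputs]
    return shlex.join(args)  # quotes arguments safely for a POSIX shell


def render_pipeline(services, values):
    """Generate one 'host: command' line per service, in pipeline order."""
    return [f"{s.host}: {render_command(s, values)}" for s in services]


# Hypothetical two-step workflow; names and hosts are placeholders.
normalize = ServiceDescription("normalize", "hostA", "normalize_data", ("input",))
cluster = ServiceDescription("cluster", "hostB", "cluster_data", ("input", "k"))

for line in render_pipeline([normalize, cluster], {"input": "data.csv", "k": 3}):
    print(line)
```

Keeping the description separate from the generated command is what lets the same declaration drive different back ends; a real generator would emit service wrappers and workflow artifacts rather than plain command strings.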
Chen Z, Zhao S, Chun J, Thomas DG, Baker NA, Wei GW. Variational approach for nonpolar solvation analysis. Journal of Chemical Physics, 137, 084101, 2012.
Solvation analysis is one of the most important tasks in chemical and biological modeling. Implicit solvent models are some of the most popular approaches. However, commonly used implicit solvent models rely on unphysical definitions of solvent-solute boundaries. Based on differential geometry, the present work defines the solvent-solute boundary via the variation of the nonpolar solvation free energy. The solvation free energy functional of the system is constructed based on a continuum description of the solvent and the discrete description of the solute, which are dynamically coupled by the solvent-solute boundaries via van der Waals interactions. The first variation of the energy functional gives rise to the governing Laplace-Beltrami equation. The present model predictions of the nonpolar solvation energies are in excellent agreement with experimental data, which supports the validity of the proposed nonpolar solvation model.