Tools
FAIRDOM-SEEK
FAIRDOM-SEEK is a web-based cataloguing and commons platform for sharing heterogeneous scientific research datasets, models or simulations, processes and research outcomes. It preserves the associations between them, along with information about the people and organisations involved.
Underpinning FAIRDOM-SEEK is the ISA (Investigation, Study, Assay) infrastructure, a standard framework for describing how individual experiments are aggregated into wider studies and investigations. Within FAIRDOM-SEEK, ISA has been extended and made configurable so that the structure can also be used outside biology.
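The aggregation that ISA describes can be pictured as a simple nested structure. The following is a minimal sketch only: the class names, field names and sample values are illustrative assumptions, not the FAIRDOM-SEEK data model or its API.

```python
from dataclasses import dataclass, field
from typing import List


# Illustrative classes only; not the FAIRDOM-SEEK schema.
@dataclass
class Assay:
    """A single experiment and the data files it produced."""
    title: str
    data_files: List[str] = field(default_factory=list)


@dataclass
class Study:
    """A group of related assays."""
    title: str
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    """The top-level context that aggregates studies."""
    title: str
    studies: List[Study] = field(default_factory=list)

    def all_data_files(self) -> List[str]:
        # Walk Investigation -> Study -> Assay and collect every data file.
        return [
            data_file
            for study in self.studies
            for assay in study.assays
            for data_file in assay.data_files
        ]


# Hypothetical example content.
investigation = Investigation(
    title="Glucose metabolism",
    studies=[
        Study(
            title="Enzyme kinetics",
            assays=[
                Assay("Time course", ["timecourse.xlsx"]),
                Assay("Inhibitor screen", ["screen_a.xlsx", "screen_b.xlsx"]),
            ],
        )
    ],
)
```

Calling `investigation.all_data_files()` on this example returns the three file names in the order they were declared.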
Flexible and detailed sharing permissions are available to manage the catalogued items, from early collaborations within projects through to the publishing of final research results. On publication, a DOI can be generated for individual items or for entire aggregates packaged as Research Objects.
FAIRDOM-SEEK incorporates semantic technology, allowing sophisticated queries over the content. Metadata can be collected using standard Excel tools and processes, through the use of RightField.
FAIRDOM-SEEK can be downloaded, installed, and managed locally as a solution to data sharing within groups and consortia. In addition, a publicly accessible instance of a FAIRDOM-SEEK commons is available as the FAIRDOMHub.
EXCEMPLIFY
In systems biology, quantitative experimental data are fundamental for building mathematical models. In many cases, these data are stored as Excel files and hosted locally. Excemplify was developed to improve the exchange and long-term storage of such data. Excemplify, a web-based application, facilitates storage of the experimental data and the corresponding metadata in a central database, making the data searchable, comparable and exchangeable. Additionally, by using the knowledge embedded in templates, Excemplify can parse experimental data from the initial experimental setup stage and automatically generate the subsequent spreadsheet stages of the experimental workflow. The copy and paste operations that would otherwise be done by hand are performed within Excemplify, which eliminates a common source of mistakes. Beyond data storage, Excemplify thus relieves experimentalists of time-consuming and error-prone manual data handling.
Excemplify was developed in close collaboration with the group for Systems Biology of Signal Transduction, which resulted in the release of a production version of Excemplify at the German Cancer Research Center (DKFZ).
ChemHits
Normalization and Matching of Chemical Compound Names
Despite all standardization efforts in the field of chemical nomenclature, a chemical compound can still have many different names, both trivial and systematic. Hence, unambiguously identifying a chemical compound from its name alone requires comprehensive chemical knowledge and often extensive searches in chemical databases. As many publications describe a chemical compound exclusively by its name, matching these diverging notations can be tedious. However, this identification is crucial for the integration of biochemical data, e.g. for bundling data in databases or for setting up biochemical models based on published data found in the literature.
We have developed ChemHits, an application which detects and matches synonymous names of (bio-)chemical compounds and thereby facilitates the merging of data that refer to the same compound but describe it with different names. The tool is based on natural language processing (NLP) methods and applies transformation rules to systematically convert chemical compound names to a unique, generic, normalized name form. It is capable of normalizing a given name of a chemical compound and matching it against names in (bio-)chemical databases such as KEGG COMPOUND, SABIO-RK or ChEBI, even when there is no exact name-to-name match. The tool can also match a complete list of compound names against these databases, which makes it useful for the automatic cross-annotation of chemical data in databases.
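The general idea of rule-based normalization followed by matching can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the ChemHits implementation: the rules shown (lowercasing, spelling out Greek letters, dropping separators) and all function names are hypothetical, and the real rule set is not described here.

```python
import re

# Hypothetical transformation rules; the actual ChemHits rules are not given in the text.
GREEK_LETTERS = {"α": "alpha", "β": "beta", "γ": "gamma"}
SEPARATORS = re.compile(r"[\s\-_,'()\[\]]+")


def normalize(name: str) -> str:
    """Reduce a compound name to a simplified form used only for comparison."""
    result = name.lower()
    for symbol, word in GREEK_LETTERS.items():
        result = result.replace(symbol, word)
    # Remove whitespace, hyphens, commas, primes and brackets.
    return SEPARATORS.sub("", result)


def build_index(database_names):
    """Map each normalized form to the original database names that share it."""
    index = {}
    for name in database_names:
        index.setdefault(normalize(name), []).append(name)
    return index


def match(query: str, index) -> list:
    """Return database names whose normalized form equals that of the query."""
    return index.get(normalize(query), [])


# Stand-in for names taken from a compound database.
index = build_index(["beta-D-Glucose", "ATP"])
hits = match("β-D-glucose", index)
```

Here `hits` contains `"beta-D-Glucose"` although the query string differs from the stored name. Rules this aggressive can also merge names that should stay distinct, so a practical system would need a far more careful rule set than this sketch.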