<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://www.na-mic.org/w/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Aris</id>
	<title>NAMIC Wiki - User contributions [en]</title>
	<link rel="self" type="application/atom+xml" href="https://www.na-mic.org/w/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Aris"/>
	<link rel="alternate" type="text/html" href="https://www.na-mic.org/wiki/Special:Contributions/Aris"/>
	<updated>2026-04-03T22:31:56Z</updated>
	<subtitle>User contributions</subtitle>
	<generator>MediaWiki 1.33.0</generator>
	<entry>
		<id>https://www.na-mic.org/w/index.php?title=SDIWG:Meeting_Minutes_20090320&amp;diff=35770</id>
		<title>SDIWG:Meeting Minutes 20090320</title>
		<link rel="alternate" type="text/html" href="https://www.na-mic.org/w/index.php?title=SDIWG:Meeting_Minutes_20090320&amp;diff=35770"/>
		<updated>2009-03-23T04:30:04Z</updated>

		<summary type="html">&lt;p&gt;Aris: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==NCBC Joint Working Group Meeting==&lt;br /&gt;
&lt;br /&gt;
[[SDIWG:Software_and_Data_Integration_Working_Group|Top page of SDIWG web site]]&lt;br /&gt;
&lt;br /&gt;
===Notes===&lt;br /&gt;
''Friday March 20, 2009: 2:30-3:30 PM EST''&lt;br /&gt;
''Please contact [mailto:lysterp@mail.nih.gov Peter Lyster] for information''&lt;br /&gt;
''[[SDIWG:Software_Engineering_Body_of_Knowledge_Across_NCBC_Biocomputing_Centers | SDIWG Meetings Page]]''&lt;br /&gt;
'''Save the date for the next SDIWG tcon/Connect, three months from now: Friday June 20, 2009'''&lt;br /&gt;
&lt;br /&gt;
===Tcon Agenda===&lt;br /&gt;
&lt;br /&gt;
*Roll Call/Note-taker (Aris Floratos) (5 min)&lt;br /&gt;
*Minutes from most recent SDIWG tcon [[SDIWG:Meeting_Minutes_20080919|September 19, 2008]] (5 min)&lt;br /&gt;
*Status check on '''three working groups''': &lt;br /&gt;
**[[SDIWG:_NCBC_Resource_Yellow_Pages_and_Software_Ontologies|Biositemaps Resourceome]]&lt;br /&gt;
***State of [http://www.biositemap.org Biositemaps]. The underlying technology for Biositemaps is the [http://bioportal.bioontology.org/visualize/39521 Biomedical Resource Ontology (BRO) V2.6] and the [http://biositemaps.bioontology.org/editor/ Information Model (IM)], which is encapsulated in the Biositemaps Editor. Please try it out.&lt;br /&gt;
*** The [[Tiger_team_biositemaps | '''Tiger Team''']] is working on this phase of biositemaps.&lt;br /&gt;
*** If you want to view the working group thread, we suggest you start at the [[SDIWG:Meeting_Minutes_20080716 | Four basic activities, Authoring/Consuming of Biositemaps/BRO]]&lt;br /&gt;
*** [http://www.na-mic.org/Wiki/index.php/SDIWG:Action_Items_20090212  Current list of Action Items and Feature Requests for Tiger Team]&lt;br /&gt;
*** [[Plans for adoption within and beyond NCBCs | Plans for adoption within and beyond NCBCs]]&lt;br /&gt;
***[[Media:Biositemaps_white_paper_v4.0.doc | Biositemaps white paper v4.0]] (Needs Updating: Peter, Ivo, Daniel).&lt;br /&gt;
***[[Cheat Sheet for NCBC Leads to adopt Biositemaps | Cheat Sheet for NCBC Leads to adopt Biositemaps]]&lt;br /&gt;
** [[SDIWG:_NCBC_Scientific_Ontologies|Scientific Ontologies]]&lt;br /&gt;
** [http://www.ncbcs.org/dbp_interactions.html NCBC Driving Biological Projects Interactions and Impact]&lt;br /&gt;
**NCBC Dissemination and shared calendar: Dissemination page is at  http://wiki.simtk.org/ncbcdisseminate/. The prototype calendar is at http://www.ncbcs.org/cal.html with RSS feeds available for each center at http://www.ncbcs.org/summary-cal.html.  Please check it out and provide feedback. (Joy)&lt;br /&gt;
**Meeting on Shared Names URI April 29-30 Boston (Mark)&lt;br /&gt;
**Brief update on NCBC supplements: Biositemaps (Ivo, Beth, Mark); Image data (Will); …&lt;br /&gt;
**Simbios Glossy on working groups (Joy, Brian)&lt;br /&gt;
* Questions/Information Sharing&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Minutes===&lt;br /&gt;
* '''Note taker''': [Suggested order of note takers for future meetings: '''Aris Floratos (this one)'''; Schroeder, Joy Ku; Rubin; Jags, Dinov, Murphy]&lt;br /&gt;
* Review of [[SDIWG:Meeting_Minutes_20080919|September 19, 2008 Minutes]] &lt;br /&gt;
* Attendees: &lt;br /&gt;
&lt;br /&gt;
'''''Karin Remington - Update on Common Fund'''''&lt;br /&gt;
---------------------------------------&lt;br /&gt;
* The majority of the common fund is reserved for new submissions. Administrative/competitive revision supplements will be handled on an ad-hoc basis; a smaller portion of the common fund will go there. This is in addition to the stimulus $$$ to be handed out by individual ICs. &lt;br /&gt;
* In response to a question by Andrea Califano if one could submit a supplement for 50% of a Center's budget, Karin said that she will inquire about specifics but that 50% seemed rather on the high side of what would/could be supported. The point was also made that there is a single pool of money for both supplement types (admin/competitive) and that competitive supplements are appropriate only in case of expanding work scope. &lt;br /&gt;
*In terms of timelines, there is a good chance that there will be additional calls for supplements, in addition to the one in the recent RFA. Centers that wish to pursue supplements should discuss their proposals with their program officer, who needs to sign off on supplement requests. In the context of such conversations it is appropriate to inquire whether the program officer's IC would be willing to pick up the supplement funding (instead of having the $$$ come from the Common Fund).&lt;br /&gt;
* Regarding the NCBC re-competition, the application submission deadline will probably be sometime in September-October.&lt;br /&gt;
&lt;br /&gt;
'''''Andrea Califano - Update on DBP WG'''''&lt;br /&gt;
----------------------------------&lt;br /&gt;
* Update on WG calls. Agreed to create an interactive DBP-ome: provide info on projects, tools, publications, interactions, etc. Points of contact have been defined at the various Centers to collect data and develop software. A web site (similar to that developed by the yellow-pages WG) will be created soon.&lt;br /&gt;
* Another initiative agreed upon is a web site to publish movies of seminars that take place at the various institutes. Lyster/Athey commented that the site could also be used to post other related talks, e.g., those from Valerie Florance's seminar series.&lt;br /&gt;
* (Cavacoli): he sent a message earlier today about the info needed for the DBP-ome, and people are responding rapidly. He expects that he and Zhong Li will start working on the collected data. Also mentioned that NCIBI has software for sharing seminar movies, which we can maybe use. &lt;br /&gt;
&lt;br /&gt;
'''''Beth Kirschner - Biositemaps'''''&lt;br /&gt;
-------------------&lt;br /&gt;
* The Biositemaps site is complete. All needed search/browsing functionality is in place. The next step is populating it with content about the various Center tools.&lt;br /&gt;
* (Athey): can this be used to catalogue the software tools in the 38 CTSAs (which Besich and he are spearheading)? Beth said yes, and that they are working on this with the CTSA program.&lt;br /&gt;
* (Lyster): a number of other NIH programs are interested in adopting Biositemaps. He said he will be contacting Center contacts to initiate the process of adding content.&lt;br /&gt;
&lt;br /&gt;
'''''Scientific Ontologies'''''&lt;br /&gt;
---------------------&lt;br /&gt;
* (Athey): The bioontologies draft has been sent to Mark Musen who plans to review it shortly.&lt;br /&gt;
&lt;br /&gt;
'''''Joy Ku - Dissemination'''''&lt;br /&gt;
----------------------&lt;br /&gt;
* Talked about the calendar on the ncbcs.org site, listing the various Center events. Described the Web GUI and how it can be used to acquire information about the various events. The idea is that each Center will appoint a contact person that will be responsible for updating their portion of the calendar.&lt;br /&gt;
&lt;br /&gt;
'''''Misc'''''&lt;br /&gt;
----&lt;br /&gt;
* (Musen): talked about an effort spearheaded in coordination with the W3C for defining standard terms that can be used to annotate medical info on the web. A one-day workshop on April 29 (prior to the BioIT meeting) will look at this issue.&lt;br /&gt;
&lt;br /&gt;
* (Lyster): also mentioned a couple of other meetings related to bioontologies. There was a discussion if the ncbcs.org calendar could use a &amp;quot;general purpose&amp;quot; category to capture such interesting but non-Center specific events.&lt;br /&gt;
&lt;br /&gt;
* (Ku): there is a Simbios page that lists notes from a recent meeting with NCBO, where they discussed coordinating dissemination efforts. There was general agreement for an effort similar to the DBP group, where Centers work together to promote coordinated dissemination of the NCBC products.&lt;br /&gt;
&lt;br /&gt;
===Action Items===&lt;br /&gt;
* TBD&lt;/div&gt;</summary>
		<author><name>Aris</name></author>
		
	</entry>
	<entry>
		<id>https://www.na-mic.org/w/index.php?title=SDIWG:Software_Engineering_At_MAGNet&amp;diff=10631</id>
		<title>SDIWG:Software Engineering At MAGNet</title>
		<link rel="alternate" type="text/html" href="https://www.na-mic.org/w/index.php?title=SDIWG:Software_Engineering_At_MAGNet&amp;diff=10631"/>
		<updated>2007-05-19T22:38:07Z</updated>

		<summary type="html">&lt;p&gt;Aris: /* Contact */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__TOC__&lt;br /&gt;
==Process==&lt;br /&gt;
The software development process in MAGNet typically takes place in 2 phases:&lt;br /&gt;
* Prototype tools are developed and field-tested at various investigator labs in the context of Core 1, 2 and 3 activities. These tools are usually command line versions, coded in an assortment of languages (C, C++, Perl, etc) and/or computational environments (MATLAB, R). In the majority of cases, analysis results are represented as text files that are either inspected with text editors or are transformed (through custom programs) to properly formatted inputs for downstream analysis/visualization tools.&lt;br /&gt;
* When prototypes have reached a satisfactory level of maturity the software engineering group takes over the task of integrating them into the Center's bioinformatics platform, geWorkbench, http://www.geworkbench.org. Operationally, each integration activity is carried out following a number of steps deriving from a standard software life cycle process:&lt;br /&gt;
:# '''Collection of business requirements''': an initial meeting is arranged with the investigator whose prototype is slated for integration. During the meeting high level requirements are collected, outlining the intended functionality of the integrated component. Such a meeting usually involves (i) a demo of geWorkbench, (ii) an overview presentation outlining the science behind the investigator's prototype, (iii) a demo of the prototype, and (iv) a brainstorming session aimed at identifying the most meaningful manner to integrate the prototype into geWorkbench so that the prototype leverages as much as possible of the functionality already present in geWorkbench.&lt;br /&gt;
:# '''Development of Use case document''':  based on the information collected at the investigator meeting a detailed Use Case requirements document is developed which describes precisely what the user interface will look like and how users will interact with it. The Use Case document is subject to review and approval by the investigator. An example of such a document can be found [[Media:CNKB_UC.pdf | here]].&lt;br /&gt;
:# '''Development of user interface/computational service''': based on the requirements laid out in the use case document, the development team (i) develops the user interface for the integrated version of the investigator tool, and (ii) ports any associated compute server to the grid-enabled server framework used by geWorkbench (which is based on the caGrid software, https://cabig.nci.nih.gov/workspaces/Architecture/caGrid, a grid middleware layer developed in the context of the caBIG initiative). Unit tests are an integral part of the development process and are run automatically on a nightly basis.&lt;br /&gt;
:# '''System Testing''': After the prototype is integrated, an integration and system testing cycle is performed. Our group uses a formal testing process based on detailed test scripts whose execution status is tracked in a custom database we have developed for this purpose. In this database, failed scripts are linked to their associated defect reports within our defect management system, http://wiki.c2b2.columbia.edu/mantis/.&lt;br /&gt;
:# '''Documentation''': The final step in the development process is the generation of detailed end-user documentation, in the form of on-line help, tutorials, end-user guides and training slides. Links to these materials are available at: http://wiki.c2b2.columbia.edu/workbench/index.php/Project_Documentation. &lt;br /&gt;
&lt;br /&gt;
geWorkbench is an open-source platform and we encourage and welcome code contributions by the community. To facilitate such contributions we have made the entire code base freely available to everyone through our community development project site, http://gforge.nci.nih.gov/projects/geworkbench/, registered with the NCI GForge project. This site enables us to leverage the infrastructure support that GForge offers to collaborative development projects, including access to a CVS server, streamlined setup of mailing lists and user-forums, posting software releases, etc.&lt;br /&gt;
&lt;br /&gt;
==License==&lt;br /&gt;
At present, geWorkbench is made available under the licensing terms stated at http://wiki.c2b2.columbia.edu/workbench/index.php/GeWorkbench_License. We are currently in the process of modularizing our licensing in order to accommodate the needs of different labs/tools.&lt;br /&gt;
&lt;br /&gt;
==Contact==&lt;br /&gt;
The software engineering group at MAGNet is managed by Aris Floratos (&amp;amp;#102;&amp;amp;#108;&amp;amp;#111;&amp;amp;#114;&amp;amp;#97;&amp;amp;#116;&amp;amp;#111;&amp;amp;#115;&amp;amp;#64;&amp;amp;#99;&amp;amp;#50;&amp;amp;#98;&amp;amp;#50;&amp;amp;#46;&amp;amp;#99;&amp;amp;#111;&amp;amp;#108;&amp;amp;#117;&amp;amp;#109;&amp;amp;#98;&amp;amp;#105;&amp;amp;#97;&amp;amp;#46;&amp;amp;#101;&amp;amp;#100;&amp;amp;#117;).&lt;/div&gt;</summary>
		<author><name>Aris</name></author>
		
	</entry>
	<entry>
		<id>https://www.na-mic.org/w/index.php?title=SDIWG:Software_Engineering_At_MAGNet&amp;diff=10630</id>
		<title>SDIWG:Software Engineering At MAGNet</title>
		<link rel="alternate" type="text/html" href="https://www.na-mic.org/w/index.php?title=SDIWG:Software_Engineering_At_MAGNet&amp;diff=10630"/>
		<updated>2007-05-19T22:35:51Z</updated>

		<summary type="html">&lt;p&gt;Aris: /* Contact */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__TOC__&lt;br /&gt;
==Process==&lt;br /&gt;
The software development process in MAGNet typically takes place in 2 phases:&lt;br /&gt;
* Prototype tools are developed and field-tested at various investigator labs in the context of Core 1, 2 and 3 activities. These tools are usually command line versions, coded in an assortment of languages (C, C++, Perl, etc) and/or computational environments (MATLAB, R). In the majority of cases, analysis results are represented as text files that are either inspected with text editors or are transformed (through custom programs) to properly formatted inputs for downstream analysis/visualization tools.&lt;br /&gt;
* When prototypes have reached a satisfactory level of maturity the software engineering group takes over the task of integrating them into the Center's bioinformatics platform, geWorkbench, http://www.geworkbench.org. Operationally, each integration activity is carried out following a number of steps deriving from a standard software life cycle process:&lt;br /&gt;
:# '''Collection of business requirements''': an initial meeting is arranged with the investigator whose prototype is slated for integration. During the meeting high level requirements are collected, outlining the intended functionality of the integrated component. Such a meeting usually involves (i) a demo of geWorkbench, (ii) an overview presentation outlining the science behind the investigator's prototype, (iii) a demo of the prototype, and (iv) a brainstorming session aimed at identifying the most meaningful manner to integrate the prototype into geWorkbench so that the prototype leverages as much as possible of the functionality already present in geWorkbench.&lt;br /&gt;
:# '''Development of Use case document''':  based on the information collected at the investigator meeting a detailed Use Case requirements document is developed which describes precisely what the user interface will look like and how users will interact with it. The Use Case document is subject to review and approval by the investigator. An example of such a document can be found [[Media:CNKB_UC.pdf | here]].&lt;br /&gt;
:# '''Development of user interface/computational service''': based on the requirements laid out in the use case document, the development team (i) develops the user interface for the integrated version of the investigator tool, and (ii) ports any associated compute server to the grid-enabled server framework used by geWorkbench (which is based on the caGrid software, https://cabig.nci.nih.gov/workspaces/Architecture/caGrid, a grid middleware layer developed in the context of the caBIG initiative). Unit tests are an integral part of the development process and are run automatically on a nightly basis.&lt;br /&gt;
:# '''System Testing''': After the prototype is integrated, an integration and system testing cycle is performed. Our group uses a formal testing process based on detailed test scripts whose execution status is tracked in a custom database we have developed for this purpose. In this database, failed scripts are linked to their associated defect reports within our defect management system, http://wiki.c2b2.columbia.edu/mantis/.&lt;br /&gt;
:# '''Documentation''': The final step in the development process is the generation of detailed end-user documentation, in the form of on-line help, tutorials, end-user guides and training slides. Links to these materials are available at: http://wiki.c2b2.columbia.edu/workbench/index.php/Project_Documentation. &lt;br /&gt;
&lt;br /&gt;
geWorkbench is an open-source platform and we encourage and welcome code contributions by the community. To facilitate such contributions we have made the entire code base freely available to everyone through our community development project site, http://gforge.nci.nih.gov/projects/geworkbench/, registered with the NCI GForge project. This site enables us to leverage the infrastructure support that GForge offers to collaborative development projects, including access to a CVS server, streamlined setup of mailing lists and user-forums, posting software releases, etc.&lt;br /&gt;
&lt;br /&gt;
==License==&lt;br /&gt;
At present, geWorkbench is made available under the licensing terms stated at http://wiki.c2b2.columbia.edu/workbench/index.php/GeWorkbench_License. We are currently in the process of modularizing our licensing in order to accommodate the needs of different labs/tools.&lt;br /&gt;
&lt;br /&gt;
==Contact==&lt;br /&gt;
The software engineering group at MAGNet is managed by Aris Floratos (floratos at c2b2 dot columbia dot edu).&lt;/div&gt;</summary>
		<author><name>Aris</name></author>
		
	</entry>
	<entry>
		<id>https://www.na-mic.org/w/index.php?title=SDIWG:Software_Engineering_At_MAGNet&amp;diff=10629</id>
		<title>SDIWG:Software Engineering At MAGNet</title>
		<link rel="alternate" type="text/html" href="https://www.na-mic.org/w/index.php?title=SDIWG:Software_Engineering_At_MAGNet&amp;diff=10629"/>
		<updated>2007-05-19T22:35:37Z</updated>

		<summary type="html">&lt;p&gt;Aris: /* Contact */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__TOC__&lt;br /&gt;
==Process==&lt;br /&gt;
The software development process in MAGNet typically takes place in 2 phases:&lt;br /&gt;
* Prototype tools are developed and field-tested at various investigator labs in the context of Core 1, 2 and 3 activities. These tools are usually command line versions, coded in an assortment of languages (C, C++, Perl, etc) and/or computational environments (MATLAB, R). In the majority of cases, analysis results are represented as text files that are either inspected with text editors or are transformed (through custom programs) to properly formatted inputs for downstream analysis/visualization tools.&lt;br /&gt;
* When prototypes have reached a satisfactory level of maturity the software engineering group takes over the task of integrating them into the Center's bioinformatics platform, geWorkbench, http://www.geworkbench.org. Operationally, each integration activity is carried out following a number of steps deriving from a standard software life cycle process:&lt;br /&gt;
:# '''Collection of business requirements''': an initial meeting is arranged with the investigator whose prototype is slated for integration. During the meeting high level requirements are collected, outlining the intended functionality of the integrated component. Such a meeting usually involves (i) a demo of geWorkbench, (ii) an overview presentation outlining the science behind the investigator's prototype, (iii) a demo of the prototype, and (iv) a brainstorming session aimed at identifying the most meaningful manner to integrate the prototype into geWorkbench so that the prototype leverages as much as possible of the functionality already present in geWorkbench.&lt;br /&gt;
:# '''Development of Use case document''':  based on the information collected at the investigator meeting a detailed Use Case requirements document is developed which describes precisely what the user interface will look like and how users will interact with it. The Use Case document is subject to review and approval by the investigator. An example of such a document can be found [[Media:CNKB_UC.pdf | here]].&lt;br /&gt;
:# '''Development of user interface/computational service''': based on the requirements laid out in the use case document, the development team (i) develops the user interface for the integrated version of the investigator tool, and (ii) ports any associated compute server to the grid-enabled server framework used by geWorkbench (which is based on the caGrid software, https://cabig.nci.nih.gov/workspaces/Architecture/caGrid, a grid middleware layer developed in the context of the caBIG initiative). Unit tests are an integral part of the development process and are run automatically on a nightly basis.&lt;br /&gt;
:# '''System Testing''': After the prototype is integrated, an integration and system testing cycle is performed. Our group uses a formal testing process based on detailed test scripts whose execution status is tracked in a custom database we have developed for this purpose. In this database, failed scripts are linked to their associated defect reports within our defect management system, http://wiki.c2b2.columbia.edu/mantis/.&lt;br /&gt;
:# '''Documentation''': The final step in the development process is the generation of detailed end-user documentation, in the form of on-line help, tutorials, end-user guides and training slides. Links to these materials are available at: http://wiki.c2b2.columbia.edu/workbench/index.php/Project_Documentation. &lt;br /&gt;
&lt;br /&gt;
geWorkbench is an open-source platform and we encourage and welcome code contributions by the community. To facilitate such contributions we have made the entire code base freely available to everyone through our community development project site, http://gforge.nci.nih.gov/projects/geworkbench/, registered with the NCI GForge project. This site enables us to leverage the infrastructure support that GForge offers to collaborative development projects, including access to a CVS server, streamlined setup of mailing lists and user-forums, posting software releases, etc.&lt;br /&gt;
&lt;br /&gt;
==License==&lt;br /&gt;
At present, geWorkbench is made available under the licensing terms stated at http://wiki.c2b2.columbia.edu/workbench/index.php/GeWorkbench_License. We are currently in the process of modularizing our licensing in order to accommodate the needs of different labs/tools.&lt;br /&gt;
&lt;br /&gt;
==Contact==&lt;br /&gt;
The software engineering group at MAGNet is managed by Aris Floratos (floratos at c2b2 dot columbia dot edu).&lt;/div&gt;</summary>
		<author><name>Aris</name></author>
		
	</entry>
	<entry>
		<id>https://www.na-mic.org/w/index.php?title=SDIWG:Software_Engineering_At_MAGNet&amp;diff=10628</id>
		<title>SDIWG:Software Engineering At MAGNet</title>
		<link rel="alternate" type="text/html" href="https://www.na-mic.org/w/index.php?title=SDIWG:Software_Engineering_At_MAGNet&amp;diff=10628"/>
		<updated>2007-05-19T22:30:22Z</updated>

		<summary type="html">&lt;p&gt;Aris: /* Process */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__TOC__&lt;br /&gt;
==Process==&lt;br /&gt;
The software development process in MAGNet typically takes place in 2 phases:&lt;br /&gt;
* Prototype tools are developed and field-tested at various investigator labs in the context of Core 1, 2 and 3 activities. These tools are usually command line versions, coded in an assortment of languages (C, C++, Perl, etc) and/or computational environments (MATLAB, R). In the majority of cases, analysis results are represented as text files that are either inspected with text editors or are transformed (through custom programs) to properly formatted inputs for downstream analysis/visualization tools.&lt;br /&gt;
* When prototypes have reached a satisfactory level of maturity the software engineering group takes over the task of integrating them into the Center's bioinformatics platform, geWorkbench, http://www.geworkbench.org. Operationally, each integration activity is carried out following a number of steps deriving from a standard software life cycle process:&lt;br /&gt;
:# '''Collection of business requirements''': an initial meeting is arranged with the investigator whose prototype is slated for integration. During the meeting high level requirements are collected, outlining the intended functionality of the integrated component. Such a meeting usually involves (i) a demo of geWorkbench, (ii) an overview presentation outlining the science behind the investigator's prototype, (iii) a demo of the prototype, and (iv) a brainstorming session aimed at identifying the most meaningful manner to integrate the prototype into geWorkbench so that the prototype leverages as much as possible of the functionality already present in geWorkbench.&lt;br /&gt;
:# '''Development of Use case document''':  based on the information collected at the investigator meeting a detailed Use Case requirements document is developed which describes precisely what the user interface will look like and how users will interact with it. The Use Case document is subject to review and approval by the investigator. An example of such a document can be found [[Media:CNKB_UC.pdf | here]].&lt;br /&gt;
:# '''Development of user interface/computational service''': based on the requirements laid out in the use case document, the development team (i) develops the user interface for the integrated version of the investigator tool, and (ii) ports any associated compute server to the grid-enabled server framework used by geWorkbench (which is based on the caGrid software, https://cabig.nci.nih.gov/workspaces/Architecture/caGrid, a grid middleware layer developed in the context of the caBIG initiative). Unit tests are an integral part of the development process and are run automatically on a nightly basis.&lt;br /&gt;
:# '''System Testing''': After the prototype is integrated, an integration and system testing cycle is performed. Our group uses a formal testing process based on detailed test scripts whose execution status is tracked in a custom database we have developed for this purpose. In this database, failed scripts are linked to their associated defect reports within our defect management system, http://wiki.c2b2.columbia.edu/mantis/.&lt;br /&gt;
:# '''Documentation''': The final step in the development process is the generation of detailed end-user documentation, in the form of on-line help, tutorials, end-user guides and training slides. Links to these materials are available at: http://wiki.c2b2.columbia.edu/workbench/index.php/Project_Documentation. &lt;br /&gt;
&lt;br /&gt;
geWorkbench is an open-source platform and we encourage and welcome code contributions by the community. To facilitate such contributions we have made the entire code base freely available to everyone through our community development project site, http://gforge.nci.nih.gov/projects/geworkbench/, registered with the NCI GForge project. This site enables us to leverage the infrastructure support that GForge offers to collaborative development projects, including access to a CVS server, streamlined setup of mailing lists and user-forums, posting software releases, etc.&lt;br /&gt;
&lt;br /&gt;
==License==&lt;br /&gt;
At present, geWorkbench is made available under the licensing terms stated at http://wiki.c2b2.columbia.edu/workbench/index.php/GeWorkbench_License. We are currently in the process of modularizing our licensing in order to accommodate the needs of different labs/tools.&lt;br /&gt;
&lt;br /&gt;
==Contact==&lt;br /&gt;
The software engineering group at MAGNet is managed by [mailto:%66%6C%6F%72%61%74%6F%73%40%63%32%62%32%2E%63%6F%6C%75%6D%62%69%61%2E%65%64%75 Aris Floratos].&lt;/div&gt;</summary>
		<author><name>Aris</name></author>
		
	</entry>
	<entry>
		<id>https://www.na-mic.org/w/index.php?title=SDIWG:Software_Engineering_At_MAGNet&amp;diff=10627</id>
		<title>SDIWG:Software Engineering At MAGNet</title>
		<link rel="alternate" type="text/html" href="https://www.na-mic.org/w/index.php?title=SDIWG:Software_Engineering_At_MAGNet&amp;diff=10627"/>
		<updated>2007-05-19T22:26:39Z</updated>

		<summary type="html">&lt;p&gt;Aris: /* Process */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__TOC__&lt;br /&gt;
==Process==&lt;br /&gt;
The software development process in MAGNet typically takes place in two phases:&lt;br /&gt;
* Prototype tools are developed and field-tested at various investigator labs in the context of Core 1, 2 and 3 activities. These tools are usually command-line programs, coded in an assortment of languages (C, C++, Perl, etc.) and/or computational environments (MATLAB, R). In the majority of cases, analysis results are represented as text files that are either inspected with text editors or are transformed (through custom programs) into properly formatted inputs for downstream analysis/visualization tools.&lt;br /&gt;
* When prototypes have reached a satisfactory level of maturity, the software engineering group takes over the task of integrating them into the Center's bioinformatics platform, geWorkbench, http://www.geworkbench.org. Operationally, each integration activity is carried out following a number of steps derived from a standard software life cycle process:&lt;br /&gt;
:# '''Collection of business requirements''': an initial meeting is arranged with the investigator whose prototype is slated for integration. During the meeting, high-level requirements are collected, outlining the intended functionality of the integrated component. Such a meeting usually involves (i) a demo of geWorkbench, (ii) an overview presentation outlining the science behind the investigator's prototype, (iii) a demo of the prototype, and (iv) a brainstorming session aimed at identifying the most meaningful manner to integrate the prototype into geWorkbench so that the prototype leverages as much as possible of the functionality already present in geWorkbench.&lt;br /&gt;
:# '''Development of Use Case document''': based on the information collected at the investigator meeting, a detailed Use Case requirements document is developed that describes precisely what the user interface will look like and how users will interact with it. The Use Case document is subject to review and approval by the investigator. An example of such a document can be found [[Media:CNKB_UC.pdf | here]].&lt;br /&gt;
:# '''Development of user interface/computational service''': based on the requirements laid out in the use case document, the development team (a) develops the user interface for the integrated version of the investigator tool, and (b) ports any associated computational server to the grid-enabled server framework used by geWorkbench (which is based on the caGrid software, https://cabig.nci.nih.gov/workspaces/Architecture/caGrid, a grid middleware layer developed by the caBIG initiative). Unit tests are an integral part of the development process and are run automatically on a nightly basis.&lt;br /&gt;
:# '''System Testing''': After the prototype is integrated, a formal integration and system testing cycle ensues. Our group uses a formal testing process based on detailed test scripts whose execution status is tracked in a custom database we have developed for this purpose. In this database, failed scripts are linked to their associated defect reports within our defect management system, http://wiki.c2b2.columbia.edu/mantis/.&lt;br /&gt;
:# '''Documentation''': The final step in the development process is the generation of detailed end-user documentation, in the form of on-line help, tutorials, end-user guides and training slides. Links to these materials are available at: http://wiki.c2b2.columbia.edu/workbench/index.php/Project_Documentation. &lt;br /&gt;
&lt;br /&gt;
geWorkbench is an open-source platform and we encourage and welcome code contributions by the community. To facilitate such contributions we have made the entire code base freely available to everyone through our community development project site, http://gforge.nci.nih.gov/projects/geworkbench/, registered with the NCI GForge project. This site enables us to leverage the infrastructure support that GForge offers to collaborative development projects, including access to a CVS server, streamlined setup of mailing lists and user-forums, posting software releases, etc.&lt;br /&gt;
&lt;br /&gt;
==License==&lt;br /&gt;
At present, geWorkbench is made available under the licensing terms stated at http://wiki.c2b2.columbia.edu/workbench/index.php/GeWorkbench_License. We are currently in the process of modularizing our licensing in order to accommodate the needs of different labs/tools.&lt;br /&gt;
&lt;br /&gt;
==Contact==&lt;br /&gt;
The software engineering group at MAGNet is managed by [mailto:%66%6C%6F%72%61%74%6F%73%40%63%32%62%32%2E%63%6F%6C%75%6D%62%69%61%2E%65%64%75 Aris Floratos].&lt;/div&gt;</summary>
		<author><name>Aris</name></author>
		
	</entry>
	<entry>
		<id>https://www.na-mic.org/w/index.php?title=File:CNKB_UC.pdf&amp;diff=10626</id>
		<title>File:CNKB UC.pdf</title>
		<link rel="alternate" type="text/html" href="https://www.na-mic.org/w/index.php?title=File:CNKB_UC.pdf&amp;diff=10626"/>
		<updated>2007-05-19T22:24:30Z</updated>

		<summary type="html">&lt;p&gt;Aris: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Aris</name></author>
		
	</entry>
	<entry>
		<id>https://www.na-mic.org/w/index.php?title=SDIWG:Software_Engineering_At_MAGNet&amp;diff=10625</id>
		<title>SDIWG:Software Engineering At MAGNet</title>
		<link rel="alternate" type="text/html" href="https://www.na-mic.org/w/index.php?title=SDIWG:Software_Engineering_At_MAGNet&amp;diff=10625"/>
		<updated>2007-05-19T22:21:17Z</updated>

		<summary type="html">&lt;p&gt;Aris: /* Process */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__TOC__&lt;br /&gt;
==Process==&lt;br /&gt;
The software development process in MAGNet typically takes place in two phases:&lt;br /&gt;
* Prototype tools are developed and field-tested at various investigator labs in the context of Core 1, 2 and 3 activities. These tools are usually command-line programs, coded in an assortment of languages (C, C++, Perl, etc.) and/or computational environments (MATLAB, R). In the majority of cases, analysis results are represented as text files that are either inspected with text editors or are transformed (through custom programs) into properly formatted inputs for downstream analysis/visualization tools.&lt;br /&gt;
* When prototypes have reached a satisfactory level of maturity, the software engineering group takes over the task of integrating them into the Center's bioinformatics platform, geWorkbench, http://www.geworkbench.org. Operationally, each integration activity is carried out following a number of steps derived from a standard software life cycle process:&lt;br /&gt;
:# '''Collection of business requirements''': an initial meeting is arranged with the investigator whose prototype is slated for integration. During the meeting, high-level requirements are collected, outlining the intended functionality of the integrated component. Such a meeting usually involves (i) a demo of geWorkbench, (ii) an overview presentation outlining the science behind the investigator's prototype, (iii) a demo of the prototype, and (iv) a brainstorming session aimed at identifying the most meaningful manner to integrate the prototype into geWorkbench so that the prototype leverages as much as possible of the functionality already present in geWorkbench.&lt;br /&gt;
:# '''Development of Use Case document''': based on the information collected at the investigator meeting, a detailed Use Case requirements document is developed that describes precisely what the user interface will look like and how users will interact with it. The Use Case document is subject to review and approval by the investigator. An example of such a document can be found at http://magnet.c2b2.columbia.edu/AnnualReport/Y2/CNKB_UC.pdf.&lt;br /&gt;
:# '''Development of user interface/computational service''': based on the requirements laid out in the use case document, the development team (a) develops the user interface for the integrated version of the investigator tool, and (b) ports any associated computational server to the grid-enabled server framework used by geWorkbench (which is based on the caGrid software, https://cabig.nci.nih.gov/workspaces/Architecture/caGrid, a grid middleware layer developed by the caBIG initiative). Unit tests are an integral part of the development process and are run automatically on a nightly basis.&lt;br /&gt;
:# '''System Testing''': After the prototype is integrated, a formal integration and system testing cycle ensues. Our group uses a formal testing process based on detailed test scripts whose execution status is tracked in a custom database we have developed for this purpose. In this database, failed scripts are linked to their associated defect reports within our defect management system, http://wiki.c2b2.columbia.edu/mantis/.&lt;br /&gt;
:# '''Documentation''': The final step in the development process is the generation of detailed end-user documentation, in the form of on-line help, tutorials, end-user guides and training slides. Links to these materials are available at: http://wiki.c2b2.columbia.edu/workbench/index.php/Project_Documentation. &lt;br /&gt;
&lt;br /&gt;
geWorkbench is an open-source platform and we encourage and welcome code contributions by the community. To facilitate such contributions we have made the entire code base freely available to everyone through our community development project site, http://gforge.nci.nih.gov/projects/geworkbench/, registered with the NCI GForge project. This site enables us to leverage the infrastructure support that GForge offers to collaborative development projects, including access to a CVS server, streamlined setup of mailing lists and user-forums, posting software releases, etc.&lt;br /&gt;
&lt;br /&gt;
==License==&lt;br /&gt;
At present, geWorkbench is made available under the licensing terms stated at http://wiki.c2b2.columbia.edu/workbench/index.php/GeWorkbench_License. We are currently in the process of modularizing our licensing in order to accommodate the needs of different labs/tools.&lt;br /&gt;
&lt;br /&gt;
==Contact==&lt;br /&gt;
The software engineering group at MAGNet is managed by [mailto:%66%6C%6F%72%61%74%6F%73%40%63%32%62%32%2E%63%6F%6C%75%6D%62%69%61%2E%65%64%75 Aris Floratos].&lt;/div&gt;</summary>
		<author><name>Aris</name></author>
		
	</entry>
	<entry>
		<id>https://www.na-mic.org/w/index.php?title=SDIWG:Software_Engineering_At_MAGNet&amp;diff=10613</id>
		<title>SDIWG:Software Engineering At MAGNet</title>
		<link rel="alternate" type="text/html" href="https://www.na-mic.org/w/index.php?title=SDIWG:Software_Engineering_At_MAGNet&amp;diff=10613"/>
		<updated>2007-05-18T22:38:07Z</updated>

		<summary type="html">&lt;p&gt;Aris: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__TOC__&lt;br /&gt;
==Process==&lt;br /&gt;
The software development process in MAGNet typically takes place in two phases:&lt;br /&gt;
* Prototype tools are developed and field-tested at various investigator labs in the context of Core 1, 2 and 3 activities. These tools are usually command-line programs, coded in an assortment of languages (C, C++, Perl, etc.) and/or computational environments (MATLAB, R). In the majority of cases, analysis results are represented as text files that are either inspected with text editors or are transformed (through custom programs) into properly formatted inputs for downstream analysis/visualization tools.&lt;br /&gt;
* When prototypes have reached a satisfactory level of maturity, the software engineering group takes over the task of integrating them into the Center's bioinformatics platform, geWorkbench, http://www.geworkbench.org. Operationally, each integration activity is carried out following a number of steps derived from a standard software life cycle process:&lt;br /&gt;
## '''Collection of business requirements''': an initial meeting is arranged with the investigator whose prototype is slated for integration. During the meeting, high-level requirements are collected, outlining the intended functionality of the integrated component. Such a meeting usually involves (i) a demo of geWorkbench, (ii) an overview presentation outlining the science behind the investigator's prototype, (iii) a demo of the prototype, and (iv) a brainstorming session aimed at identifying the most meaningful manner to integrate the prototype into geWorkbench so that the prototype leverages as much as possible of the functionality already present in geWorkbench.&lt;br /&gt;
## '''Development of Use Case document''': based on the information collected at the investigator meeting, a detailed Use Case requirements document is developed that describes precisely what the user interface will look like and how users will interact with it. The Use Case document is subject to review and approval by the investigator. An example of such a document can be found at http://magnet.c2b2.columbia.edu/AnnualReport/Y2/CNKB_UC.pdf.&lt;br /&gt;
## '''Development of user interface/computational service''': based on the requirements laid out in the use case document, the development team (a) develops the user interface for the integrated version of the investigator tool, and (b) ports any associated computational server to the grid-enabled server framework used by geWorkbench (which is based on the caGrid software, https://cabig.nci.nih.gov/workspaces/Architecture/caGrid, a grid middleware layer developed by the caBIG initiative). Unit tests are an integral part of the development process and are run automatically on a nightly basis.&lt;br /&gt;
## '''System Testing''': After the prototype is integrated, a formal integration and system testing cycle ensues. Our group uses a formal testing process based on detailed test scripts whose execution status is tracked in a custom database we have developed for this purpose. In this database, failed scripts are linked to their associated defect reports within our defect management system, http://wiki.c2b2.columbia.edu/mantis/.&lt;br /&gt;
## '''Documentation''': The final step in the development process is the generation of detailed end-user documentation, in the form of on-line help, tutorials, end-user guides and training slides. Links to these materials are available at: http://wiki.c2b2.columbia.edu/workbench/index.php/Project_Documentation. &lt;br /&gt;
&lt;br /&gt;
geWorkbench is an open-source platform and we encourage and welcome code contributions by the community. To facilitate such contributions we have made the entire code base freely available to everyone through our community development project site, http://gforge.nci.nih.gov/projects/geworkbench/, registered with the NCI GForge project. This site enables us to leverage the infrastructure support that GForge offers to collaborative development projects, including access to a CVS server, streamlined setup of mailing lists and user-forums, posting software releases, etc.&lt;br /&gt;
&lt;br /&gt;
==License==&lt;br /&gt;
At present, geWorkbench is made available under the licensing terms stated at http://wiki.c2b2.columbia.edu/workbench/index.php/GeWorkbench_License. We are currently in the process of modularizing our licensing in order to accommodate the needs of different labs/tools.&lt;br /&gt;
&lt;br /&gt;
==Contact==&lt;br /&gt;
The software engineering group at MAGNet is managed by [mailto:%66%6C%6F%72%61%74%6F%73%40%63%32%62%32%2E%63%6F%6C%75%6D%62%69%61%2E%65%64%75 Aris Floratos].&lt;/div&gt;</summary>
		<author><name>Aris</name></author>
		
	</entry>
	<entry>
		<id>https://www.na-mic.org/w/index.php?title=SDIWG:Software_Engineering_At_MAGNet&amp;diff=10612</id>
		<title>SDIWG:Software Engineering At MAGNet</title>
		<link rel="alternate" type="text/html" href="https://www.na-mic.org/w/index.php?title=SDIWG:Software_Engineering_At_MAGNet&amp;diff=10612"/>
		<updated>2007-05-18T22:32:42Z</updated>

		<summary type="html">&lt;p&gt;Aris: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__TOC__&lt;br /&gt;
==Process==&lt;br /&gt;
The software development process in MAGNet typically takes place in two phases:&lt;br /&gt;
* Prototype tools are developed and field-tested at various investigator labs in the context of Core 1, 2 and 3 activities. These tools are usually command-line programs, coded in an assortment of languages (C, C++, Perl, etc.) and/or computational environments (MATLAB, R). In the majority of cases, analysis results are represented as text files that are either inspected with text editors or are transformed (through custom programs) into properly formatted inputs for downstream analysis/visualization tools.&lt;br /&gt;
* When prototypes have reached a satisfactory level of maturity, the software engineering group takes over the task of integrating them into the Center's bioinformatics platform, geWorkbench, http://www.geworkbench.org. Operationally, each integration activity is carried out following a number of steps derived from a standard software life cycle process:&lt;br /&gt;
## '''Collection of business requirements''': an initial meeting is arranged with the investigator whose prototype is slated for integration. During the meeting, high-level requirements are collected, outlining the intended functionality of the integrated component. Such a meeting usually involves (i) a demo of geWorkbench, (ii) an overview presentation outlining the science behind the investigator's prototype, (iii) a demo of the prototype, and (iv) a brainstorming session aimed at identifying the most meaningful manner to integrate the prototype into geWorkbench so that the prototype leverages as much as possible of the functionality already present in geWorkbench.&lt;br /&gt;
## '''Development of Use Case document''': based on the information collected at the investigator meeting, a detailed Use Case requirements document is developed that describes precisely what the user interface will look like and how users will interact with it. The Use Case document is subject to review and approval by the investigator. An example of such a document can be found at http://magnet.c2b2.columbia.edu/AnnualReport/Y2/CNKB_UC.pdf.&lt;br /&gt;
## '''Development of user interface/computational service''': based on the requirements laid out in the use case document, the development team (a) develops the user interface for the integrated version of the investigator tool, and (b) ports any associated computational server to the grid-enabled server framework used by geWorkbench (which is based on the caGrid software, https://cabig.nci.nih.gov/workspaces/Architecture/caGrid, a grid middleware layer developed by the caBIG initiative). Unit tests are an integral part of the development process and are run automatically on a nightly basis.&lt;br /&gt;
## '''System Testing''': After the prototype is integrated, a formal integration and system testing cycle ensues. Our group uses a formal testing process based on detailed test scripts whose execution status is tracked in a custom database we have developed for this purpose. In this database, failed scripts are linked to their associated defect reports within our defect management system, http://wiki.c2b2.columbia.edu/mantis/.&lt;br /&gt;
## '''Documentation''': The final step in the development process is the generation of detailed end-user documentation, in the form of on-line help, tutorials, end-user guides and training slides. Links to these materials are available at: http://wiki.c2b2.columbia.edu/workbench/index.php/Project_Documentation. &lt;br /&gt;
&lt;br /&gt;
geWorkbench is an open-source platform and we encourage and welcome code contributions by the community. To facilitate such contributions we have made the entire code base freely available to everyone through our community development project site, http://gforge.nci.nih.gov/projects/geworkbench/, registered with the NCI GForge project. This site enables us to leverage the infrastructure support that GForge offers to collaborative development projects, including access to a CVS server, streamlined setup of mailing lists and user-forums, posting software releases, etc.&lt;br /&gt;
&lt;br /&gt;
==Contact==&lt;br /&gt;
The software engineering group at MAGNet is managed by [mailto:%66%6C%6F%72%61%74%6F%73%40%63%32%62%32%2E%63%6F%6C%75%6D%62%69%61%2E%65%64%75 Aris Floratos].&lt;/div&gt;</summary>
		<author><name>Aris</name></author>
		
	</entry>
	<entry>
		<id>https://www.na-mic.org/w/index.php?title=SDIWG:NCBC_Software_Classification_MAGNet_Examples&amp;diff=10358</id>
		<title>SDIWG:NCBC Software Classification MAGNet Examples</title>
		<link rel="alternate" type="text/html" href="https://www.na-mic.org/w/index.php?title=SDIWG:NCBC_Software_Classification_MAGNet_Examples&amp;diff=10358"/>
		<updated>2007-05-15T04:52:34Z</updated>

		<summary type="html">&lt;p&gt;Aris: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;===DelPhi===&lt;br /&gt;
* '''Description''': DelPhi provides numerical solutions to the Poisson-Boltzmann equation (both linear and nonlinear form) for molecules of arbitrary shape and charge distribution. The current version is fast (the best relaxation parameter is estimated at run time), accurate (calculation of the electrostatic free energy is less dependent on the resolution of the lattice) and can handle extremely high lattice dimensions. It also includes flexible features for assigning different dielectric constants to different regions of space and treating systems containing mixed salt solutions.&lt;br /&gt;
* '''Data Input''': DelPhi takes as input a coordinate file for a molecule, or equivalent data for geometrical objects and/or charge distributions&lt;br /&gt;
* '''Data Output''': electrostatic potential in and around the system&lt;br /&gt;
* '''Implementation Language''': Fortran and C&lt;br /&gt;
* '''Version, Date, Stage''': Stable public release&lt;br /&gt;
* '''Authors''':  E.Alexov,  R.Fine, M.K.Gilson, A.Nicholls, W.Rocchia, K.Sharp, and B. Honig.&lt;br /&gt;
* '''Platforms Tested''': Unix-SGI IRIX, linux, PC (requires Fortran and C compilers), AIX IBM version and Mac.&lt;br /&gt;
* '''License''': Freely available to academia; pay model for commercial users. &lt;br /&gt;
* '''Keywords''': Finite Difference Poisson-Boltzmann Solver&lt;br /&gt;
* '''URL''': http://trantor.bioc.columbia.edu/delphi&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''': Atomic --&amp;gt; SoftwareFunction --&amp;gt; Protein Modeling and Classification --&amp;gt; Numerical Calculation of Electrostatic Potential&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===GRASP===&lt;br /&gt;
* '''Description''': A molecular visualization and analysis program. It is particularly useful for the display and manipulation of the surfaces of molecules and their electrostatic properties.&lt;br /&gt;
* '''Data Input''': PDB files, potential maps from DelPhi&lt;br /&gt;
* '''Data Output''': molecular graphics.&lt;br /&gt;
* '''Implementation Language''': Fortran&lt;br /&gt;
* '''Version, Date, Stage''': v1.3.6 .Stable public release.&lt;br /&gt;
* '''Authors''': Anthony Nicholls and Barry Honig.&lt;br /&gt;
* '''Platforms Tested''': SGI machines: irix 5.x and 6.x (INDYs, INDIGOs including Impact, Octane and O2) systems.&lt;br /&gt;
* '''License''':  Freely available to academia.&lt;br /&gt;
* '''Keywords''': molecular visualization&lt;br /&gt;
* '''URL''': http://trantor.bioc.columbia.edu/grasp&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''': Atomic --&amp;gt; SoftwareFunction --&amp;gt; Protein Modeling and Classification --&amp;gt; Molecular Visualization Package&lt;br /&gt;
&lt;br /&gt;
===Nest===&lt;br /&gt;
* '''Description''': Modeling protein structure based on a sequence-template alignment. The current server works only for modeling with a single template. Part of jackal, which can be downloaded.&lt;br /&gt;
* '''Data Input''': pir and PDB files&lt;br /&gt;
* '''Data Output''': &lt;br /&gt;
* '''Implementation Language''': C++&lt;br /&gt;
* '''Version, Date, Stage''': Stable public release.&lt;br /&gt;
* '''Authors''': Xiang, Z. and Honig, B.&lt;br /&gt;
* '''Platforms Tested''': platform independent (web based tool)&lt;br /&gt;
* '''License''': Freely available to academia.&lt;br /&gt;
* '''Keywords''': modeling, protein structure, sequence-template alignment.&lt;br /&gt;
* '''URL''': http://honiglab.cpmc.columbia.edu/cgi-bin/jackal/nest.cgi&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''': Atomic --&amp;gt; SoftwareFunction --&amp;gt; Protein Modeling and Classification --&amp;gt; Homology Modeling&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===JACKAL===&lt;br /&gt;
* '''Description''': Jackal is a collection of programs designed for the modeling and analysis of protein structures. Its core program is nest, a versatile homology modeling package. JACKAL has the following capabilities: 1) comparative modeling based on single, composite or multiple templates; 2) side-chain prediction; 3) modeling residue mutation, insertion or deletion; 4) loop prediction; 5) structure refinement; 6) reconstruction of missing protein atoms; 7) reconstruction of missing protein residues; 8) prediction of hydrogen atoms; 9) fast calculation of solvent accessible surface area; 10) structure superimposition.&lt;br /&gt;
* '''Data Input''': &lt;br /&gt;
* '''Data Output''': &lt;br /&gt;
* '''Implementation Language''': C++&lt;br /&gt;
* '''Version, Date, Stage''': Version 1.5, as of Oct 20, 2002. Stable public release.&lt;br /&gt;
* '''Authors''': Z. Xiang and B. Honig&lt;br /&gt;
* '''Platforms Tested''': SGI 6.5, Intel Linux and Sun solaris&lt;br /&gt;
* '''License''': Freely available to academia.&lt;br /&gt;
* '''Keywords''': Protein Structure Modeling&lt;br /&gt;
* '''URL''': http://trantor.bioc.columbia.edu/programs/jackal&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''': Atomic --&amp;gt; SoftwareFunction --&amp;gt; Protein Modeling and Classification --&amp;gt; Prediction of Side-chain Conformations&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===GRASP2===&lt;br /&gt;
* '''Description''': GRASP2 is an updated version of the GRASP program used for macromolecular structure and surface visualization. It contains a large number of new features and scientific tools: an enhanced GUI; structure alignment and domain database scanning; a Gaussian surface generator and new surface coloring schemes; sequence visualization and alignment; storage of completed work in &amp;quot;project files&amp;quot; (among the many objects that can be stored in a project file are views of the structure, defined subsets, and surfaces); and direct printing to printers at full printer resolution.&lt;br /&gt;
* '''Data Input''': PDB files, potential maps from DelPhi, sequence alignments.&lt;br /&gt;
* '''Data Output''': molecular graphics, structural alignments.&lt;br /&gt;
* '''Implementation Language''': C++&lt;br /&gt;
* '''Version, Date, Stage''': Stable public release&lt;br /&gt;
* '''Authors''': Donald Petrey and Barry Honig.&lt;br /&gt;
* '''Platforms Tested''': Windows, Linux&lt;br /&gt;
* '''License''': Freely available to academia.&lt;br /&gt;
* '''Keywords''': molecular visualization&lt;br /&gt;
* '''URL''': http://trantor.bioc.columbia.edu/grasp2&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''': Atomic --&amp;gt; SoftwareFunction --&amp;gt; Protein Modeling and Classification --&amp;gt; Molecular Visualization Package&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===PrISM===&lt;br /&gt;
* '''Description''': PrISM is an integrated computational system where computational tools are implemented for protein sequence and structure analysis and modeling.&lt;br /&gt;
* '''Data Input''': &lt;br /&gt;
* '''Data Output''':&lt;br /&gt;
* '''Implementation Language''': Fortran&lt;br /&gt;
* '''Version, Date, Stage''': Stable public release&lt;br /&gt;
* '''Authors''': Wang, L, Yang, A. S. &amp;amp; Honig, B. &lt;br /&gt;
* '''Platforms Tested''': SGI-irix, Intel-linux&lt;br /&gt;
* '''License''': Freely available to academia.&lt;br /&gt;
* '''Keywords''': protein analysis/modeling &lt;br /&gt;
* '''URL''': http://trantor.bioc.columbia.edu/programs/PrISM/&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''': Atomic --&amp;gt; SoftwareFunction --&amp;gt; Protein Modeling and Classification --&amp;gt; Homology Modeling&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Protein-DNA interface alignment===&lt;br /&gt;
* '''Description''': The protein-DNA alignment software allows one to align the interfacial amino acids from two protein-DNA complexes based on the geometric relationship of each amino acid to its local DNA. &lt;br /&gt;
* '''Data Input''': two PDB files that both contain protein-DNA complexes&lt;br /&gt;
* '''Data Output''': The programs will output the aligned residues and their corresponding residue-residue similarity scores, s(i,j).&lt;br /&gt;
* '''Implementation Language''': C++ and Perl&lt;br /&gt;
* '''Version, Date, Stage''': Stable public release.&lt;br /&gt;
* '''Authors''': Siggers, T.W., Silkov, A &amp;amp; Honig, B. &lt;br /&gt;
* '''Platforms Tested''': Linux&lt;br /&gt;
* '''License''': Freely available to academia &lt;br /&gt;
* '''Keywords''': protein-DNA interface &lt;br /&gt;
* '''URL''': http://trantor.bioc.columbia.edu/programs/intfc_aln&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''': Atomic --&amp;gt; SoftwareFunction --&amp;gt; Protein Modeling and Classification --&amp;gt; Prediction of Side-chain Conformations&lt;br /&gt;
&lt;br /&gt;
===SURFace===&lt;br /&gt;
* '''Description''': SURFace is a set of programs that calculate solvent-accessible surface area and curvature-corrected solvent-accessible surface area&lt;br /&gt;
* '''Data Input''': &lt;br /&gt;
* '''Data Output''': &lt;br /&gt;
* '''Implementation Language''': &lt;br /&gt;
* '''Version, Date, Stage''': Stable public release.&lt;br /&gt;
* '''Authors''': Nicholls, A., Sharp, K., Sridharan, S. and Honig, B.  &lt;br /&gt;
* '''Platforms Tested''': SGI&lt;br /&gt;
* '''License''': Freely available to academia. &lt;br /&gt;
* '''Keywords''': solvent accessible surface area  &lt;br /&gt;
* '''URL''': http://trantor.bioc.columbia.edu/surf/&lt;br /&gt;
* '''Organization''': MAGNet &lt;br /&gt;
* '''NCBC Ontology Classification''': Atomic --&amp;gt; SoftwareFunction --&amp;gt; Protein Modeling and Classification --&amp;gt; Calculation of Solvent Accessible Area&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Target Explorer===&lt;br /&gt;
* '''Description''': Automated prediction of complex regulatory elements for a specified set of transcription factors in the Drosophila melanogaster genome. Target Explorer is a complex tool with a user-friendly, self-explanatory Web interface that allows the user to: 1. create a customized library of TF binding site matrices based on user-defined sets of training sequences; 2. search for new clusters of binding sites for a specified set of TFs; 3. extract annotation for potential target genes.&lt;br /&gt;
* '''Data Input''': genomic sequences&lt;br /&gt;
* '''Data Output''': clusters of known binding sites &lt;br /&gt;
* '''Implementation Language''': Perl, CGI&lt;br /&gt;
* '''Version, Date, Stage''': Stable public release. &lt;br /&gt;
* '''Authors''': Sosinsky A, Bonin CP, Mann RS, Honig B.   &lt;br /&gt;
* '''Platforms Tested''': platform independent (web based tool)&lt;br /&gt;
* '''License''': Freely available to academia. &lt;br /&gt;
* '''Keywords''': prediction of binding sites for transcription factors  &lt;br /&gt;
* '''URL''': http://trantor.bioc.columbia.edu/Target_Explorer/&lt;br /&gt;
* '''Organization''': MAGNet  &lt;br /&gt;
* '''NCBC Ontology Classification''': Atomic --&amp;gt; SoftwareFunction --&amp;gt; Genomic &amp;amp; Phenotypic Analysis --&amp;gt; Sequence Annotation&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===MEDUSA and Gorgon===&lt;br /&gt;
* '''Description''': MEDUSA is an algorithm for learning predictive models of transcriptional gene regulation from gene expression and promoter sequence data.  By using a statistical learning approach based on boosting, MEDUSA learns cis regulatory motifs, condition-specific regulators, and regulatory programs that predict the differential expression of target genes.  The regulatory program is specified as an alternating decision tree (ADT).  The Java implementation of MEDUSA allows a number of visualizations of the regulatory program and other inferred regulatory information, implemented in the accompanying Gorgon tool, including hits of significant and condition-specific motifs along the promoter sequences of target genes and regulatory network figures viewable in Cytoscape.      &lt;br /&gt;
* '''Data Input''': Discretized (up/down/baseline) gene expression data in plain text format, promoter sequences in FASTA format, list of candidate transcriptional regulators and signal transducers in plain text format.&lt;br /&gt;
* '''Data Output''': Regulatory program represented as a Java serialized object file readable by Gorgon and as a human readable XML file.  Gorgon currently generates views of learned PSSMs, positional hits along promoter sequences, and views of the ADT as HTML files, and generates network figures as Cytoscape format files.&lt;br /&gt;
* '''Implementation Language''': Java (prototyped in MATLAB) &lt;br /&gt;
* '''Version, Date, Stage''': Version 2.0, July 2006, pre-release beta version; Version 1.0 (MATLAB), April 2005, stable public release  &lt;br /&gt;
* '''Authors''': David Quigley, Manuel Middendorf, Steve Lianoglou, Anshul Kundaje, Yoav Freund, Chris Wiggins, Christina Leslie   &lt;br /&gt;
* '''Platforms Tested''': Windows, Linux, Mac OS X&lt;br /&gt;
* '''License''': Open source &lt;br /&gt;
* '''Keywords''':  &lt;br /&gt;
* '''URL''': http://www.cs.columbia.edu/compbio/medusa (MATLAB), http://compbio.sytes.net:8090/medusa (Java beta version)&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''': Atomic --&amp;gt; SoftwareFunction --&amp;gt; Genomic &amp;amp; Phenotypic Analysis --&amp;gt; Regulatory/Signaling network reconstruction&lt;br /&gt;
&lt;br /&gt;
===String kernel package===&lt;br /&gt;
* '''Description''': The string kernel package contains implementations of the mismatch and profile string kernels for use with support vector machine (SVM) classifiers for protein sequence classification.  Both kernels compute similarity between protein sequences based on common occurrences of k-length subsequences (&amp;quot;k-mers&amp;quot;) counted with substitutions.  Kernel functions for protein sequence data enable the training of SVMs for a range of prediction problems, in particular protein structural class prediction and remote homology detection.  A version of the Spider MATLAB machine learning package is also bundled with the code, allowing users to train SVMs and evaluate performance on test sets with the packaged software.&lt;br /&gt;
* '''Data Input''': The mismatch kernel requires sequence data in FASTA format. The profile string kernel uses probabilistic profiles, such as those produced by PSI-BLAST, in place of the original sequences.  The Spider SVM implementation requires both the kernel matrix and a label file of binary or multi-class labels for the training data; this data must be loaded into MATLAB variables before using the Spider routines.&lt;br /&gt;
* '''Data Output''': The kernel code produces a kernel matrix for the input data in tab-delimited text format.  The Spider package trains SVMs and stores the learned classifier and the results from applying it to test data as MATLAB objects.&lt;br /&gt;
* '''Implementation Language''': String kernel code is implemented in C. Spider is a set of object-oriented MATLAB routines. &lt;br /&gt;
* '''Version, Date, Stage''': Version 1.2, September 2004, stable public release &lt;br /&gt;
* '''Authors''': Eleazar Eskin, Rui Kuang, Eugene Ie, Ke Wang, Jason Weston, Bill Noble, Christina Leslie  &lt;br /&gt;
* '''Platforms Tested''': Windows, Linux&lt;br /&gt;
* '''License''': Open source &lt;br /&gt;
* '''Keywords''':  &lt;br /&gt;
* '''URL''': http://www.cs.columbia.edu/compbio/string-kernels&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''': Atomic --&amp;gt; SoftwareFunction --&amp;gt; Protein Modeling and Classification &lt;br /&gt;
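To make the kernel idea concrete, the following minimal Python sketch (illustrative only, not the package's C implementation) computes the exact-match k-mer spectrum kernel; the actual mismatch and profile kernels additionally count k-mers under substitutions:&lt;br /&gt;

```python
from collections import Counter

def kmer_counts(seq, k=3):
    # Count every k-length subsequence ("k-mer") in the sequence.
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def spectrum_kernel(seq_a, seq_b, k=3):
    # Kernel value = dot product of the two k-mer count vectors,
    # i.e. shared k-mers weighted by how often each occurs.
    ca, cb = kmer_counts(seq_a, k), kmer_counts(seq_b, k)
    return sum(ca[kmer] * cb[kmer] for kmer in ca)

# Two short hypothetical amino-acid sequences sharing the prefix "ARND".
print(spectrum_kernel("ARNDARND", "ARNDKKKK", k=3))  # prints 4
```

A kernel matrix built this way over all training sequences is what would be handed to an SVM trainer such as the bundled Spider package.&lt;br /&gt;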
&lt;br /&gt;
===MatrixREDUCE===&lt;br /&gt;
* '''Description''': Regulation of gene expression by a transcription factor requires physical interaction between the factor and the DNA, which can be described by a statistical mechanical model. Based on this model, the MatrixREDUCE algorithm uses genome-wide occupancy data for a transcription factor (e.g. ChIP-chip or mRNA expression data) and associated nucleotide sequences to discover the sequence-specific binding affinity of the transcription factor. The sequence specificity of the transcription factor's DNA-binding domain is modeled using a position-specific affinity matrix (PSAM), representing the change in the binding affinity (Kd) whenever a specific position within a reference binding sequence is mutated. The PSAM can be transformed into an affinity logo for visualization using the utility program AffinityLogo, and a MatrixREDUCE run can be summarized in an easy-to-navigate webpage using HTMLSummary. &lt;br /&gt;
* '''Data Input''': sequence file in FASTA format; and expression data file in tab-delimited text format.&lt;br /&gt;
* '''Data Output''': PSAMs in numeric and graphical format, parameters of the fitted model, and an HTML summary page.&lt;br /&gt;
* '''Implementation Language''': ANSI C, making use of Numerical Recipes routines.  &lt;br /&gt;
* '''Version, Date, Stage''': Version 1.0, July 10, 2006, extensively tested in lab.  &lt;br /&gt;
* '''Authors''': Barrett Foat, Xiang-Jun Lu, Harmen J. Bussemaker  &lt;br /&gt;
* '''Platforms Tested''': Linux, Cygwin (Windows), Mac OS X&lt;br /&gt;
* '''License''':  &lt;br /&gt;
* '''Keywords''': position-specific affinity matrix, binding affinity, cis-regulatory element, expression data, ChIP-chip, transcription factor &lt;br /&gt;
* '''URL''': http://www.bussemakerlab.org/software/MatrixREDUCE&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''': Atomic --&amp;gt; SoftwareFunction --&amp;gt; Genomic &amp;amp; Phenotypic Analysis --&amp;gt; Regulatory/Signaling network reconstruction&lt;br /&gt;
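As an illustration of the PSAM concept only (the MatrixREDUCE program itself infers the PSAM by fitting its statistical mechanical model to the occupancy data), this Python sketch scores a sequence against a hypothetical pre-computed PSAM, summing the products of position-specific relative affinities over all sequence windows:&lt;br /&gt;

```python
def psam_affinity(seq, psam):
    # psam: list of dicts, one per binding-site position, giving relative
    # affinity weights (reference base = 1.0). Total affinity of a sequence
    # is the sum over sliding windows of the product of position weights.
    w = len(psam)
    total = 0.0
    for i in range(len(seq) - w + 1):
        score = 1.0
        for pos, base in enumerate(seq[i:i + w]):
            score *= psam[pos][base]
        total += score
    return total

# Hypothetical 2-position PSAM whose reference (highest-affinity) site is "GC".
psam = [{"A": 0.1, "C": 0.2, "G": 1.0, "T": 0.1},
        {"A": 0.3, "C": 1.0, "G": 0.2, "T": 0.1}]
print(round(psam_affinity("AGCT", psam), 3))  # prints 1.04
```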
&lt;br /&gt;
&lt;br /&gt;
===T-profiler===&lt;br /&gt;
* '''Description''': T-profiler is a web-based tool that uses the t-test to score changes in the average activity of pre-defined groups of genes. The gene groups are defined based on Gene Ontology categorization, ChIP-chip experiments, upstream matches to a consensus transcription factor binding motif, or location on the same chromosome. If desired, an iterative procedure can be used to select a single, optimal representative from sets of overlapping gene groups. A jack-knife procedure is used to make calculations more robust against outliers. T-profiler makes it possible to interpret microarray data in a way that is both intuitive and statistically rigorous, without the need to combine experiments or choose parameters.   &lt;br /&gt;
* '''Data Input''': Currently, gene expression data from Saccharomyces cerevisiae and Candida albicans are supported. &lt;br /&gt;
* '''Data Output''':&lt;br /&gt;
* '''Implementation Language''': T-profiler is written in PHP, data is managed by a MYSQL database server  &lt;br /&gt;
* '''Version, Date, Stage''':   &lt;br /&gt;
* '''Authors''': André Boorsma, Barrett C. Foat, Daniel Vis, Frans Klis, Harmen J. Bussemaker  &lt;br /&gt;
* '''Platforms Tested''': Web-based application&lt;br /&gt;
* '''License''':  &lt;br /&gt;
* '''Keywords''': gene expression, transcriptome,  ChIP-chip,  Gene Ontology &lt;br /&gt;
* '''URL''': http://www.t-profiler.org&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''': Atomic --&amp;gt; SoftwareFunction --&amp;gt; Genomic &amp;amp; Phenotypic Analysis --&amp;gt; Network characterization &lt;br /&gt;
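The core scoring step is an unpaired t-statistic comparing a gene group against all remaining genes. Below is a minimal Python sketch with hypothetical log-ratio data; it omits T-profiler's group definitions, iterative representative selection, and jack-knife correction:&lt;br /&gt;

```python
import math
from statistics import mean, stdev

def t_profiler_score(values_in_group, values_outside):
    # Pooled-variance two-sample t-statistic: how far the mean expression
    # change of a predefined gene group deviates from the rest of the genes.
    n1, n2 = len(values_in_group), len(values_outside)
    m1, m2 = mean(values_in_group), mean(values_outside)
    s1, s2 = stdev(values_in_group), stdev(values_outside)
    pooled = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return (m1 - m2) / (pooled * math.sqrt(1.0 / n1 + 1.0 / n2))

group = [1.2, 0.9, 1.4, 1.1]        # log-ratios of genes in the group
rest = [0.1, -0.2, 0.0, 0.3, -0.1]  # all other genes on the array
print(round(t_profiler_score(group, rest), 2))  # prints 8.45
```

A large positive score indicates the group is coherently up-regulated in the condition; the real tool converts such scores to p-values per group.&lt;br /&gt;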
&lt;br /&gt;
===TranscriptionDetector===&lt;br /&gt;
* '''Description''': A tool for finding probes measuring significantly expressed loci in a genomic array experiment. Given expression data from a tiling array experiment, TranscriptionDetector estimates the likelihood that a probe is detecting transcription from the locus in which it resides. Probabilities are assigned by making use of a background signal intensity distribution from a set of negative control probes. This tool is useful for the functional annotation of genomes, as it allows for the discovery of novel transcriptional units independently of any genomic annotation.    &lt;br /&gt;
* '''Data Input''': Expression data (GEO or other platforms) and designation of which probes represent negative controls and which are data probes. &lt;br /&gt;
* '''Data Output''': A text file with a list of probes  corresponding to significantly expressed loci. &lt;br /&gt;
* '''Implementation Language''': ANSI C, making use of GSL.  &lt;br /&gt;
* '''Version, Date, Stage''':   &lt;br /&gt;
* '''Authors''': Xiang-Jun Lu, Gabor Halasz, Marinus F. van Batenburg  &lt;br /&gt;
* '''Platforms Tested''': Linux, Cygwin (Windows), Mac OS X&lt;br /&gt;
* '''License''':  &lt;br /&gt;
* '''Keywords''': tiling arrays, expression, transcriptome &lt;br /&gt;
* '''URL''': http://www.bussemakerlab.org/software/TranscriptionDetector/&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''':&lt;br /&gt;
&lt;br /&gt;
===PhenoGO===&lt;br /&gt;
* '''Description''': PhenoGO adds phenotypic contextual information to existing associations between gene products and Gene Ontology (GO) terms as specified in GO Annotations (GOA). PhenoGO utilizes an existing natural language processing (NLP) system, BioMedLEE, and an existing knowledge-based phenotype organizer system (PhenOS), in conjunction with MeSH indexing and established biomedical ontologies. The system also encodes the context with identifiers drawn from different biomedical ontologies, including the UMLS, Cell Ontology, Mouse Anatomy, NCBI taxonomy, GO, and Mammalian Phenotype Ontology. PhenoGO was evaluated for coding of anatomical and cellular information and for assigning the coded phenotypes to the correct GOA; the results show a precision of 91% and a recall of 92%, demonstrating that the PhenoGO NLP system can accurately link a large number of anatomical and cellular codes to GO annotations. The PhenoGO database may be accessed at www.phenogo.org.    &lt;br /&gt;
* '''Data Input''': Gene Ontology Annotations Files and Medline Abstracts  &lt;br /&gt;
* '''Data Output''': XML file and www.phenogo.org Web Portal&lt;br /&gt;
* '''Implementation Language''': A variety of modules: the web portal is written in Java and MySQL; the computational terminology component (PhenOS) consists of Perl scripts that query tables in IBM DB2; the natural language processing component is written in PROLOG.  &lt;br /&gt;
* '''Version, Date, Stage''': Version 2, Feb 2006  &lt;br /&gt;
* '''Authors''': Yves Lussier and Carol Friedman are the principal investigators. The programmers are Jianrong Li, Lee Sam, and Tara Borlawsky   &lt;br /&gt;
* '''Platforms Tested''': n/a&lt;br /&gt;
* '''License''':  n/a&lt;br /&gt;
* '''Keywords''': Phenotypic integration, computational phenotypes &lt;br /&gt;
* '''URL''': http://www.phenogo.org&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''': Biotool --&amp;gt; Data Management --&amp;gt; Information retrieval, traversal and querying; Atomic --&amp;gt; SoftwareFunction --&amp;gt; Natural Language Processing &lt;br /&gt;
&lt;br /&gt;
===MINDY===&lt;br /&gt;
* '''Description''': Given a transcription factor of interest, MINDY uses a large set of gene expression profile data to identify potential post-transcriptional modulators of the transcription factor's activity. MINDY is based on a three-way statistical interaction model that captures the post-transcriptional regulatory event where the ability of a transcription factor to activate/repress its target genes is monotonically controlled by a potential modulator gene. &lt;br /&gt;
* '''Data Input''': Gene expression data in the EXP format, and a user-specified transcription factor of interest&lt;br /&gt;
* '''Data Output''': Lists of the putative modulators and target genes of the transcription factor, and the modulatory interactions involving them&lt;br /&gt;
* '''Implementation Language''': C++, MATLAB, and Java&lt;br /&gt;
* '''Version, Date, Stage''': Stable release, April 2007  &lt;br /&gt;
* '''Authors''': Kai Wang, Ilya Nemenman, Adam Margolin, Riccardo Dalla-Favera, Andrea Califano&lt;br /&gt;
* '''Platforms Tested''': Linux, Cygwin&lt;br /&gt;
* '''License''':  n/a&lt;br /&gt;
* '''Keywords''': gene expression, transcriptional interaction, modulator &lt;br /&gt;
* '''URL''': n/a&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''': Atomic --&amp;gt; SoftwareFunction --&amp;gt; Genomic &amp;amp; Phenotypic Analysis --&amp;gt; Regulatory/Signaling network reconstruction&lt;br /&gt;
&lt;br /&gt;
===B Cell Interactome===&lt;br /&gt;
* '''Description''': The B cell interactome (BCI) is a network of protein-protein, protein-DNA and modulatory interactions in human B cells. The network contains known interactions (reported in public databases) and interactions predicted by a Bayesian evidence integration framework, which combines a variety of generic and context-specific experimental clues about protein-protein and protein-DNA interactions - such as a large collection of B cell expression profiles - with inferences from different reverse engineering algorithms, such as GeneWays and ARACNE. Modulatory interactions are predicted by MINDY, an algorithm for the prediction of modulators of transcriptional interactions.&lt;br /&gt;
* '''Data Input''': n/a&lt;br /&gt;
* '''Data Output''': Text file of binary interactions, each associated with a probability. &lt;br /&gt;
* '''Implementation Language''': Perl&lt;br /&gt;
* '''Version, Date, Stage''': Version 2, March 2007&lt;br /&gt;
* '''Authors''':  Lefebvre C, Lim WK, Basso K, Dalla Favera R, and Califano A.&lt;br /&gt;
* '''Platforms Tested''': n/a&lt;br /&gt;
* '''License''':  &lt;br /&gt;
* '''Keywords''': Naive Bayes, Mixed-Interaction Network, human B cells.&lt;br /&gt;
* '''URL''': http://amdec-bioinfo.cu-genome.org/html/BCellInteractome.html&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''': Atomic --&amp;gt; SoftwareFunction --&amp;gt; Interaction Modeling&lt;br /&gt;
&lt;br /&gt;
===ARACNE===&lt;br /&gt;
* '''Description''': ARACNE is an algorithm for inferring gene regulatory networks from a set of microarray experiments. The method uses mutual information to identify genes that are co-expressed and then applies the data processing inequality to filter out interactions that are likely to be indirect.&lt;br /&gt;
* '''Data Input''': Text file containing measurements from a set of microarray experiments.&lt;br /&gt;
* '''Data Output''':  Text file containing predicted interactions.&lt;br /&gt;
* '''Implementation Language''': C++, Java&lt;br /&gt;
* '''Version, Date, Stage''': Version 1, June 2006&lt;br /&gt;
* '''Authors''':  Margolin AA, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Dalla Favera R, Califano A.&lt;br /&gt;
* '''Platforms Tested''': Windows, Linux&lt;br /&gt;
* '''License''':  Open source&lt;br /&gt;
* '''Keywords''': Reverse engineering, mutual information, genetic networks, microarray&lt;br /&gt;
* '''URL''': http://amdec-bioinfo.cu-genome.org/html/ARACNE.htm&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''': Atomic --&amp;gt; SoftwareFunction --&amp;gt; Genomic &amp;amp; Phenotypic Analysis --&amp;gt; Regulatory/Signaling network reconstruction&lt;br /&gt;
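A simplified Python sketch of the data processing inequality (DPI) step, assuming the pairwise mutual information (MI) values have already been estimated from the expression data (the real ARACNE estimates MI directly from the microarray measurements and applies a DPI tolerance):&lt;br /&gt;

```python
from itertools import combinations

def dpi_filter(mi):
    # mi maps frozenset({gene_a, gene_b}) to an estimated mutual information.
    # Data processing inequality: in every fully connected gene triplet,
    # drop the weakest edge, since it is likely an indirect interaction.
    genes = sorted(set(g for pair in mi for g in pair))
    keep = dict(mi)
    for a, b, c in combinations(genes, 3):
        edges = [frozenset((a, b)), frozenset((b, c)), frozenset((a, c))]
        if all(e in mi for e in edges):
            keep.pop(min(edges, key=mi.get), None)
    return keep

mi = {frozenset(("TF", "X")): 0.9,   # direct interaction
      frozenset(("X", "Y")): 0.8,    # direct interaction
      frozenset(("TF", "Y")): 0.3}   # indirect; removed by the DPI
print(sorted(tuple(sorted(k)) for k in dpi_filter(mi)))  # prints [('TF', 'X'), ('X', 'Y')]
```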
&lt;br /&gt;
&lt;br /&gt;
===geWorkbench===&lt;br /&gt;
* '''Description''': geWorkbench is a Java application that provides users with an integrated suite of genomics tools. It is built on an open-source, extensible architecture that promotes interoperability and simplifies the development of new components as well as the incorporation of pre-existing ones. The resulting system provides seamless access to a multitude of both local and remote data and computational services through an integrated environment that offers a unified user experience. Over 50 data analysis and visualization components have been developed for the framework, covering a wide range of genomics domains including gene expression, sequence, structure and network data.&lt;br /&gt;
* '''Data Input''': Gene expression data (Affy, GenePix, RMA), Sequence (FASTA), Structure (PDB).&lt;br /&gt;
* '''Data Output''': Analysis results (multiple formats).&lt;br /&gt;
* '''Implementation Language''': Java&lt;br /&gt;
* '''Version, Date, Stage''': 1.0.5, 3/23/07, stable production release&lt;br /&gt;
* '''Authors''':  A. Califano, A. Floratos, M. Kustagi, K. Smith, J. Watkinson, M. Hall, K. Keshav, X. Zhang, K. Kushal, B. Jagla, E. Daly, M. VanGinhoven, P. Morozov.&lt;br /&gt;
* '''Platforms Tested''': Windows XP, Linux, Mac OS 10.x.&lt;br /&gt;
* '''License''': Free. &lt;br /&gt;
* '''Keywords''': Analysis suite, gene expression analysis, sequence analysis, network reconstruction, structure prediction, visualization.&lt;br /&gt;
* '''URL''': http://www.geworkbench.org&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''': Atomic --&amp;gt; SoftwareFunction --&amp;gt; Genomic &amp;amp; Phenotypic Analysis; Atomic --&amp;gt; SoftwareFunction --&amp;gt; Interaction Modeling; Atomic --&amp;gt; SoftwareFunction --&amp;gt; Protein Modeling and Classification; Atomic --&amp;gt; SoftwareFunction --&amp;gt; Software Engineering and Development Tool --&amp;gt; Integration --&amp;gt; Resource Integration Components; Atomic --&amp;gt; SoftwareFunction --&amp;gt; Software Engineering and Development Tool --&amp;gt; Integration --&amp;gt; Grid Computing Resources; Atomic --&amp;gt; SoftwareFunction --&amp;gt; Visualization&lt;/div&gt;</summary>
		<author><name>Aris</name></author>
		
	</entry>
	<entry>
		<id>https://www.na-mic.org/w/index.php?title=SDIWG:NCBC_Software_Classification_MAGNet_Examples&amp;diff=10272</id>
		<title>SDIWG:NCBC Software Classification MAGNet Examples</title>
		<link rel="alternate" type="text/html" href="https://www.na-mic.org/w/index.php?title=SDIWG:NCBC_Software_Classification_MAGNet_Examples&amp;diff=10272"/>
		<updated>2007-05-13T22:25:54Z</updated>

		<summary type="html">&lt;p&gt;Aris: /* GRASP2 */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;===DelPhi===&lt;br /&gt;
* '''Description''': DelPhi provides numerical solutions to the Poisson-Boltzmann equation (both linear and nonlinear form) for molecules of arbitrary shape and charge distribution. The current version is fast (the best relaxation parameter is estimated at run time), accurate (calculation of the electrostatic free energy is less dependent on the resolution of the lattice) and can handle extremely high lattice dimensions. It also includes flexible features for assigning different dielectric constants to different regions of space and treating systems containing mixed salt solutions.&lt;br /&gt;
* '''Data Input''': DelPhi takes as input a coordinate file of a molecule or equivalent data for geometrical objects and/or charge distributions&lt;br /&gt;
* '''Data Output''': electrostatic potential in and around the system&lt;br /&gt;
* '''Implementation Language''': Fortran and C&lt;br /&gt;
* '''Version, Date, Stage''': Stable public release&lt;br /&gt;
* '''Authors''':  E.Alexov,  R.Fine, M.K.Gilson, A.Nicholls, W.Rocchia, K.Sharp, and B. Honig.&lt;br /&gt;
* '''Platforms Tested''': SGI IRIX, Linux, PC (requires Fortran and C compilers), IBM AIX, and Mac.&lt;br /&gt;
* '''License''': Freely available to academia; pay model for commercial users. &lt;br /&gt;
* '''Keywords''': Finite-difference Poisson-Boltzmann solver&lt;br /&gt;
* '''URL''': http://trantor.bioc.columbia.edu/delphi&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''': Atomic --&amp;gt; SoftwareFunction --&amp;gt; Protein Modeling and Classification --&amp;gt; Numerical Calculation of Electrostatic Potential&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===GRASP===&lt;br /&gt;
* '''Description''': A molecular visualization and analysis program. It is particularly useful for the display and manipulation of the surfaces of molecules and their electrostatic properties.&lt;br /&gt;
* '''Data Input''': PDB files, potential maps from DelPhi&lt;br /&gt;
* '''Data Output''': molecular graphics.&lt;br /&gt;
* '''Implementation Language''': Fortran&lt;br /&gt;
* '''Version, Date, Stage''': v1.3.6, stable public release.&lt;br /&gt;
* '''Authors''': Anthony Nicholls and Barry Honig.&lt;br /&gt;
* '''Platforms Tested''': SGI machines running IRIX 5.x and 6.x (Indy, Indigo including Impact, Octane, and O2 systems).&lt;br /&gt;
* '''License''':  Freely available to academia.&lt;br /&gt;
* '''Keywords''': molecular visualization&lt;br /&gt;
* '''URL''': http://trantor.bioc.columbia.edu/grasp&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''': Atomic --&amp;gt; SoftwareFunction --&amp;gt; Protein Modeling and Classification --&amp;gt; Molecular Visualization Package&lt;br /&gt;
&lt;br /&gt;
===Nest===&lt;br /&gt;
* '''Description''': Models protein structure based on a sequence-template alignment. The current server supports modeling with a single template only. Nest is part of the JACKAL package, which can be downloaded.&lt;br /&gt;
* '''Data Input''': pir and PDB files&lt;br /&gt;
* '''Data Output''': &lt;br /&gt;
* '''Implementation Language''': C++&lt;br /&gt;
* '''Version, Date, Stage''': Stable public release.&lt;br /&gt;
* '''Authors''': Xiang, Z. and Honig, B.&lt;br /&gt;
* '''Platforms Tested''': platform independent (web based tool)&lt;br /&gt;
* '''License''': Freely available to academia.&lt;br /&gt;
* '''Keywords''': modeling, protein structure, sequence-template alignment.&lt;br /&gt;
* '''URL''': http://honiglab.cpmc.columbia.edu/cgi-bin/jackal/nest.cgi&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''': Atomic --&amp;gt; SoftwareFunction --&amp;gt; Protein Modeling and Classification --&amp;gt; Homology Modeling&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===JACKAL===&lt;br /&gt;
* '''Description''': JACKAL is a collection of programs designed for the modeling and analysis of protein structures. Its core program is nest, a versatile homology modeling package. JACKAL has the following capabilities: 1) comparative modeling based on single, composite or multiple templates; 2) side-chain prediction; 3) modeling of residue mutations, insertions or deletions; 4) loop prediction; 5) structure refinement; 6) reconstruction of missing protein atoms; 7) reconstruction of missing protein residues; 8) prediction of hydrogen atoms; 9) fast calculation of solvent accessible surface area; 10) structure superimposition.       &lt;br /&gt;
* '''Data Input''': &lt;br /&gt;
* '''Data Output''': &lt;br /&gt;
* '''Implementation Language''': C++&lt;br /&gt;
* '''Version, Date, Stage''': Version 1.5, October 20, 2002, stable public release.&lt;br /&gt;
* '''Authors''': Z. Xiang and B. Honig&lt;br /&gt;
* '''Platforms Tested''': SGI 6.5, Intel Linux and Sun solaris&lt;br /&gt;
* '''License''': Freely available to academia.&lt;br /&gt;
* '''Keywords''': Protein Structure Modeling&lt;br /&gt;
* '''URL''': http://trantor.bioc.columbia.edu/programs/jackal&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''': Atomic --&amp;gt; SoftwareFunction --&amp;gt; Protein Modeling and Classification --&amp;gt; Prediction of Side-chain Conformations&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===GRASP2===&lt;br /&gt;
* '''Description''': GRASP2 is an updated version of the GRASP program for macromolecular structure and surface visualization. It contains a large number of new features and scientific tools: an enhanced GUI; structure alignment and domain database scanning; a Gaussian surface generator and new surface coloring schemes; sequence visualization and alignment; and direct printing to printers at full printer resolution. Completed work can be stored in &amp;quot;project files&amp;quot;; among the many objects that can be stored in a project file are views of the structure, defined subsets, and surfaces.&lt;br /&gt;
* '''Data Input''': PDB files, potential maps from DelPhi, sequence alignments.&lt;br /&gt;
* '''Data Output''': molecular graphics, structural alignments.&lt;br /&gt;
* '''Implementation Language''': C++&lt;br /&gt;
* '''Version, Date, Stage''': Stable public release&lt;br /&gt;
* '''Authors''': Donald Petrey and Barry Honig.&lt;br /&gt;
* '''Platforms Tested''': Windows, Linux&lt;br /&gt;
* '''License''': Freely available to academia.&lt;br /&gt;
* '''Keywords''': molecular visualization&lt;br /&gt;
* '''URL''': http://trantor.bioc.columbia.edu/grasp2&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''': Atomic --&amp;gt; SoftwareFunction --&amp;gt; Protein Modeling and Classification --&amp;gt; Molecular Visualization Package&lt;br /&gt;
&lt;br /&gt;
===PrISM===&lt;br /&gt;
* '''Description''': PrISM is an integrated computational system that implements tools for protein sequence and structure analysis and modeling.&lt;br /&gt;
* '''Data Input''': &lt;br /&gt;
* '''Data Output''':&lt;br /&gt;
* '''Implementation Language''': Fortran&lt;br /&gt;
* '''Version, Date, Stage''': Stable public release&lt;br /&gt;
* '''Authors''': Wang, L, Yang, A. S. &amp;amp; Honig, B. &lt;br /&gt;
* '''Platforms Tested''': SGI IRIX, Intel Linux &lt;br /&gt;
* '''License''': Freely available to academia.&lt;br /&gt;
* '''Keywords''': protein analysis/modeling &lt;br /&gt;
* '''URL''': http://trantor.bioc.columbia.edu/programs/PrISM/&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''': Atomic --&amp;gt; SoftwareFunction --&amp;gt; Protein Modeling and Classification --&amp;gt; Homology Modeling&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Protein-DNA interface alignment===&lt;br /&gt;
* '''Description''': The protein-DNA alignment software allows one to align the interfacial amino acids from two protein-DNA complexes based on the geometric relationship of each amino acid to its local DNA. &lt;br /&gt;
* '''Data Input''': two PDB files that both contain protein-DNA complexes&lt;br /&gt;
* '''Data Output''': The programs will output the aligned residues and their corresponding residue-residue similarity scores, s(i,j).&lt;br /&gt;
* '''Implementation Language''': C++ and Perl&lt;br /&gt;
* '''Version, Date, Stage''': Stable public release. &lt;br /&gt;
* '''Authors''': Siggers, T.W., Silkov, A &amp;amp; Honig, B. &lt;br /&gt;
* '''Platforms Tested''': Linux&lt;br /&gt;
* '''License''': Freely available to academia &lt;br /&gt;
* '''Keywords''': protein-DNA interface &lt;br /&gt;
* '''URL''': http://trantor.bioc.columbia.edu/programs/intfc_aln&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''': Atomic --&amp;gt; SoftwareFunction --&amp;gt; Protein Modeling and Classification --&amp;gt; Prediction of Side-chain Conformations&lt;br /&gt;
&lt;br /&gt;
===SURFace===&lt;br /&gt;
* '''Description''': SURFace is a set of programs that calculate solvent-accessible surface area and curvature-corrected solvent-accessible surface area. &lt;br /&gt;
* '''Data Input''': &lt;br /&gt;
* '''Data Output''': &lt;br /&gt;
* '''Implementation Language''': &lt;br /&gt;
* '''Version, Date, Stage''': Stable public release.&lt;br /&gt;
* '''Authors''': Nicholls, A., Sharp, K., Sridharan, S. and Honig, B.  &lt;br /&gt;
* '''Platforms Tested''': SGI&lt;br /&gt;
* '''License''': Freely available to academia. &lt;br /&gt;
* '''Keywords''': solvent accessible surface area  &lt;br /&gt;
* '''URL''': http://trantor.bioc.columbia.edu/surf/&lt;br /&gt;
* '''Organization''': MAGNet &lt;br /&gt;
* '''NCBC Ontology Classification''': Atomic --&amp;gt; SoftwareFunction --&amp;gt; Protein Modeling and Classification --&amp;gt; Calculation of Solvent Accessible Area&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Target Explorer===&lt;br /&gt;
* '''Description''': Automates the prediction of complex regulatory elements for a specified set of transcription factors in the Drosophila melanogaster genome. Target Explorer is a comprehensive tool with a user-friendly, self-explanatory web interface that allows the user to: 1. create a customized library of TF binding site matrices based on user-defined sets of training sequences; 2. search for new clusters of binding sites for a specified set of TFs; 3. extract annotation for potential target genes.    &lt;br /&gt;
* '''Data Input''': genomic sequences&lt;br /&gt;
* '''Data Output''': clusters of known binding sites &lt;br /&gt;
* '''Implementation Language''': perl, cgi&lt;br /&gt;
* '''Version, Date, Stage''': Stable public release. &lt;br /&gt;
* '''Authors''': Sosinsky A, Bonin CP, Mann RS, Honig B.   &lt;br /&gt;
* '''Platforms Tested''': platform independent (web based tool)&lt;br /&gt;
* '''License''': Freely available to academia. &lt;br /&gt;
* '''Keywords''': prediction of binding sites for transcription factors  &lt;br /&gt;
* '''URL''': http://trantor.bioc.columbia.edu/Target_Explorer/&lt;br /&gt;
* '''Organization''': MAGNet  &lt;br /&gt;
* '''NCBC Ontology Classification''': Atomic --&amp;gt; SoftwareFunction --&amp;gt; Genomic &amp;amp; Phenotypic Analysis --&amp;gt; Sequence Annotation&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===MEDUSA and Gorgon===&lt;br /&gt;
* '''Description''': MEDUSA is an algorithm for learning predictive models of transcriptional gene regulation from gene expression and promoter sequence data.  By using a statistical learning approach based on boosting, MEDUSA learns cis regulatory motifs, condition-specific regulators, and regulatory programs that predict the differential expression of target genes.  The regulatory program is specified as an alternating decision tree (ADT).  The Java implementation of MEDUSA will allow a number of visualizations of the regulatory program and other inferred regulatory information, implemented in the accompanying Gorgon tool, including hits of significant and condition-specific motifs along the promoter sequences of target genes and regulatory network figures viewable in Cytoscape.      &lt;br /&gt;
* '''Data Input''': Discretized (up/down/baseline) gene expression data in plain text format, promoter sequences in FASTA format, list of candidate transcriptional regulators and signal transducers in plain text format.&lt;br /&gt;
* '''Data Output''': Regulatory program represented as a Java serialized object file readable by Gorgon and as a human readable XML file.  Gorgon currently generates views of learned PSSMs, positional hits along promoter sequences, and views of the ADT as HTML files, and generates network figures as Cytoscape format files.&lt;br /&gt;
* '''Implementation Language''': Java (prototyped in MATLAB) &lt;br /&gt;
* '''Version, Date, Stage''': Version 2.0, July 2006, pre-release beta version; Version 1.0 (MATLAB), April 2005, stable public release  &lt;br /&gt;
* '''Authors''': David Quigley, Manuel Middendorf, Steve Lianoglou, Anshul Kundaje, Yoav Freund, Chris Wiggins, Christina Leslie   &lt;br /&gt;
* '''Platforms Tested''': Windows, Linux, Mac OS X&lt;br /&gt;
* '''License''': Open source &lt;br /&gt;
* '''Keywords''':  &lt;br /&gt;
* '''URL''': http://www.cs.columbia.edu/compbio/medusa (MATLAB), http://compbio.sytes.net:8090/medusa (Java beta version)&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''':&lt;br /&gt;
&lt;br /&gt;
===String kernel package===&lt;br /&gt;
* '''Description''': The string kernel package contains implementations of the mismatch and profile string kernels for use with support vector machine (SVM) classifiers for protein sequence classification.  Both kernels compute similarity between protein sequences based on common occurrences of k-length subsequences (&amp;quot;k-mers&amp;quot;) counted with substitutions.  Kernel functions for protein sequence data enable the training of SVMs for a range of prediction problems, in particular protein structural class prediction and remote homology detection.  A version of the Spider MATLAB machine learning package is also bundled with the code, allowing users to train SVMs and evaluate performance on test sets with the packaged software.&lt;br /&gt;
* '''Data Input''': The mismatch kernel requires sequence data in FASTA format. The profile string kernel uses probabilistic profiles, such as those produced by PSI-BLAST, in place of the original sequences.  The Spider SVM implementation requires both the kernel matrix and a label file of binary or multi-class labels for the training data; this data must be loaded into MATLAB variables before using the Spider routines.&lt;br /&gt;
* '''Data Output''': The kernel code produces a kernel matrix for the input data in tab-delimited text format.  The Spider package trains SVMs and stores the learned classifier and the results from applying it to test data as MATLAB objects.&lt;br /&gt;
* '''Implementation Language''': String kernel code is implemented in C. Spider is a set of object-oriented MATLAB routines. &lt;br /&gt;
* '''Version, Date, Stage''': Version 1.2, September 2004, stable public release &lt;br /&gt;
* '''Authors''': Eleazar Eskin, Rui Kuang, Eugene Ie, Ke Wang, Jason Weston, Bill Noble, Christina Leslie  &lt;br /&gt;
* '''Platforms Tested''': Windows, Linux&lt;br /&gt;
* '''License''': Open source &lt;br /&gt;
* '''Keywords''':  &lt;br /&gt;
* '''URL''': http://www.cs.columbia.edu/compbio/string-kernels&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''':&lt;br /&gt;
&lt;br /&gt;
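The k-mer counting idea behind the string kernels can be illustrated with the m = 0 (no-mismatch) special case, the spectrum kernel. The sketch below is a naive Python illustration, not the efficient trie-based implementation in the released package:

```python
from collections import Counter

def spectrum_kernel(x: str, y: str, k: int = 3) -> int:
    """K(x, y) = sum over all k-mers c of count_x(c) * count_y(c).

    This is the m = 0 (no-substitution) special case of the mismatch
    string kernel described above."""
    cx = Counter(x[i:i + k] for i in range(len(x) - k + 1))
    cy = Counter(y[i:i + k] for i in range(len(y) - k + 1))
    return sum(cx[c] * cy[c] for c in cx if c in cy)

# Kernel matrix for a toy set of (made-up) protein sequences:
seqs = ["MKVLAA", "MKVLGA", "GGGSGG"]
K = [[spectrum_kernel(a, b) for b in seqs] for a in seqs]
```

In the full mismatch kernel, each k-mer would additionally contribute to all k-mers within m substitutions before the dot product is taken; the resulting matrix would then be passed to the bundled Spider SVM trainer.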
===MatrixREDUCE===&lt;br /&gt;
* '''Description''': Regulation of gene expression by a transcription factor requires physical interaction between the factor and the DNA, which can be described by a statistical mechanical model. Based on this model, the MatrixREDUCE algorithm uses genome-wide occupancy data for a transcription factor (e.g. ChIP-chip or mRNA expression data) and associated nucleotide sequences to discover the sequence-specific binding affinity of the transcription factor. The sequence specificity of the transcription factor's DNA-binding domain is modeled using a position-specific affinity matrix (PSAM), representing the change in the binding affinity (Kd) whenever a specific position within a reference binding sequence is mutated. The PSAM can be transformed into an affinity logo for visualization using the utility program AffinityLogo, and a MatrixREDUCE run can be summarized in an easy-to-navigate webpage using HTMLSummary. &lt;br /&gt;
* '''Data Input''': sequence file in FASTA format; and expression data file in tab-delimited text format.&lt;br /&gt;
* '''Data Output''': PSAMs in numeric and graphical format, parameters of the fitted model, and an HTML summary page.&lt;br /&gt;
* '''Implementation Language''': ANSI C, making use of Numerical Recipes routines.  &lt;br /&gt;
* '''Version, Date, Stage''': Version 1.0, July 10, 2006, extensively tested in lab.  &lt;br /&gt;
* '''Authors''': Barrett Foat, Xiang-Jun Lu, Harmen J. Bussemaker  &lt;br /&gt;
* '''Platforms Tested''': Linux, Cygwin (Windows), Mac OS X&lt;br /&gt;
* '''License''':  &lt;br /&gt;
* '''Keywords''': position-specific affinity matrix, binding affinity, cis-regulatory element, expression data, ChIP-chip, transcription factor &lt;br /&gt;
* '''URL''': http://www.bussemakerlab.org/software/MatrixREDUCE&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''':&lt;br /&gt;
&lt;br /&gt;
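To make the PSAM concept concrete: once a PSAM has been fitted, the predicted occupancy of a sequence is the sum, over all windows, of the product of per-position affinity ratios. The sketch below uses a hypothetical 3-bp PSAM and scores the forward strand only; it illustrates how a PSAM is applied, not the MatrixREDUCE fitting procedure itself:

```python
# Hypothetical 3-bp PSAM: PSAM[pos][base] is the relative affinity when
# the reference base at that position is replaced; the reference base
# itself has weight 1.0.  (Values here are illustrative, not fitted.)
PSAM = [
    {"A": 1.0, "C": 0.1, "G": 0.2, "T": 0.1},
    {"A": 0.1, "C": 1.0, "G": 0.1, "T": 0.3},
    {"A": 0.2, "C": 0.1, "G": 1.0, "T": 0.1},
]

def window_affinity(site: str) -> float:
    """Relative affinity of one candidate site: product over positions."""
    a = 1.0
    for pos, base in enumerate(site):
        a *= PSAM[pos][base]
    return a

def predicted_occupancy(seq: str) -> float:
    """Sum of relative affinities over all windows (forward strand only)."""
    k = len(PSAM)
    return sum(window_affinity(seq[i:i + k]) for i in range(len(seq) - k + 1))
```

MatrixREDUCE fits the PSAM entries so that predicted occupancies best explain the measured genome-wide occupancy data; the helper names here are assumptions of this sketch.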
&lt;br /&gt;
===T-profiler===&lt;br /&gt;
* '''Description''': T-profiler is a web-based tool that uses the t-test to score changes in the average activity of pre-defined groups of genes. The gene groups are defined based on Gene Ontology categorization, ChIP-chip experiments, upstream matches to a consensus transcription factor binding motif, or location on the same chromosome. If desired, an iterative procedure can be used to select a single, optimal representative from sets of overlapping gene groups. A jack-knife procedure is used to make calculations more robust against outliers. T-profiler makes it possible to interpret microarray data in a way that is both intuitive and statistically rigorous, without the need to combine experiments or choose parameters. &lt;br /&gt;
* '''Data Input''': Currently, gene expression data from Saccharomyces cerevisiae and Candida albicans are supported. &lt;br /&gt;
* '''Data Output''':&lt;br /&gt;
* '''Implementation Language''': T-profiler is written in PHP; data is managed by a MySQL database server &lt;br /&gt;
* '''Version, Date, Stage''':   &lt;br /&gt;
* '''Authors''': André Boorsma, Barrett C. Foat, Daniel Vis, Frans Klis, Harmen J. Bussemaker  &lt;br /&gt;
* '''Platforms Tested''': Web-based application&lt;br /&gt;
* '''License''':  &lt;br /&gt;
* '''Keywords''': gene expression, transcriptome,  ChIP-chip,  Gene Ontology &lt;br /&gt;
* '''URL''': http://www.t-profiler.org&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''':&lt;br /&gt;
&lt;br /&gt;
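The score described above is a standard two-sample t statistic comparing a gene group against the remaining genes on one array. A minimal sketch of that calculation (pooled-variance form; the jack-knife and group-selection steps of T-profiler are omitted):

```python
from math import sqrt

def t_profiler_score(group: list[float], rest: list[float]) -> float:
    """Unpaired t statistic with pooled variance: mean log-ratio of a
    gene group versus all remaining genes on one array."""
    n1, n2 = len(group), len(rest)
    m1 = sum(group) / n1
    m2 = sum(rest) / n2
    ss1 = sum((x - m1) ** 2 for x in group)
    ss2 = sum((x - m2) ** 2 for x in rest)
    s = sqrt((ss1 + ss2) / (n1 + n2 - 2))   # pooled standard deviation
    return (m1 - m2) / (s * sqrt(1 / n1 + 1 / n2))

# Genes in a hypothetical GO group shifted upward relative to the rest:
group = [1.2, 0.9, 1.4, 1.1]
rest = [0.1, -0.2, 0.0, 0.3, -0.1, 0.2]
t = t_profiler_score(group, rest)   # large positive t: group is "active"
```

A large positive t indicates coordinated up-regulation of the group; the data values here are invented for illustration.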
===TranscriptionDetector===&lt;br /&gt;
* '''Description''': A tool for finding probes that measure significantly expressed loci in a genomic tiling array experiment. Given expression data from a tiling array experiment, TranscriptionDetector estimates the likelihood that a probe is detecting transcription from the locus in which it resides. Probabilities are assigned using a background signal intensity distribution derived from a set of negative control probes. This tool is useful for the functional annotation of genomes, as it allows the discovery of novel transcriptional units independently of any genomic annotation. &lt;br /&gt;
* '''Data Input''': Expression data (GEO or other platforms) and designation of which probes represent negative controls and which are data probes. &lt;br /&gt;
* '''Data Output''': A text file with a list of probes  corresponding to significantly expressed loci. &lt;br /&gt;
* '''Implementation Language''': ANSI C, making use of GSL.  &lt;br /&gt;
* '''Version, Date, Stage''':   &lt;br /&gt;
* '''Authors''': Xiang-Jun Lu, Gabor Halasz, Marinus F. van Batenburg  &lt;br /&gt;
* '''Platforms Tested''': Linux, Cygwin (Windows), Mac OS X&lt;br /&gt;
* '''License''':  &lt;br /&gt;
* '''Keywords''': tiling arrays, expression, transcriptome &lt;br /&gt;
* '''URL''': http://www.bussemakerlab.org/software/TranscriptionDetector/&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''':&lt;br /&gt;
&lt;br /&gt;
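The negative-control idea above amounts to an empirical p-value: how often does background alone reach the observed intensity? A minimal sketch (the +1 smoothing and function names are assumptions of this illustration, not taken from the TranscriptionDetector code):

```python
def empirical_p(intensity: float, negative_controls: list[float]) -> float:
    """P-value for 'this probe shows only background': the fraction of
    negative-control intensities at least as large as the observed one,
    with +1 smoothing so p is never exactly zero."""
    n = len(negative_controls)
    exceed = sum(1 for b in negative_controls if b >= intensity)
    return (exceed + 1) / (n + 1)

# Invented background intensities from negative-control probes:
background = [0.8, 1.1, 0.9, 1.3, 1.0, 0.7, 1.2, 0.95]
p_hot = empirical_p(5.0, background)    # well above background
p_cold = empirical_p(0.9, background)   # within the background range
```

Probes with small p across replicate arrays would be reported as significantly expressed loci.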
===PhenoGO===&lt;br /&gt;
* '''Description''': PhenoGO adds phenotypic contextual information to existing associations between gene products and Gene Ontology (GO) terms as specified in GO Annotations (GOA). PhenoGO utilizes an existing Natural Language Processing (NLP) system, called BioMedLEE, and an existing knowledge-based phenotype organizer system (PhenOS), in conjunction with MeSH indexing and established biomedical ontologies. The system also encodes the context using identifiers from different biomedical ontologies, including the UMLS, Cell Ontology, Mouse Anatomy, NCBI taxonomy, GO, and Mammalian Phenotype Ontology. In addition, PhenoGO was evaluated for coding of anatomical and cellular information and for assigning the coded phenotypes to the correct GOA; the results show that PhenoGO has a precision of 91% and a recall of 92%, demonstrating that the PhenoGO NLP system can accurately attach a large number of anatomical and cellular terms to GO annotations. The PhenoGO database may be accessed at www.phenogo.org. &lt;br /&gt;
* '''Data Input''': Gene Ontology Annotations Files and Medline Abstracts  &lt;br /&gt;
* '''Data Output''': XML file and www.phenogo.org Web Portal&lt;br /&gt;
* '''Implementation Language''': A variety of modules: the web portal is in Java and MySQL, the computational terminology component (PhenOS) is written in Perl scripts that query tables in IBM DB2, and the natural language processing component is written in PROLOG.  &lt;br /&gt;
* '''Version, Date, Stage''': Version 2, Feb 2006  &lt;br /&gt;
* '''Authors''': Yves Lussier and Carol Friedman are the principal investigators. The programmers are Jianrong Li, Lee Sam, and Tara Borlawsky   &lt;br /&gt;
* '''Platforms Tested''': n/a&lt;br /&gt;
* '''License''':  n/a&lt;br /&gt;
* '''Keywords''': Phenotypic integration, computational phenotypes &lt;br /&gt;
* '''URL''': www.phenogo.org&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''':&lt;br /&gt;
&lt;br /&gt;
===MINDY===&lt;br /&gt;
* '''Description''': Given a transcription factor of interest, MINDY uses a large set of gene expression profile data to identify potential post-transcriptional modulators of the transcription factor's activity. MINDY is based on a three-way statistical interaction model that captures the post-transcriptional regulatory event where the ability of a transcription factor to activate/repress its target genes is monotonically controlled by a potential modulator gene. &lt;br /&gt;
* '''Data Input''': Gene expression data in the EXP format, and a user-specified transcription factor of interest&lt;br /&gt;
* '''Data Output''': Lists of the putative modulators and target genes of the transcription factor, and the modulatory interactions involving them&lt;br /&gt;
* '''Implementation Language''': C++, MATLAB, and Java&lt;br /&gt;
* '''Version, Date, Stage''': Stable release, April 2007  &lt;br /&gt;
* '''Authors''': Kai Wang, Ilya Nemenman, Adam Margolin, Riccardo Dalla-Favera, Andrea Califano&lt;br /&gt;
* '''Platforms Tested''': Linux, Cygwin&lt;br /&gt;
* '''License''':  n/a&lt;br /&gt;
* '''Keywords''': gene expression, transcriptional interaction, modulator &lt;br /&gt;
* '''URL''': n/a&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''': Atomic --&amp;gt; SoftwareFunction --&amp;gt; Genomic &amp;amp; Phenotypic Analysis --&amp;gt; Regulatory/Signaling network reconstruction&lt;br /&gt;
&lt;br /&gt;
===B Cell Interactome===&lt;br /&gt;
* '''Description''': The B cell interactome (BCI) is a network of protein-protein, protein-DNA, and modulatory interactions in human B cells. The network contains known interactions (reported in public databases) and interactions predicted by a Bayesian evidence integration framework, which combines a variety of generic and context-specific experimental clues about protein-protein and protein-DNA interactions - such as a large collection of B cell expression profiles - with inferences from different reverse engineering algorithms, such as GeneWays and ARACNE. Modulatory interactions are predicted by MINDY, an algorithm for the prediction of modulators of transcriptional interactions.&lt;br /&gt;
* '''Data Input''': n/a&lt;br /&gt;
* '''Data Output''': text file of binary interactions, each associated with a probability. &lt;br /&gt;
* '''Implementation Language''': Perl&lt;br /&gt;
* '''Version, Date, Stage''': Version 2, March 2007&lt;br /&gt;
* '''Authors''':  Lefebvre C, Lim WK, Basso K, Dalla Favera R, and Califano A.&lt;br /&gt;
* '''Platforms Tested''': n/a&lt;br /&gt;
* '''License''':  &lt;br /&gt;
* '''Keywords''': Naive Bayes, Mixed-Interaction Network, human B cells.&lt;br /&gt;
* '''URL''': http://amdec-bioinfo.cu-genome.org/html/BCellInteractome.html&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''': Atomic --&amp;gt; SoftwareFunction --&amp;gt; Interaction Modeling&lt;br /&gt;
&lt;br /&gt;
===ARACNE===&lt;br /&gt;
* '''Description''': ARACNE is an algorithm for inferring gene regulatory networks from a set of microarray experiments. The method uses mutual information to identify genes that are co-expressed and then applies the data processing inequality to filter out interactions that are likely to be indirect.&lt;br /&gt;
* '''Data Input''': Text file containing measurements from a set of microarray experiments.&lt;br /&gt;
* '''Data Output''':  Text file containing predicted interactions.&lt;br /&gt;
* '''Implementation Language''': C++, Java&lt;br /&gt;
* '''Version, Date, Stage''': Version 1, June, 2006&lt;br /&gt;
* '''Authors''':  Margolin AA, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Dalla Favera R, Califano A.&lt;br /&gt;
* '''Platforms Tested''': Windows, Linux&lt;br /&gt;
* '''License''':  Open source&lt;br /&gt;
* '''Keywords''': Reverse engineering, mutual information, genetic networks, microarray&lt;br /&gt;
* '''URL''': http://amdec-bioinfo.cu-genome.org/html/ARACNE.htm&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''': Atomic --&amp;gt; SoftwareFunction --&amp;gt; Genomic &amp;amp; Phenotypic Analysis --&amp;gt; Regulatory/Signaling network reconstruction&lt;br /&gt;
&lt;br /&gt;
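The two steps in the ARACNE description above — pairwise mutual information, then pruning via the data processing inequality (DPI) — can be sketched in a few lines. This is an illustrative toy (histogram MI estimate, exhaustive triplet scan), not the released C++/Java implementation:

```python
from collections import Counter
from math import log

def mutual_info(x, y, bins=3):
    """MI between two expression profiles after equal-width discretization."""
    def disc(v):
        lo, hi = min(v), max(v)
        w = (hi - lo) / bins or 1.0
        return [min(int((a - lo) / w), bins - 1) for a in v]
    dx, dy = disc(x), disc(y)
    n = len(dx)
    px, py, pxy = Counter(dx), Counter(dy), Counter(zip(dx, dy))
    return sum((c / n) * log((c / n) / ((px[a] / n) * (py[b] / n)))
               for (a, b), c in pxy.items())

def aracne(profiles, eps=0.0):
    """DPI step: drop edge (i, j) if some gene k makes it the weakest
    link in the triangle (i, j, k)."""
    genes = list(profiles)
    mi = {(i, j): mutual_info(profiles[i], profiles[j])
          for i in genes for j in genes if i < j}
    def m(a, b):
        return mi[(a, b) if a < b else (b, a)]
    edges = set(mi)
    for (i, j) in list(edges):
        for k in genes:
            if k in (i, j):
                continue
            if m(i, j) < min(m(i, k), m(j, k)) - eps:
                edges.discard((i, j))
                break
    return edges

# Toy chain A -> B -> C: C depends on A only through B, so the
# indirect A-C edge should be pruned by the DPI.
profiles = {
    "A": [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2],
    "B": [1, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2],   # noisy copy of A
    "C": [1, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 1],   # noisy copy of B
}
network = aracne(profiles)
```

On this toy data the A-C edge has the lowest MI of the triangle and is removed, leaving the chain A-B, B-C.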
&lt;br /&gt;
===geWorkbench===&lt;br /&gt;
* '''Description''': geWorkbench is a Java application that provides users with an integrated suite of genomics tools. It is built on an open-source, extensible architecture that promotes interoperability and simplifies the development of new as well as the incorporation of pre-existing components. The resulting system provides seamless access to a multitude of both local and remote data and computational services through an integrated environment that offers a unified user experience. Over 50 data analysis and visualization components have been developed for the framework, covering a wide range of genomics domains including gene expression, sequence, structure and network data.&lt;br /&gt;
* '''Data Input''': Gene expression data (Affy, GenePix, RMA), Sequence (FASTA), Structure (PDB).&lt;br /&gt;
* '''Data Output''': Analysis results (multiple formats).&lt;br /&gt;
* '''Implementation Language''': Java&lt;br /&gt;
* '''Version, Date, Stage''': 1.0.5, 3/23/07, stable production release&lt;br /&gt;
* '''Authors''':  A. Califano, A. Floratos, M. Kustagi, K. Smith, J. Watkinson, M. Hall, K. Keshav, X. Zhang, K. Kushal, B. Jagla, E. Daly, M. VanGinhoven, P. Morozov.&lt;br /&gt;
* '''Platforms Tested''': Windows XP, Linux, Mac OS 10.x.&lt;br /&gt;
* '''License''': Free. &lt;br /&gt;
* '''Keywords''': Analysis suite, gene expression analysis, sequence analysis, network reconstruction, structure prediction, visualization.&lt;br /&gt;
* '''URL''': http://www.geworkbench.org&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''': Atomic --&amp;gt; SoftwareFunction --&amp;gt; Genomic &amp;amp; Phenotypic Analysis; Atomic --&amp;gt; SoftwareFunction --&amp;gt; Interaction Modeling; Atomic --&amp;gt; SoftwareFunction --&amp;gt; Protein Modeling and Classification; Atomic --&amp;gt; SoftwareFunction --&amp;gt; Software Engineering and Development Tool --&amp;gt; Integration --&amp;gt; Resource Integration Components; Atomic --&amp;gt; SoftwareFunction --&amp;gt; Software Engineering and Development Tool --&amp;gt; Integration --&amp;gt; Grid Computing Resources; Atomic --&amp;gt; SoftwareFunction --&amp;gt; Visualization&lt;/div&gt;</summary>
		<author><name>Aris</name></author>
		
	</entry>
	<entry>
		<id>https://www.na-mic.org/w/index.php?title=SDIWG:NCBC_Software_Classification_MAGNet_Examples&amp;diff=10271</id>
		<title>SDIWG:NCBC Software Classification MAGNet Examples</title>
		<link rel="alternate" type="text/html" href="https://www.na-mic.org/w/index.php?title=SDIWG:NCBC_Software_Classification_MAGNet_Examples&amp;diff=10271"/>
		<updated>2007-05-13T22:23:35Z</updated>

		<summary type="html">&lt;p&gt;Aris: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;===DelPhi===&lt;br /&gt;
* '''Description''': DelPhi provides numerical solutions to the Poisson-Boltzmann equation (both linear and nonlinear form) for molecules of arbitrary shape and charge distribution. The current version is fast (the best relaxation parameter is estimated at run time), accurate (calculation of the electrostatic free energy is less dependent on the resolution of the lattice) and can handle extremely high lattice dimensions. It also includes flexible features for assigning different dielectric constants to different regions of space and treating systems containing mixed salt solutions.&lt;br /&gt;
* '''Data Input''': DelPhi takes as input a coordinate file format of a molecule or equivalent data for geometrical objects and/or charge distributions&lt;br /&gt;
* '''Data Output''': electrostatic potential in and around the system&lt;br /&gt;
* '''Implementation Language''': Fortran and C&lt;br /&gt;
* '''Version, Date, Stage''': Stable public release&lt;br /&gt;
* '''Authors''':  E.Alexov,  R.Fine, M.K.Gilson, A.Nicholls, W.Rocchia, K.Sharp, and B. Honig.&lt;br /&gt;
* '''Platforms Tested''': SGI IRIX, Linux, PC (requires Fortran and C compilers), IBM AIX, and Mac.&lt;br /&gt;
* '''License''': Freely available to academia; pay model for commercial users. &lt;br /&gt;
* '''Keywords''': Finite Difference Poisson-Boltzmann Solver&lt;br /&gt;
* '''URL''': http://trantor.bioc.columbia.edu/delphi&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''': Atomic --&amp;gt; SoftwareFunction --&amp;gt; Protein Modeling and Classification --&amp;gt; Numerical Calculation of Electrostatic Potential&lt;br /&gt;
&lt;br /&gt;
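The finite-difference relaxation scheme behind solvers like DelPhi can be illustrated in one dimension. The sketch below solves the linearized Poisson-Boltzmann equation phi'' = kappa^2 * phi on a uniform grid by Gauss-Seidel relaxation; it is a toy under simplifying assumptions (1-D, linear form, uniform dielectric, Dirichlet boundaries), not the DelPhi code:

```python
from math import sinh

def linear_pb_1d(phi0=1.0, kappa=2.0, L=1.0, n=41, sweeps=5000):
    """Gauss-Seidel relaxation of phi'' = kappa^2 * phi on [0, L]
    with phi(0) = phi0 and phi(L) = 0.

    Finite differences give phi[i] = (phi[i-1] + phi[i+1]) / (2 + (kappa*h)^2),
    which is iterated to convergence."""
    h = L / (n - 1)
    phi = [0.0] * n
    phi[0] = phi0        # boundary potential at the "molecular surface"
    denom = 2.0 + (kappa * h) ** 2
    for _ in range(sweeps):
        for i in range(1, n - 1):
            phi[i] = (phi[i - 1] + phi[i + 1]) / denom
    return phi

phi = linear_pb_1d()
```

The converged grid values track the analytic screened solution phi(x) = phi0 * sinh(kappa*(L - x)) / sinh(kappa*L); DelPhi applies the same relaxation idea on a 3-D lattice with position-dependent dielectric and charge terms, choosing the relaxation parameter at run time.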
&lt;br /&gt;
===GRASP===&lt;br /&gt;
* '''Description''': A molecular visualization and analysis program. It is particularly useful for the display and manipulation of the surfaces of molecules and their electrostatic properties.&lt;br /&gt;
* '''Data Input''': PDB files, potential maps from DelPhi&lt;br /&gt;
* '''Data Output''': molecular graphics.&lt;br /&gt;
* '''Implementation Language''': Fortran&lt;br /&gt;
* '''Version, Date, Stage''': v1.3.6. Stable public release.&lt;br /&gt;
* '''Authors''': Anthony Nicholls and Barry Honig.&lt;br /&gt;
* '''Platforms Tested''': SGI machines: IRIX 5.x and 6.x (INDY, INDIGO including Impact, Octane, and O2 systems).&lt;br /&gt;
* '''License''':  Freely available to academia.&lt;br /&gt;
* '''Keywords''': molecular visualization&lt;br /&gt;
* '''URL''': http://trantor.bioc.columbia.edu/grasp&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''': Atomic --&amp;gt; SoftwareFunction --&amp;gt; Protein Modeling and Classification --&amp;gt; Molecular Visualization Package&lt;br /&gt;
&lt;br /&gt;
===Nest===&lt;br /&gt;
* '''Description''': Models protein structure based on a sequence-template alignment. The current server works only for modeling with a single template. Part of JACKAL, which can be downloaded.&lt;br /&gt;
* '''Data Input''': pir and PDB files&lt;br /&gt;
* '''Data Output''': &lt;br /&gt;
* '''Implementation Language''': C++&lt;br /&gt;
* '''Version, Date, Stage''': Stable public release.&lt;br /&gt;
* '''Authors''': Xiang, Z. and Honig, B.&lt;br /&gt;
* '''Platforms Tested''': platform independent (web based tool)&lt;br /&gt;
* '''License''': Freely available to academia.&lt;br /&gt;
* '''Keywords''': modeling, protein structure, sequence-template alignment.&lt;br /&gt;
* '''URL''': http://honiglab.cpmc.columbia.edu/cgi-bin/jackal/nest.cgi&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''': Atomic --&amp;gt; SoftwareFunction --&amp;gt; Protein Modeling and Classification --&amp;gt; Homology Modeling&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===JACKAL===&lt;br /&gt;
* '''Description''': Jackal is a collection of programs designed for the modeling and analysis of protein structures. Its core program is Nest, a versatile homology modeling package. JACKAL has the following capabilities: 1) comparative modeling based on single, composite or multiple templates; 2) side-chain prediction; 3) modeling residue mutation, insertion or deletion; 4) loop prediction; 5) structure refinement; 6) reconstruction of protein missing atoms; 7) reconstruction of protein missing residues; 8) prediction of hydrogen atoms; 9) fast calculation of solvent accessible surface area; 10) structure superimposition. &lt;br /&gt;
* '''Data Input''': &lt;br /&gt;
* '''Data Output''': &lt;br /&gt;
* '''Implementation Language''': C++&lt;br /&gt;
* '''Version, Date, Stage''': Version: 1.5 as of Oct, 20, 2002, Stable public release.&lt;br /&gt;
* '''Authors''': Z. Xiang and B. Honig&lt;br /&gt;
* '''Platforms Tested''': SGI 6.5, Intel Linux and Sun solaris&lt;br /&gt;
* '''License''': Freely available to academia.&lt;br /&gt;
* '''Keywords''': Protein Structure Modeling&lt;br /&gt;
* '''URL''': http://trantor.bioc.columbia.edu/programs/jackal&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''': Atomic --&amp;gt; SoftwareFunction --&amp;gt; Protein Modeling and Classification --&amp;gt; Prediction of Side-chain Conformations&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===GRASP2===&lt;br /&gt;
* '''Description''': GRASP2 is an updated version of the GRASP program for macromolecular structure and surface visualization, and contains a large number of new features and scientific tools: an enhanced GUI; structure alignment and domain database scanning; a Gaussian surface generator and new surface coloring schemes; sequence visualization and alignment; storage of completed work in &amp;quot;project files&amp;quot;, which can hold views of the structure, defined subsets, and surfaces, among many other objects; and direct printing to printers at full printer resolution.&lt;br /&gt;
* '''Data Input''': PDB files, potential maps from DelPhi, sequence alignments.&lt;br /&gt;
* '''Data Output''': molecular graphics, structural alignments.&lt;br /&gt;
* '''Implementation Language''': C++&lt;br /&gt;
* '''Version, Date, Stage''': Stable public release&lt;br /&gt;
* '''Authors''': Donald Petrey and Barry Honig.&lt;br /&gt;
* '''Platforms Tested''': Windows, Linux&lt;br /&gt;
* '''License''': Freely available to academia.&lt;br /&gt;
* '''Keywords''': molecular visualization&lt;br /&gt;
* '''URL''': http://trantor.bioc.columbia.edu/grasp2&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''': SoftwareFunction --&amp;gt; Protein Modeling and Classification --&amp;gt; Molecular Visualization Package&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===PrISM===&lt;br /&gt;
* '''Description''': PrISM is an integrated computational system providing tools for protein sequence and structure analysis and modeling.&lt;br /&gt;
* '''Data Input''': &lt;br /&gt;
* '''Data Output''':&lt;br /&gt;
* '''Implementation Language''': Fortran&lt;br /&gt;
* '''Version, Date, Stage''': Stable public release&lt;br /&gt;
* '''Authors''': Wang, L, Yang, A. S. &amp;amp; Honig, B. &lt;br /&gt;
* '''Platforms Tested''': SGI IRIX, Intel Linux &lt;br /&gt;
* '''License''': Freely available to academia.&lt;br /&gt;
* '''Keywords''': protein analysis/modeling &lt;br /&gt;
* '''URL''': http://trantor.bioc.columbia.edu/programs/PrISM/&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''': Atomic --&amp;gt; SoftwareFunction --&amp;gt; Protein Modeling and Classification --&amp;gt; Homology Modeling&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Protein-DNA interface alignment===&lt;br /&gt;
* '''Description''': The protein-DNA alignment software allows one to align the interfacial amino acids from two protein-DNA complexes based on the geometric relationship of each amino acid to its local DNA. &lt;br /&gt;
* '''Data Input''': two PDB files that both contain protein-DNA complexes&lt;br /&gt;
* '''Data Output''': The programs will output the aligned residues and their corresponding residue-residue similarity scores, s(i,j).&lt;br /&gt;
* '''Implementation Language''': C++ and Perl&lt;br /&gt;
* '''Version, Date, Stage''': Stable public release. &lt;br /&gt;
* '''Authors''': Siggers, T.W., Silkov, A &amp;amp; Honig, B. &lt;br /&gt;
* '''Platforms Tested''': Linux&lt;br /&gt;
* '''License''': Freely available to academia &lt;br /&gt;
* '''Keywords''': protein-DNA interface &lt;br /&gt;
* '''URL''': http://trantor.bioc.columbia.edu/programs/intfc_aln&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''': Atomic --&amp;gt; SoftwareFunction --&amp;gt; Protein Modeling and Classification --&amp;gt; Prediction of Side-chain Conformations&lt;br /&gt;
&lt;br /&gt;
===SURFace===&lt;br /&gt;
* '''Description''': SURFace algorithms are programs that calculate solvent accessible surface area and curvature-corrected solvent accessible surface area. &lt;br /&gt;
* '''Data Input''': &lt;br /&gt;
* '''Data Output''': &lt;br /&gt;
* '''Implementation Language''': &lt;br /&gt;
* '''Version, Date, Stage''': Stable public release.&lt;br /&gt;
* '''Authors''': Nicholls, A., Sharp, K., Sridharan, S. and Honig, B.  &lt;br /&gt;
* '''Platforms Tested''': SGI&lt;br /&gt;
* '''License''': Freely available to academia. &lt;br /&gt;
* '''Keywords''': solvent accessible surface area  &lt;br /&gt;
* '''URL''': http://trantor.bioc.columbia.edu/surf/&lt;br /&gt;
* '''Organization''': MAGNet &lt;br /&gt;
* '''NCBC Ontology Classification''': Atomic --&amp;gt; SoftwareFunction --&amp;gt; Protein Modeling and Classification --&amp;gt; Calculation of Solvent Accessible Area&lt;br /&gt;
&lt;br /&gt;
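Solvent-accessible surface area can be estimated numerically in the Shrake-Rupley style: sample test points on each atom's probe-expanded sphere and keep those not buried inside a neighbour. The sketch below is an illustration of that general idea only (it omits the curvature correction and is not the SURFace code):

```python
from math import pi, sqrt, sin, cos

def sphere_points(n):
    """Roughly uniform points on the unit sphere (golden-spiral lattice)."""
    pts = []
    golden = pi * (3 - sqrt(5))
    for i in range(n):
        z = 1 - 2 * (i + 0.5) / n
        r = sqrt(max(0.0, 1 - z * z))
        theta = golden * i
        pts.append((r * cos(theta), r * sin(theta), z))
    return pts

def sasa(atoms, probe=1.4, n=2000):
    """Numerical SASA.  atoms: list of (x, y, z, radius) tuples.

    For each atom, place n test points on the sphere of radius
    r + probe and count those outside every neighbour's expanded
    sphere; the exposed fraction scales the sphere's area."""
    pts = sphere_points(n)
    total = 0.0
    for i, (xi, yi, zi, ri) in enumerate(atoms):
        R = ri + probe
        exposed = 0
        for px, py, pz in pts:
            qx, qy, qz = xi + R * px, yi + R * py, zi + R * pz
            buried = any((qx - xj) ** 2 + (qy - yj) ** 2 + (qz - zj) ** 2
                         < (rj + probe) ** 2
                         for j, (xj, yj, zj, rj) in enumerate(atoms) if j != i)
            if not buried:
                exposed += 1
        total += 4 * pi * R * R * exposed / n
    return total
```

A lone atom recovers the full expanded-sphere area 4*pi*(r + probe)^2, and two overlapping atoms yield less than the sum of their isolated areas.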
&lt;br /&gt;
===Target Explorer===&lt;br /&gt;
* '''Description''': Automated prediction of complex regulatory elements for a specified set of transcription factors in the Drosophila melanogaster genome. Target Explorer is a tool with a user-friendly, self-explanatory web interface that allows the user to: 1. create a customized library of TF binding site matrices based on user-defined sets of training sequences; 2. search for new clusters of binding sites for a specified set of TFs; 3. extract annotation for potential target genes. &lt;br /&gt;
* '''Data Input''': genomic sequences&lt;br /&gt;
* '''Data Output''': clusters of known binding sites &lt;br /&gt;
* '''Implementation Language''': Perl, CGI&lt;br /&gt;
* '''Version, Date, Stage''': Stable public release. &lt;br /&gt;
* '''Authors''': Sosinsky A, Bonin CP, Mann RS, Honig B.   &lt;br /&gt;
* '''Platforms Tested''': platform independent (web based tool)&lt;br /&gt;
* '''License''': Freely available to academia. &lt;br /&gt;
* '''Keywords''': prediction of binding sites for transcription factors  &lt;br /&gt;
* '''URL''': http://trantor.bioc.columbia.edu/Target_Explorer/&lt;br /&gt;
* '''Organization''': MAGNet  &lt;br /&gt;
* '''NCBC Ontology Classification''': Atomic --&amp;gt; SoftwareFunction --&amp;gt; Genomic &amp;amp; Phenotypic Analysis--&amp;gt; Sequence Annotation&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===MEDUSA and Gorgon===&lt;br /&gt;
* '''Description''': MEDUSA is an algorithm for learning predictive models of transcriptional gene regulation from gene expression and promoter sequence data.  By using a statistical learning approach based on boosting, MEDUSA learns cis-regulatory motifs, condition-specific regulators, and regulatory programs that predict the differential expression of target genes.  The regulatory program is specified as an alternating decision tree (ADT).  The Java implementation of MEDUSA will allow a number of visualizations of the regulatory program and other inferred regulatory information, implemented in the accompanying Gorgon tool, including hits of significant and condition-specific motifs along the promoter sequences of target genes and regulatory network figures viewable in Cytoscape. &lt;br /&gt;
* '''Data Input''': Discretized (up/down/baseline) gene expression data in plain text format, promoter sequences in FASTA format, list of candidate transcriptional regulators and signal transducers in plain text format.&lt;br /&gt;
* '''Data Output''': Regulatory program represented as a Java serialized object file readable by Gorgon and as a human readable XML file.  Gorgon currently generates views of learned PSSMs, positional hits along promoter sequences, and views of the ADT as HTML files, and generates network figures as Cytoscape format files.&lt;br /&gt;
* '''Implementation Language''': Java (prototyped in MATLAB) &lt;br /&gt;
* '''Version, Date, Stage''': Version 2.0, July 2006, pre-release beta version; Version 1.0 (MATLAB), April 2005, stable public release  &lt;br /&gt;
* '''Authors''': David Quigley, Manuel Middendorf, Steve Lianoglou, Anshul Kundaje, Yoav Freund, Chris Wiggins, Christina Leslie   &lt;br /&gt;
* '''Platforms Tested''': Windows, Linux, Mac OS X&lt;br /&gt;
* '''License''': Open source &lt;br /&gt;
* '''Keywords''':  &lt;br /&gt;
* '''URL''': http://www.cs.columbia.edu/compbio/medusa (MATLAB), http://compbio.sytes.net:8090/medusa (Java beta version)&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''':&lt;br /&gt;
&lt;br /&gt;
===String kernel package===&lt;br /&gt;
* '''Description''': The string kernel package contains implementations of the mismatch and profile string kernels for use with support vector machine (SVM) classifiers for protein sequence classification.  Both kernels compute similarity between protein sequences based on common occurrences of k-length subsequences (&amp;quot;k-mers&amp;quot;) counted with substitutions.  Kernel functions for protein sequence data enable the training of SVMs for a range of prediction problems, in particular protein structural class prediction and remote homology detection.  A version of the Spider MATLAB machine learning package is also bundled with the code, allowing users to train SVMs and evaluate performance on test sets with the packaged software.&lt;br /&gt;
* '''Data Input''': The mismatch kernel requires sequence data in FASTA format. The profile string kernel uses probabilistic profiles, such as those produced by PSI-BLAST, in place of the original sequences.  The Spider SVM implementation requires both the kernel matrix and a label file of binary or multi-class labels for the training data; this data must be loaded into MATLAB variables before using the Spider routines.&lt;br /&gt;
* '''Data Output''': The kernel code produces a kernel matrix for the input data in tab-delimited text format.  The Spider package trains SVMs and stores the learned classifier and the results of applying the classifier to test data as MATLAB objects.&lt;br /&gt;
* '''Implementation Language''': String kernel code is implemented in C. Spider is a set of object-oriented MATLAB routines. &lt;br /&gt;
* '''Version, Date, Stage''': Version 1.2, September 2004, stable public release &lt;br /&gt;
* '''Authors''': Eleazar Eskin, Rui Kuang, Eugene Ie, Ke Wang, Jason Weston, Bill Noble, Christina Leslie  &lt;br /&gt;
* '''Platforms Tested''': Windows, Linux&lt;br /&gt;
* '''License''': Open source &lt;br /&gt;
* '''Keywords''':  &lt;br /&gt;
* '''URL''': http://www.cs.columbia.edu/compbio/string-kernels&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''':&lt;br /&gt;
&lt;br /&gt;
===MatrixREDUCE===&lt;br /&gt;
* '''Description''': Regulation of gene expression by a transcription factor requires physical interaction between the factor and the DNA, which can be described by a statistical mechanical model. Based on this model, the MatrixREDUCE algorithm uses genome-wide occupancy data for a transcription factor (e.g. ChIP-chip or mRNA expression data) and associated nucleotide sequences to discover the sequence-specific binding affinity of the transcription factor. The sequence specificity of the transcription factor's DNA-binding domain is modeled using a position-specific affinity matrix (PSAM), representing the change in the binding affinity (Kd) whenever a specific position within a reference binding sequence is mutated. The PSAM can be transformed into an affinity logo for visualization using the utility program AffinityLogo, and a MatrixREDUCE run can be summarized in an easy-to-navigate webpage using HTMLSummary. &lt;br /&gt;
* '''Data Input''': sequence file in FASTA format; and expression data file in tab-delimited text format.&lt;br /&gt;
* '''Data Output''': PSAMs in numeric and graphical format, parameters of the fitted model, and an HTML summary page.&lt;br /&gt;
* '''Implementation Language''': ANSI C, making use of Numerical Recipes routines.  &lt;br /&gt;
* '''Version, Date, Stage''': Version 1.0, July 10, 2006, extensively tested in lab.  &lt;br /&gt;
* '''Authors''': Barrett Foat, Xiang-Jun Lu, Harmen J. Bussemaker  &lt;br /&gt;
* '''Platforms Tested''': Linux, Cygwin (Windows), Mac OS X&lt;br /&gt;
* '''License''':  &lt;br /&gt;
* '''Keywords''': position-specific affinity matrix, binding affinity, cis-regulatory element, expression data, ChIP-chip, transcription factor &lt;br /&gt;
* '''URL''': http://www.bussemakerlab.org/software/MatrixREDUCE&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''':&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===T-profiler===&lt;br /&gt;
* '''Description''': T-profiler is a web-based tool that uses the t-test to score changes in the average activity of pre-defined groups of genes. The gene groups are defined based on Gene Ontology categorization, ChIP-chip experiments, upstream matches to a consensus transcription factor binding motif, or location on the same chromosome. If desired, an iterative procedure can be used to select a single, optimal representative from sets of overlapping gene groups. A jack-knife procedure is used to make calculations more robust against outliers. T-profiler makes it possible to interpret microarray data in a way that is both intuitive and statistically rigorous, without the need to combine experiments or choose parameters. &lt;br /&gt;
* '''Data Input''': Currently, gene expression data from Saccharomyces cerevisiae and Candida albicans are supported. &lt;br /&gt;
* '''Data Output''':&lt;br /&gt;
* '''Implementation Language''': T-profiler is written in PHP; data is managed by a MySQL database server &lt;br /&gt;
* '''Version, Date, Stage''':   &lt;br /&gt;
* '''Authors''': André Boorsma, Barrett C. Foat, Daniel Vis, Frans Klis, Harmen J. Bussemaker  &lt;br /&gt;
* '''Platforms Tested''': Web-based application&lt;br /&gt;
* '''License''':  &lt;br /&gt;
* '''Keywords''': gene expression, transcriptome,  ChIP-chip,  Gene Ontology &lt;br /&gt;
* '''URL''': http://www.t-profiler.org&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''':&lt;br /&gt;
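The group scoring that T-profiler performs can be sketched as follows. The exact statistic and corrections used by the server may differ; this is a plain pooled-variance t statistic comparing a gene group's mean log-expression ratio against all remaining genes, which is the general form described above.

```python
import math

def group_t_score(values_in_group, values_outside):
    """Unpaired t statistic with pooled variance: does the group's mean
    log-expression ratio differ from that of the rest of the genome?"""
    na, nb = len(values_in_group), len(values_outside)
    ma = sum(values_in_group) / na
    mb = sum(values_outside) / nb
    ssa = sum((v - ma) ** 2 for v in values_in_group)
    ssb = sum((v - mb) ** 2 for v in values_outside)
    pooled = math.sqrt((ssa + ssb) / (na + nb - 2))
    return (ma - mb) / (pooled * math.sqrt(1 / na + 1 / nb))
```

A group of genes whose log-ratios are shifted well above the genome-wide background yields a large positive t score.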
&lt;br /&gt;
===TranscriptionDetector===&lt;br /&gt;
* '''Description''': A tool for finding probes that measure significantly expressed loci in a genomic array experiment. Given expression data from a tiling array experiment, TranscriptionDetector estimates the likelihood that a probe is detecting transcription from the locus in which it resides. Probabilities are assigned by making use of a background signal intensity distribution from a set of negative control probes. This tool is useful for the functional annotation of genomes as it allows for the discovery of novel transcriptional units independently of any genomic annotation. &lt;br /&gt;
* '''Data Input''': Expression data (GEO or other platforms) and designation of which probes represent negative controls and which are data probes. &lt;br /&gt;
* '''Data Output''': A text file with a list of probes  corresponding to significantly expressed loci. &lt;br /&gt;
* '''Implementation Language''': ANSI C, making use of GSL.  &lt;br /&gt;
* '''Version, Date, Stage''':   &lt;br /&gt;
* '''Authors''': Xiang-Jun Lu, Gabor Halasz, Marinus F. van Batenburg  &lt;br /&gt;
* '''Platforms Tested''': Linux, Cygwin (Windows), Mac OS X&lt;br /&gt;
* '''License''':  &lt;br /&gt;
* '''Keywords''': tiling arrays, expression, transcriptome &lt;br /&gt;
* '''URL''': http://www.bussemakerlab.org/software/TranscriptionDetector/&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''':&lt;br /&gt;
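The negative-control background idea can be sketched in a few lines. The actual tool's probability model may differ; this sketch assigns each probe an empirical p-value from the control-intensity distribution and flags probes below a cutoff (`alpha` is an illustrative parameter).

```python
def empirical_p(value, control_values):
    """Fraction of negative-control probes at least as bright as `value`
    (an empirical p-value with add-one smoothing)."""
    n = len(control_values)
    ge = sum(1 for c in control_values if c >= value)
    return (ge + 1) / (n + 1)

def detect_expressed(probe_values, control_values, alpha=0.05):
    """Indices of probes whose intensity is unlikely under the
    background (negative-control) intensity distribution."""
    return [i for i, v in enumerate(probe_values)
            if empirical_p(v, control_values) < alpha]
```

A probe far brighter than every negative control gets a small p-value and is reported as a significantly expressed locus.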
&lt;br /&gt;
===PhenoGO===&lt;br /&gt;
* '''Description''': PhenoGO adds phenotypic contextual information to existing associations between gene products and Gene Ontology (GO) terms as specified in GO Annotations (GOA). PhenoGO utilizes an existing Natural Language Processing (NLP) system, called BioMedLEE, and an existing knowledge-based phenotype organizer system (PhenOS), in conjunction with MeSH indexing and established biomedical ontologies. The system also encodes the context using identifiers from different biomedical ontologies, including the UMLS, Cell Ontology, Mouse Anatomy, NCBI taxonomy, GO, and Mammalian Phenotype Ontology. In addition, PhenoGO was evaluated for coding of anatomical and cellular information and assigning the coded phenotypes to the correct GOA; the results show that PhenoGO has a precision of 91% and a recall of 92%, demonstrating that the PhenoGO NLP system can accurately assign a large number of anatomical and cellular ontology codes to GO annotations. The PhenoGO database may be accessed at www.phenogo.org. &lt;br /&gt;
* '''Data Input''': Gene Ontology Annotations Files and Medline Abstracts  &lt;br /&gt;
* '''Data Output''': XML file and www.phenogo.org Web Portal&lt;br /&gt;
* '''Implementation Language''': A variety of modules: the web portal is in Java and MySQL; the computational terminology component (PhenOS) is written in Perl scripts that query tables in IBM DB2; the natural language processing component is written in PROLOG. &lt;br /&gt;
* '''Version, Date, Stage''': Version 2, Feb 2006  &lt;br /&gt;
* '''Authors''': Yves Lussier and Carol Friedman are the principal investigators. The programmers are Jianrong Li, Lee Sam, and Tara Borlawsky   &lt;br /&gt;
* '''Platforms Tested''': n/a&lt;br /&gt;
* '''License''':  n/a&lt;br /&gt;
* '''Keywords''': Phenotypic integration, computational phenotypes &lt;br /&gt;
* '''URL''': www.phenogo.org&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''':&lt;br /&gt;
&lt;br /&gt;
===MINDY===&lt;br /&gt;
* '''Description''': Given a transcription factor of interest, MINDY uses a large set of gene expression profile data to identify potential post-transcriptional modulators of the transcription factor's activity. MINDY is based on a three-way statistical interaction model that captures the post-transcriptional regulatory event where the ability of a transcription factor to activate/repress its target genes is monotonically controlled by a potential modulator gene. &lt;br /&gt;
* '''Data Input''': Gene expression data in the EXP format, and a user-specified transcription factor of interest&lt;br /&gt;
* '''Data Output''': Lists of the putative modulators and target genes of the transcription factor, and the modulatory interactions involving them&lt;br /&gt;
* '''Implementation Language''': C++, MATLAB, and Java&lt;br /&gt;
* '''Version, Date, Stage''': Stable release, April 2007  &lt;br /&gt;
* '''Authors''': Kai Wang, Ilya Nemenman, Adam Margolin, Riccardo Dalla-Favera, Andrea Califano&lt;br /&gt;
* '''Platforms Tested''': Linux, Cygwin&lt;br /&gt;
* '''License''':  n/a&lt;br /&gt;
* '''Keywords''': gene expression, transcriptional interaction, modulator &lt;br /&gt;
* '''URL''': n/a&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''': Atomic --&amp;gt; SoftwareFunction --&amp;gt; Genomic &amp;amp; Phenotypic Analysis --&amp;gt; Regulatory/Signaling network reconstruction&lt;br /&gt;
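The three-way interaction test can be approximated with a small sketch. MINDY itself scores conditional mutual information between the TF and each target, conditioned on the candidate modulator's expression; the sketch below substitutes Pearson correlation for mutual information to stay compact, comparing the TF-target association in modulator-high versus modulator-low samples.

```python
import numpy as np

def pearson(x, y):
    x = x - x.mean()
    y = y - y.mean()
    return float((x @ y) / np.sqrt((x @ x) * (y @ y)))

def modulation_score(tf, target, modulator, frac=0.35):
    """Compare the TF-target association in the samples where the candidate
    modulator is most vs. least expressed; a large difference suggests the
    modulator conditions the regulatory interaction. (MINDY proper uses
    conditional mutual information, not correlation.)"""
    order = np.argsort(modulator)
    k = max(2, int(len(order) * frac))
    low, high = order[:k], order[-k:]
    return pearson(tf[high], target[high]) - pearson(tf[low], target[low])
```

With synthetic data where the TF drives the target only when the modulator is highly expressed, the score is large and positive.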
&lt;br /&gt;
===B Cell Interactome===&lt;br /&gt;
* '''Description''': The B cell interactome (BCI)  is a network of protein-protein, protein-DNA and modulatory interactions in human B cells. The network contains known interactions (reported in public databases) and predicted interactions by a Bayesian evidence integration framework which integrates a variety of generic and context specific experimental clues about protein-protein and protein-DNA interactions - such as a large collection of B cell expression profiles - with inferences from different reverse engineering algorithms, such as GeneWays and ARACNE. Modulatory interactions are predicted by MINDY, an algorithm for the prediction of modulators of transcriptional interactions.&lt;br /&gt;
* '''Data Input''': n/a&lt;br /&gt;
* '''Data Output''': Text file of binary interactions, each associated with a probability. &lt;br /&gt;
* '''Implementation Language''': Perl&lt;br /&gt;
* '''Version, Date, Stage''': Version 2, March 2007&lt;br /&gt;
* '''Authors''':  Lefebvre C, Lim WK, Basso K, Dalla Favera R, and Califano A.&lt;br /&gt;
* '''Platforms Tested''': n/a&lt;br /&gt;
* '''License''':  &lt;br /&gt;
* '''Keywords''': Naive Bayes, Mixed-Interaction Network, human B cells.&lt;br /&gt;
* '''URL''': http://amdec-bioinfo.cu-genome.org/html/BCellInteractome.html&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''': Atomic --&amp;gt; SoftwareFunction --&amp;gt; Interaction Modeling&lt;br /&gt;
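The Bayesian evidence integration step can be illustrated with a naive-Bayes odds calculation. The actual BCI framework and its likelihood ratios are not reproduced here; this sketch only shows how independent evidence sources multiply into the posterior odds of an interaction.

```python
def posterior_odds(prior_odds, likelihood_ratios):
    """Naive-Bayes evidence integration: multiply the prior odds of an
    interaction by the likelihood ratio contributed by each independent
    evidence source (e.g. co-expression, orthologous interactions)."""
    odds = prior_odds
    for lr in likelihood_ratios:
        odds *= lr
    return odds

def probability(odds):
    """Convert odds back to a probability."""
    return odds / (1.0 + odds)
```

For example, with illustrative prior odds of 1:999 for a random protein pair, two strong evidence sources with likelihood ratios 30 and 20 raise the interaction probability to roughly 0.38.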
&lt;br /&gt;
===ARACNE===&lt;br /&gt;
* '''Description''': ARACNE is an algorithm for inferring gene regulatory networks from a set of microarray experiments. The method uses mutual information to identify genes that are co-expressed and then applies the data processing inequality to filter out interactions that are likely to be indirect.&lt;br /&gt;
* '''Data Input''': Text file containing measurements from a set of microarray experiments.&lt;br /&gt;
* '''Data Output''':  Text file containing predicted interactions.&lt;br /&gt;
* '''Implementation Language''': C++, Java&lt;br /&gt;
* '''Version, Date, Stage''': Version 1, June, 2006&lt;br /&gt;
* '''Authors''':  Margolin AA, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Dalla Favera R, Califano A.&lt;br /&gt;
* '''Platforms Tested''': Windows, Linux&lt;br /&gt;
* '''License''':  Open source&lt;br /&gt;
* '''Keywords''': Reverse engineering, mutual information, genetic networks, microarray&lt;br /&gt;
* '''URL''': http://amdec-bioinfo.cu-genome.org/html/ARACNE.htm&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''': Atomic --&amp;gt; SoftwareFunction --&amp;gt; Genomic &amp;amp; Phenotypic Analysis --&amp;gt; Regulatory/Signaling network reconstruction&lt;br /&gt;
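The two steps named in the description can be sketched directly. This is not the released ARACNE code (which uses kernel-based mutual information estimators, significance thresholds, and a DPI tolerance); it is a minimal histogram-MI version of the same idea.

```python
import numpy as np
from itertools import combinations

def mutual_info(x, y, bins=8):
    """Histogram-based mutual information estimate (in nats)."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy /= pxy.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    nz = pxy > 0
    return float((pxy[nz] * np.log(pxy[nz] / np.outer(px, py)[nz])).sum())

def aracne_sketch(expr, mi_threshold=0.05):
    """Two steps of the ARACNE idea: keep gene pairs with mutual
    information above a threshold, then apply the data processing
    inequality (DPI) -- in every fully connected triangle, drop the
    weakest edge, since it is most likely an indirect interaction."""
    n = expr.shape[0]
    mi = np.zeros((n, n))
    for i, j in combinations(range(n), 2):
        mi[i, j] = mi[j, i] = mutual_info(expr[i], expr[j])
    edges = {(i, j) for i, j in combinations(range(n), 2)
             if mi[i, j] >= mi_threshold}
    removed = set()
    for i, j, k in combinations(range(n), 3):
        tri = [(i, j), (i, k), (j, k)]
        if all(e in edges for e in tri):
            removed.add(min(tri, key=lambda e: mi[e]))
    return edges - removed
```

On a synthetic chain x → y → z, all three pairs are co-expressed, but DPI correctly removes the indirect x-z edge.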
&lt;br /&gt;
&lt;br /&gt;
===geWorkbench===&lt;br /&gt;
* '''Description''': geWorkbench is a Java application that provides users with an integrated suite of genomics tools. It is built on an open-source, extensible architecture that promotes interoperability and simplifies the development of new as well as the incorporation of pre-existing components. The resulting system provides seamless access to a multitude of both local and remote data and computational services through an integrated environment that offers a unified user experience. Over 50 data analysis and visualization components have been developed for the framework, covering a wide range of genomics domains including gene expression, sequence, structure and network data.&lt;br /&gt;
* '''Data Input''': Gene expression data (Affy, GenePix, RMA), Sequence (FASTA), Structure (PDB).&lt;br /&gt;
* '''Data Output''': Analysis results (multiple formats).&lt;br /&gt;
* '''Implementation Language''': Java&lt;br /&gt;
* '''Version, Date, Stage''': 1.0.5, 3/23/07, stable production release&lt;br /&gt;
* '''Authors''':  A. Califano, A. Floratos, M. Kustagi, K. Smith, J. Watkinson, M. Hall, K. Keshav, X. Zhang, K. Kushal, B. Jagla, E. Daly, M. VanGinhoven, P. Morozov.&lt;br /&gt;
* '''Platforms Tested''': Windows XP, Linux, Mac OS 10.x.&lt;br /&gt;
* '''License''': Free. &lt;br /&gt;
* '''Keywords''': Analysis suite, gene expression analysis, sequence analysis, network reconstruction, structure prediction, visualization.&lt;br /&gt;
* '''URL''': http://www.geworkbench.org&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''': Atomic --&amp;gt; SoftwareFunction --&amp;gt; Genomic &amp;amp; Phenotypic Analysis; Atomic --&amp;gt; SoftwareFunction --&amp;gt; Interaction Modeling; Atomic --&amp;gt; SoftwareFunction --&amp;gt; Protein Modeling and Classification; Atomic --&amp;gt; SoftwareFunction --&amp;gt; Software Engineering and Development Tool --&amp;gt; Integration --&amp;gt; Resource Integration Components; Atomic --&amp;gt; SoftwareFunction --&amp;gt; Software Engineering and Development Tool --&amp;gt; Integration --&amp;gt; Grid Computing Resources; Atomic --&amp;gt; SoftwareFunction --&amp;gt; Visualization&lt;/div&gt;</summary>
		<author><name>Aris</name></author>
		
	</entry>
	<entry>
		<id>https://www.na-mic.org/w/index.php?title=SDIWG:NCBC_Software_Classification_MAGNet_Examples&amp;diff=10270</id>
		<title>SDIWG:NCBC Software Classification MAGNet Examples</title>
		<link rel="alternate" type="text/html" href="https://www.na-mic.org/w/index.php?title=SDIWG:NCBC_Software_Classification_MAGNet_Examples&amp;diff=10270"/>
		<updated>2007-05-13T21:39:01Z</updated>

		<summary type="html">&lt;p&gt;Aris: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;===DelPhi===&lt;br /&gt;
* '''Description''': DelPhi provides numerical solutions to the Poisson-Boltzmann equation (both linear and nonlinear form) for molecules of arbitrary shape and charge distribution. The current version is fast (the best relaxation parameter is estimated at run time), accurate (calculation of the electrostatic free energy is less dependent on the resolution of the lattice) and can handle extremely high lattice dimensions. It also includes flexible features for assigning different dielectric constants to different regions of space and treating systems containing mixed salt solutions.&lt;br /&gt;
* '''Data Input''': DelPhi takes as input a molecular coordinate file or equivalent data for geometrical objects and/or charge distributions&lt;br /&gt;
* '''Data Output''': electrostatic potential in and around the system&lt;br /&gt;
* '''Implementation Language''': Fortran and C&lt;br /&gt;
* '''Version, Date, Stage''': Stable public release&lt;br /&gt;
* '''Authors''':  E.Alexov,  R.Fine, M.K.Gilson, A.Nicholls, W.Rocchia, K.Sharp, and B. Honig.&lt;br /&gt;
* '''Platforms Tested''': Unix (SGI IRIX), Linux, PC (requires Fortran and C compilers), IBM AIX, and Mac.&lt;br /&gt;
* '''License''': Freely available to academia; pay model for commercial users. &lt;br /&gt;
* '''Keywords''': Finite Difference Poisson-Boltzmann Solver&lt;br /&gt;
* '''URL''': http://trantor.bioc.columbia.edu/delphi&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''': Atomic --&amp;gt; SoftwareFunction --&amp;gt; Protein Modeling and Classification --&amp;gt; Numerical Calculation of Electrostatic Potential&lt;br /&gt;
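The finite-difference relaxation at the heart of such solvers can be illustrated on a toy problem. This is not DelPhi's algorithm: the sketch solves the uniform-dielectric Poisson equation on a small 2D grid by Jacobi relaxation with zero-potential boundaries, whereas DelPhi works in 3D with spatially varying dielectrics and the (non)linear Boltzmann term.

```python
import numpy as np

def solve_poisson_2d(charge, iterations=2000):
    """Jacobi relaxation for the uniform-dielectric Poisson equation on a
    2D grid (grid spacing and constants absorbed into the charge term).
    Each interior potential is repeatedly replaced by the average of its
    four neighbors plus the local source term."""
    phi = np.zeros_like(charge, dtype=float)
    for _ in range(iterations):
        # NumPy evaluates the right-hand side before assignment,
        # so this is a Jacobi (not Gauss-Seidel) sweep.
        phi[1:-1, 1:-1] = 0.25 * (phi[:-2, 1:-1] + phi[2:, 1:-1] +
                                  phi[1:-1, :-2] + phi[1:-1, 2:] +
                                  charge[1:-1, 1:-1])
    return phi
```

For a single point charge in the middle of the grid, the converged potential peaks at the charge and decays toward the grounded boundary.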
&lt;br /&gt;
&lt;br /&gt;
===GRASP===&lt;br /&gt;
* '''Description''': A molecular visualization and analysis program. It is particularly useful for the display and manipulation of the surfaces of molecules and their electrostatic properties.&lt;br /&gt;
* '''Data Input''': PDB files, potential maps from DelPhi&lt;br /&gt;
* '''Data Output''': molecular graphics.&lt;br /&gt;
* '''Implementation Language''': Fortran&lt;br /&gt;
* '''Version, Date, Stage''': v1.3.6. Stable public release.&lt;br /&gt;
* '''Authors''': Anthony Nicholls and Barry Honig.&lt;br /&gt;
* '''Platforms Tested''': SGI machines running IRIX 5.x and 6.x (INDY, INDIGO including Impact, Octane, and O2 systems).&lt;br /&gt;
* '''License''':  Freely available to academia.&lt;br /&gt;
* '''Keywords''': molecular visualization&lt;br /&gt;
* '''URL''': http://trantor.bioc.columbia.edu/grasp&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''': Atomic --&amp;gt; SoftwareFunction --&amp;gt; Protein Modeling and Classification --&amp;gt; Molecular Visualization Package&lt;br /&gt;
&lt;br /&gt;
===Nest===&lt;br /&gt;
* '''Description''': Models protein structure based on a sequence-template alignment. The current server works only for modeling with a single template. Nest is part of the JACKAL package, which can be downloaded.&lt;br /&gt;
* '''Data Input''': PIR and PDB files&lt;br /&gt;
* '''Data Output''': &lt;br /&gt;
* '''Implementation Language''': C++&lt;br /&gt;
* '''Version, Date, Stage''': Stable public release.&lt;br /&gt;
* '''Authors''': Xiang, Z. and Honig, B.&lt;br /&gt;
* '''Platforms Tested''': platform independent (web based tool)&lt;br /&gt;
* '''License''': Freely available to academia.&lt;br /&gt;
* '''Keywords''': modeling, protein structure, sequence-template alignment.&lt;br /&gt;
* '''URL''': http://honiglab.cpmc.columbia.edu/cgi-bin/jackal/nest.cgi&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''': Atomic --&amp;gt; SoftwareFunction --&amp;gt; Protein Modeling and Classification --&amp;gt; Homology Modeling&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===JACKAL===&lt;br /&gt;
* '''Description''': JACKAL is a collection of programs designed for the modeling and analysis of protein structures. Its core program is the versatile homology modeling package Nest. JACKAL has the following capabilities: 1) comparative modeling based on single, composite or multiple templates; 2) side-chain prediction; 3) modeling residue mutation, insertion or deletion; 4) loop prediction; 5) structure refinement; 6) reconstruction of protein missing atoms; 7) reconstruction of protein missing residues; 8) prediction of hydrogen atoms; 9) fast calculation of solvent accessible surface area; 10) structure superimposition. &lt;br /&gt;
* '''Data Input''': &lt;br /&gt;
* '''Data Output''': &lt;br /&gt;
* '''Implementation Language''': C++&lt;br /&gt;
* '''Version, Date, Stage''': Version: 1.5 as of Oct, 20, 2002, Stable public release.&lt;br /&gt;
* '''Authors''': Z. Xiang and B. Honig&lt;br /&gt;
* '''Platforms Tested''': SGI IRIX 6.5, Intel Linux, and Sun Solaris&lt;br /&gt;
* '''License''': Freely available to academia.&lt;br /&gt;
* '''Keywords''': Protein Structure Modeling&lt;br /&gt;
* '''URL''': http://trantor.bioc.columbia.edu/programs/jackal&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''': Atomic --&amp;gt; SoftwareFunction --&amp;gt; Protein Modeling and Classification --&amp;gt; Prediction of Side-chain Conformations&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===GRASP2===&lt;br /&gt;
* '''Description''': GRASP2 is an updated version of the GRASP program for macromolecular structure and surface visualization. It contains a large number of new features and scientific tools: an enhanced GUI; structure alignment and domain database scanning; a Gaussian surface generator and new surface coloring schemes; sequence visualization and alignment; and direct printing at full printer resolution. Completed work can be stored in &amp;quot;project files&amp;quot;; among the many objects that can be stored in a project file are views of the structure, defined subsets, and surfaces.&lt;br /&gt;
* '''Data Input''': PDB files, potential maps from DelPhi, sequence alignments.&lt;br /&gt;
* '''Data Output''': molecular graphics, structural alignments.&lt;br /&gt;
* '''Implementation Language''': C++&lt;br /&gt;
* '''Version, Date, Stage''': Stable public release&lt;br /&gt;
* '''Authors''': Donald Petrey and Barry Honig.&lt;br /&gt;
* '''Platforms Tested''': Windows, Linux&lt;br /&gt;
* '''License''': Freely available to academia.&lt;br /&gt;
* '''Keywords''': molecular visualization&lt;br /&gt;
* '''URL''': http://trantor.bioc.columbia.edu/grasp2&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''': Atomic --&amp;gt; SoftwareFunction --&amp;gt; Protein Modeling and Classification --&amp;gt; Molecular Visualization Package&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===PrISM===&lt;br /&gt;
* '''Description''': PrISM is an integrated computational system implementing tools for protein sequence and structure analysis and modeling.&lt;br /&gt;
* '''Data Input''': &lt;br /&gt;
* '''Data Output''':&lt;br /&gt;
* '''Implementation Language''': Fortran&lt;br /&gt;
* '''Version, Date, Stage''': Stable public release&lt;br /&gt;
* '''Authors''': Wang, L, Yang, A. S. &amp;amp; Honig, B. &lt;br /&gt;
* '''Platforms Tested''': SGI IRIX, Intel Linux &lt;br /&gt;
* '''License''': Freely available to academia.&lt;br /&gt;
* '''Keywords''': protein analysis/modeling &lt;br /&gt;
* '''URL''': http://trantor.bioc.columbia.edu/programs/PrISM/&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''': Atomic --&amp;gt; SoftwareFunction --&amp;gt; Protein Modeling and Classification --&amp;gt; Homology Modeling&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Protein-DNA interface alignment===&lt;br /&gt;
* '''Description''': The protein-DNA alignment software allows one to align the interfacial amino acids from two protein-DNA complexes based on the geometric relationship of each amino acid to its local DNA. &lt;br /&gt;
* '''Data Input''': two PDB files that both contain protein-DNA complexes&lt;br /&gt;
* '''Data Output''': The programs will output the aligned residues and their corresponding residue-residue similarity scores, s(i,j).&lt;br /&gt;
* '''Implementation Language''': C++ and Perl&lt;br /&gt;
* '''Version, Date, Stage''': Stable public release. &lt;br /&gt;
* '''Authors''': Siggers, T.W., Silkov, A &amp;amp; Honig, B. &lt;br /&gt;
* '''Platforms Tested''': Linux&lt;br /&gt;
* '''License''': Freely available to academia &lt;br /&gt;
* '''Keywords''': protein-DNA interface &lt;br /&gt;
* '''URL''': http://trantor.bioc.columbia.edu/programs/intfc_aln&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''': Atomic --&amp;gt; SoftwareFunction --&amp;gt; Protein Modeling and Classification --&amp;gt; Prediction of Side-chain Conformations&lt;br /&gt;
&lt;br /&gt;
===SURFace===&lt;br /&gt;
* '''Description''': SURFace is a set of programs that calculate solvent accessible surface area and curvature-corrected solvent accessible surface area. &lt;br /&gt;
* '''Data Input''': &lt;br /&gt;
* '''Data Output''': &lt;br /&gt;
* '''Implementation Language''': &lt;br /&gt;
* '''Version, Date, Stage''': Stable public release.&lt;br /&gt;
* '''Authors''': Nicholls, A., Sharp, K., Sridharan, S. and Honig, B.  &lt;br /&gt;
* '''Platforms Tested''': SGI&lt;br /&gt;
* '''License''': Freely available to academia. &lt;br /&gt;
* '''Keywords''': solvent accessible surface area  &lt;br /&gt;
* '''URL''': http://trantor.bioc.columbia.edu/surf/&lt;br /&gt;
* '''Organization''': MAGNet &lt;br /&gt;
* '''NCBC Ontology Classification''': Atomic --&amp;gt; SoftwareFunction --&amp;gt; Protein Modeling and Classification --&amp;gt; Calculation of Solvent Accessible Area&lt;br /&gt;
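The standard numerical approach to solvent accessible surface area can be sketched as a Shrake-Rupley style point-sampling calculation. Whether SURFace uses exactly this scheme (or how its curvature correction is computed) is not stated above, so treat this as a generic illustration.

```python
import numpy as np

def sphere_points(n=256):
    """Quasi-uniform points on a unit sphere (golden-spiral layout)."""
    i = np.arange(n) + 0.5
    phi = np.arccos(1 - 2 * i / n)
    theta = np.pi * (1 + 5 ** 0.5) * i
    return np.column_stack([np.sin(phi) * np.cos(theta),
                            np.sin(phi) * np.sin(theta),
                            np.cos(phi)])

def sasa(centers, radii, probe=1.4, n_points=256):
    """Shrake-Rupley style SASA: sample points on each atom's
    solvent-expanded sphere and count those not buried inside any
    neighboring atom's expanded sphere."""
    pts = sphere_points(n_points)
    centers = np.asarray(centers, float)
    radii = np.asarray(radii, float) + probe   # expand by probe radius
    areas = []
    for i, (c, r) in enumerate(zip(centers, radii)):
        surface = c + r * pts
        free = np.ones(n_points, dtype=bool)
        for j, (c2, r2) in enumerate(zip(centers, radii)):
            if j != i:
                free &= np.linalg.norm(surface - c2, axis=1) >= r2
        areas.append(4 * np.pi * r ** 2 * free.mean())
    return areas
```

An isolated atom recovers the analytic sphere area 4π(r + r_probe)², and overlapping atoms lose the buried fraction.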
&lt;br /&gt;
&lt;br /&gt;
===Target Explorer===&lt;br /&gt;
* '''Description''': Automated prediction of complex regulatory elements for a specified set of transcription factors in the Drosophila melanogaster genome. Target Explorer is a comprehensive tool with a user-friendly, self-explanatory web interface that allows the user to: 1) create a customized library of TF binding site matrices based on user-defined sets of training sequences; 2) search for new clusters of binding sites for a specified set of TFs; 3) extract annotation for potential target genes. &lt;br /&gt;
* '''Data Input''': genomic sequences&lt;br /&gt;
* '''Data Output''': clusters of known binding sites &lt;br /&gt;
* '''Implementation Language''': Perl (CGI)&lt;br /&gt;
* '''Version, Date, Stage''': Stable public release. &lt;br /&gt;
* '''Authors''': Sosinsky A, Bonin CP, Mann RS, Honig B.   &lt;br /&gt;
* '''Platforms Tested''': platform independent (web based tool)&lt;br /&gt;
* '''License''': Freely available to academia. &lt;br /&gt;
* '''Keywords''': prediction of binding sites for transcription factors  &lt;br /&gt;
* '''URL''': http://trantor.bioc.columbia.edu/Target_Explorer/&lt;br /&gt;
* '''Organization''': MAGNet  &lt;br /&gt;
* '''NCBC Ontology Classification''': Atomic --&amp;gt; SoftwareFunction --&amp;gt; Genomic &amp;amp; Phenotypic Analysis--&amp;gt; Sequence Annotation&lt;br /&gt;
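The matrix-based binding-site search can be sketched as building a log-odds PSSM from training sites and scanning a sequence for high-scoring windows. The actual Target Explorer matrices and thresholds are not reproduced here; a uniform background and an illustrative pseudocount are assumed.

```python
import math

BASES = "ACGT"

def build_pssm(training_sites, pseudocount=0.5):
    """Log-odds position-specific scoring matrix from aligned binding
    sites, scored against a uniform background."""
    w = len(training_sites[0])
    pssm = []
    for pos in range(w):
        counts = {b: pseudocount for b in BASES}
        for site in training_sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pssm.append({b: math.log((counts[b] / total) / 0.25) for b in BASES})
    return pssm

def scan(sequence, pssm, threshold=0.0):
    """Return (position, score) for every window scoring above threshold."""
    w = len(pssm)
    hits = []
    for i in range(len(sequence) - w + 1):
        score = sum(pssm[j][sequence[i + j]] for j in range(w))
        if score > threshold:
            hits.append((i, score))
    return hits
```

Planting a consensus site inside a background sequence, the scan recovers its position as the only above-threshold hit.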
&lt;br /&gt;
&lt;br /&gt;
===MEDUSA and Gorgon===&lt;br /&gt;
* '''Description''': MEDUSA is an algorithm for learning predictive models of transcriptional gene regulation from gene expression and promoter sequence data.  By using a statistical learning approach based on boosting, MEDUSA learns cis-regulatory motifs, condition-specific regulators, and regulatory programs that predict the differential expression of target genes.  The regulatory program is specified as an alternating decision tree (ADT).  The Java implementation of MEDUSA will allow a number of visualizations of the regulatory program and other inferred regulatory information, implemented in the accompanying Gorgon tool, including hits of significant and condition-specific motifs along the promoter sequences of target genes and regulatory network figures viewable in Cytoscape. &lt;br /&gt;
* '''Data Input''': Discretized (up/down/baseline) gene expression data in plain text format, promoter sequences in FASTA format, list of candidate transcriptional regulators and signal transducers in plain text format.&lt;br /&gt;
* '''Data Output''': Regulatory program represented as a Java serialized object file readable by Gorgon and as a human readable XML file.  Gorgon currently generates views of learned PSSMs, positional hits along promoter sequences, and views of the ADT as HTML files, and generates network figures as Cytoscape format files.&lt;br /&gt;
* '''Implementation Language''': Java (prototyped in MATLAB) &lt;br /&gt;
* '''Version, Date, Stage''': Version 2.0, July 2006, pre-release beta version; Version 1.0 (MATLAB), April 2005, stable public release  &lt;br /&gt;
* '''Authors''': David Quigley, Manuel Middendorf, Steve Lianoglou, Anshul Kundaje, Yoav Freund, Chris Wiggins, Christina Leslie   &lt;br /&gt;
* '''Platforms Tested''': Windows, Linux, Mac OS X&lt;br /&gt;
* '''License''': Open source &lt;br /&gt;
* '''Keywords''':  &lt;br /&gt;
* '''URL''': http://www.cs.columbia.edu/compbio/medusa (MATLAB), http://compbio.sytes.net:8090/medusa (Java beta version)&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''':&lt;br /&gt;
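The boosting core that MEDUSA builds on can be illustrated with AdaBoost over decision stumps. MEDUSA's actual learner produces an alternating decision tree over motif and regulator features; the sketch below shows only the underlying boosting loop on binary features (feature vectors and labels are made up for illustration).

```python
import math

def adaboost_stumps(X, y, rounds=10):
    """AdaBoost over decision stumps on binary features.
    X: list of 0/1 feature vectors (e.g. motif presence in a promoter),
    y: labels in {-1, +1} (e.g. down/up differential expression)."""
    n, d = len(X), len(X[0])
    w = [1.0 / n] * n
    model = []  # list of (feature, polarity, alpha)
    for _ in range(rounds):
        best = None
        for f in range(d):
            for pol in (1, -1):
                err = sum(wi for wi, xi, yi in zip(w, X, y)
                          if (pol if xi[f] else -pol) != yi)
                if best is None or err < best[0]:
                    best = (err, f, pol)
        err, f, pol = best
        err = max(err, 1e-10)                      # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((f, pol, alpha))
        # Upweight the examples the chosen stump got wrong.
        w = [wi * math.exp(-alpha * yi * (pol if xi[f] else -pol))
             for wi, xi, yi in zip(w, X, y)]
        s = sum(w)
        w = [wi / s for wi in w]
    return model

def predict(model, x):
    score = sum(alpha * (pol if x[f] else -pol) for f, pol, alpha in model)
    return 1 if score >= 0 else -1
```

On a toy dataset where one feature determines the label and another is noise, boosting concentrates its weight on the informative feature.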
&lt;br /&gt;
===String kernel package===&lt;br /&gt;
* '''Description''': The string kernel package contains implementations for the mismatch and profile string kernels for use with support vector machine (SVM) classifiers for protein sequence classification.  Both kernels compute similarity between protein sequences based on common occurrences of k-length subsequences (&amp;quot;k-mers&amp;quot;) counted with substitutions.  Kernel functions for protein sequence data enable the training of SVMs for a range of prediction problems, in particular protein structural class prediction and remote homology detection.  A version of the Spider MATLAB machine learning package is also bundled with the code, which allows users to train SVMs and evaluate performance on test sets with the packaged software.&lt;br /&gt;
* '''Data Input''': The mismatch kernel requires sequence data in FASTA format. The profile string kernel uses probabilistic profiles, such as those produced by PSI-BLAST, in place of the original sequences.  The Spider SVM implementation requires both the kernel matrix and a label file of binary or multi-class labels for the training data; this data must be loaded into MATLAB variables before using Spider routines.&lt;br /&gt;
* '''Data Output''': The kernel code produces a kernel matrix for the input data in tab-delimited text format.  The Spider package trains SVMs and stores the learned classifier and the results of applying it to test data as MATLAB objects.&lt;br /&gt;
* '''Implementation Language''': String kernel code is implemented in C. Spider is a set of object-oriented MATLAB routines. &lt;br /&gt;
* '''Version, Date, Stage''': Version 1.2, September 2004, stable public release &lt;br /&gt;
* '''Authors''': Eleazar Eskin, Rui Kuang, Eugene Ie, Ke Wang, Jason Weston, Bill Noble, Christina Leslie  &lt;br /&gt;
* '''Platforms Tested''': Windows, Linux&lt;br /&gt;
* '''License''': Open source &lt;br /&gt;
* '''Keywords''':  &lt;br /&gt;
* '''URL''': http://www.cs.columbia.edu/compbio/string-kernels&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''':&lt;br /&gt;
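The k-mer-with-mismatches idea can be sketched naively. The packaged implementation uses efficient data structures and the protein alphabet; this illustration computes the (k, m)-mismatch feature map and kernel by direct neighborhood expansion, on a DNA alphabet to keep the neighborhoods small.

```python
def neighborhood(kmer, m, alphabet):
    """All k-mers within Hamming distance m of `kmer` (naive expansion)."""
    if m == 0:
        return {kmer}
    neigh = {kmer}
    for i in range(len(kmer)):
        for a in alphabet:
            neigh |= neighborhood(kmer[:i] + a + kmer[i + 1:], m - 1, alphabet)
    return neigh

def kmer_features(seq, k=3, m=1, alphabet="ACGT"):
    """Feature map of the (k, m)-mismatch kernel: every k-mer occurring in
    the sequence contributes a count to each k-mer within m mismatches."""
    feats = {}
    for i in range(len(seq) - k + 1):
        for nb in neighborhood(seq[i:i + k], m, alphabet):
            feats[nb] = feats.get(nb, 0) + 1
    return feats

def mismatch_kernel(s, t, k=3, m=1, alphabet="ACGT"):
    """Dot product of the two mismatch feature maps."""
    fs = kmer_features(s, k, m, alphabet)
    ft = kmer_features(t, k, m, alphabet)
    return sum(c * ft.get(f, 0) for f, c in fs.items())
```

The kernel is symmetric, and a sequence is more similar to itself than to an unrelated sequence; an SVM then trains on the resulting kernel matrix.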
&lt;br /&gt;
===MatrixREDUCE===&lt;br /&gt;
* '''Description''': Regulation of gene expression by a transcription factor requires physical interaction between the factor and the DNA, which can be described by a statistical mechanical model. Based on this model, the MatrixREDUCE algorithm uses genome-wide occupancy data for a transcription factor (e.g. ChIP-chip or mRNA expression data) and associated nucleotide sequences to discover the sequence-specific binding affinity of the transcription factor. The sequence specificity of the transcription factor's DNA-binding domain is modeled using a position-specific affinity matrix (PSAM), representing the change in the binding affinity (Kd) whenever a specific position within a reference binding sequence is mutated. The PSAM can be transformed into an affinity logo for visualization using the utility program AffinityLogo, and a MatrixREDUCE run can be summarized in an easy-to-navigate webpage using HTMLSummary. &lt;br /&gt;
* '''Data Input''': sequence file in FASTA format; and expression data file in tab-delimited text format.&lt;br /&gt;
* '''Data Output''': PSAMs in numeric and graphical format, parameters of the fitted model, and an HTML summary page.&lt;br /&gt;
* '''Implementation Language''': ANSI C, making use of Numerical Recipes routines.  &lt;br /&gt;
* '''Version, Date, Stage''': Version 1.0, July 10, 2006, extensively tested in lab.  &lt;br /&gt;
* '''Authors''': Barrett Foat, Xiang-Jun Lu, Harmen J. Bussemaker  &lt;br /&gt;
* '''Platforms Tested''': Linux, Cygwin (Windows), Mac OS X&lt;br /&gt;
* '''License''':  &lt;br /&gt;
* '''Keywords''': position-specific affinity matrix, binding affinity, cis-regulatory element, expression data, ChIP-chip, transcription factor &lt;br /&gt;
* '''URL''': http://www.bussemakerlab.org/software/MatrixREDUCE&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''':&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===T-profiler===&lt;br /&gt;
* '''Description''': T-profiler is a web-based tool that uses the t-test to score changes in the average activity of pre-defined groups of genes. The gene groups are defined based on Gene Ontology categorization, ChIP-chip experiments, upstream matches to a consensus transcription factor binding motif, and location on the same chromosome. If desired, an iterative procedure can be used to select a single, optimal representative from sets of overlapping gene groups. A jack-knife procedure is used to make calculations more robust against outliers. T-profiler makes it possible to interpret microarray data in a way that is both intuitive and statistically rigorous, without the need to combine experiments or choose parameters. &lt;br /&gt;
* '''Data Input''': Currently, gene expression data from Saccharomyces cerevisiae and Candida albicans are supported. &lt;br /&gt;
* '''Data Output''':&lt;br /&gt;
* '''Implementation Language''': T-profiler is written in PHP; data is managed by a MySQL database server &lt;br /&gt;
* '''Version, Date, Stage''':   &lt;br /&gt;
* '''Authors''': André Boorsma, Barrett C. Foat, Daniel Vis, Frans Klis, Harmen J. Bussemaker  &lt;br /&gt;
* '''Platforms Tested''': Web-based application&lt;br /&gt;
* '''License''':  &lt;br /&gt;
* '''Keywords''': gene expression, transcriptome,  ChIP-chip,  Gene Ontology &lt;br /&gt;
* '''URL''': http://www.t-profiler.org&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''':&lt;br /&gt;
&lt;br /&gt;
===TranscriptionDetector===&lt;br /&gt;
* '''Description''': A tool for finding probes that measure significantly expressed loci in a genomic array experiment. Given expression data from a tiling array experiment, TranscriptionDetector estimates the likelihood that a probe is detecting transcription from the locus in which it resides. Probabilities are assigned by making use of a background signal intensity distribution from a set of negative control probes. This tool is useful for the functional annotation of genomes as it allows for the discovery of novel transcriptional units independently of any genomic annotation. &lt;br /&gt;
* '''Data Input''': Expression data (GEO or other platforms) and designation of which probes represent negative controls and which are data probes. &lt;br /&gt;
* '''Data Output''': A text file with a list of probes  corresponding to significantly expressed loci. &lt;br /&gt;
* '''Implementation Language''': ANSI C, making use of GSL.  &lt;br /&gt;
* '''Version, Date, Stage''':   &lt;br /&gt;
* '''Authors''': Xiang-Jun Lu, Gabor Halasz, Marinus F. van Batenburg  &lt;br /&gt;
* '''Platforms Tested''': Linux, Cygwin (Windows), Mac OS X&lt;br /&gt;
* '''License''':  &lt;br /&gt;
* '''Keywords''': tiling arrays, expression, transcriptome &lt;br /&gt;
* '''URL''': http://www.bussemakerlab.org/software/TranscriptionDetector/&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''':&lt;br /&gt;
&lt;br /&gt;
===PhenoGO===&lt;br /&gt;
* '''Description''': PhenoGO adds phenotypic contextual information to existing associations between gene products and Gene Ontology (GO) terms as specified in GO Annotations (GOA). PhenoGO utilizes an existing Natural Language Processing (NLP) system, called BioMedLEE, and an existing knowledge-based phenotype organizer system (PhenOS), in conjunction with MeSH indexing and established biomedical ontologies. The system also encodes the context using identifiers from different biomedical ontologies, including the UMLS, Cell Ontology, Mouse Anatomy, NCBI taxonomy, GO, and Mammalian Phenotype Ontology. In addition, PhenoGO was evaluated for coding of anatomical and cellular information and assigning the coded phenotypes to the correct GOA; the results show that PhenoGO has a precision of 91% and a recall of 92%, demonstrating that the PhenoGO NLP system can accurately assign a large number of anatomical and cellular ontology codes to GO annotations. The PhenoGO database may be accessed at www.phenogo.org. &lt;br /&gt;
* '''Data Input''': Gene Ontology Annotations Files and Medline Abstracts  &lt;br /&gt;
* '''Data Output''': XML file and www.phenogo.org Web Portal&lt;br /&gt;
* '''Implementation Language''': A variety of modules: the web portal is written in Java with MySQL; the computational terminology component (PhenOS) is written in Perl scripts that query tables in IBM DB2; the natural language processing component is written in Prolog.&lt;br /&gt;
* '''Version, Date, Stage''': Version 2, Feb 2006  &lt;br /&gt;
* '''Authors''': Yves Lussier and Carol Friedman are the principal investigators. The programmers are Jianrong Li, Lee Sam, and Tara Borlawsky   &lt;br /&gt;
* '''Platforms Tested''': n/a&lt;br /&gt;
* '''License''':  n/a&lt;br /&gt;
* '''Keywords''': Phenotypic integration, computational phenotypes &lt;br /&gt;
* '''URL''': www.phenogo.org&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''':&lt;br /&gt;
&lt;br /&gt;
===MINDY===&lt;br /&gt;
* '''Description''': Given a transcription factor of interest, MINDY uses a large set of gene expression profile data to identify potential post-transcriptional modulators of the transcription factor's activity. MINDY is based on a three-way statistical interaction model that captures the post-transcriptional regulatory event where the ability of a transcription factor to activate/repress its target genes is monotonically controlled by a potential modulator gene. &lt;br /&gt;
* '''Data Input''': Gene expression data in the EXP format, and a user-specified transcription factor of interest&lt;br /&gt;
* '''Data Output''': Lists of the putative modulators and target genes of the transcription factor, and the modulatory interactions involving them&lt;br /&gt;
* '''Implementation Language''': C++, MATLAB, and Java&lt;br /&gt;
* '''Version, Date, Stage''': Stable release, April 2007  &lt;br /&gt;
* '''Authors''': Kai Wang, Ilya Nemenman, Adam Margolin, Riccardo Dalla-Favera, Andrea Califano&lt;br /&gt;
* '''Platforms Tested''': Linux, Cygwin&lt;br /&gt;
* '''License''':  n/a&lt;br /&gt;
* '''Keywords''': gene expression, transcriptional interaction, modulator &lt;br /&gt;
* '''URL''': n/a&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''': Atomic --&amp;gt; SoftwareFunction --&amp;gt; Genomic &amp;amp; Phenotypic Analysis --&amp;gt; Regulatory/Signaling network reconstruction&lt;br /&gt;
&lt;br /&gt;
===B Cell Interactome===&lt;br /&gt;
* '''Description''': The B cell interactome (BCI) is a network of protein-protein, protein-DNA, and modulatory interactions in human B cells. The network contains known interactions (reported in public databases) and interactions predicted by a Bayesian evidence integration framework, which combines a variety of generic and context-specific experimental clues about protein-protein and protein-DNA interactions (such as a large collection of B cell expression profiles) with inferences from reverse engineering algorithms such as GeneWays and ARACNE. Modulatory interactions are predicted by MINDY, an algorithm for the prediction of modulators of transcriptional interactions.&lt;br /&gt;
* '''Data Input''': n/a&lt;br /&gt;
* '''Data Output''': Text file of binary interactions, each associated with a probability.&lt;br /&gt;
* '''Implementation Language''': Perl&lt;br /&gt;
* '''Version, Date, Stage''': Version 2, March 2007&lt;br /&gt;
* '''Authors''':  Lefebvre C, Lim WK, Basso K, Dalla Favera R, and Califano A.&lt;br /&gt;
* '''Platforms Tested''': n/a&lt;br /&gt;
* '''License''':  &lt;br /&gt;
* '''Keywords''': Naive Bayes, Mixed-Interaction Network, human B cells.&lt;br /&gt;
* '''URL''': http://amdec-bioinfo.cu-genome.org/html/BCellInteractome.html&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''': Atomic --&amp;gt; SoftwareFunction --&amp;gt; Interaction Modeling&lt;br /&gt;
&lt;br /&gt;
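The naive Bayes evidence integration that the BCI entry above describes can be sketched as follows. This is an illustrative toy, not the BCI code: the function name and the example likelihood ratios are assumptions; the actual clues, priors, and ratios used for the BCI differ.

```python
# Naive Bayes evidence integration: each (assumed independent) clue
# contributes a likelihood ratio, and the posterior odds of an
# interaction are the prior odds times the product of those ratios.
def interaction_probability(prior, likelihood_ratios):
    """prior: P(interaction); likelihood_ratios: one per evidence source."""
    odds = prior / (1.0 - prior)
    for lr in likelihood_ratios:
        odds *= lr
    return odds / (1.0 + odds)
```

With a prior of 1% and two clues each 10 times more likely under a true interaction, the posterior probability rises to roughly 50%, which is the qualitative behavior such a framework relies on.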
===ARACNE===&lt;br /&gt;
* '''Description''': ARACNE is an algorithm for inferring gene regulatory networks from a set of microarray experiments. The method uses mutual information to identify genes that are co-expressed and then applies the data processing inequality to filter out interactions that are likely to be indirect.&lt;br /&gt;
* '''Data Input''': Text file containing measurements from a set of microarray experiments.&lt;br /&gt;
* '''Data Output''':  Text file containing predicted interactions.&lt;br /&gt;
* '''Implementation Language''': C++, Java&lt;br /&gt;
* '''Version, Date, Stage''': Version 1, June, 2006&lt;br /&gt;
* '''Authors''':  Margolin AA, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Dalla Favera R, Califano A.&lt;br /&gt;
* '''Platforms Tested''': Windows, Linux&lt;br /&gt;
* '''License''':  Open source&lt;br /&gt;
* '''Keywords''': Reverse engineering, mutual information, genetic networks, microarray&lt;br /&gt;
* '''URL''': http://amdec-bioinfo.cu-genome.org/html/ARACNE.htm&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''': Atomic --&amp;gt; SoftwareFunction --&amp;gt; Genomic &amp;amp; Phenotypic Analysis --&amp;gt; Regulatory/Signaling network reconstruction&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
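The two-step procedure in the ARACNE entry above (mutual information scoring, then pruning by the data processing inequality) can be sketched in a few lines. A minimal illustrative toy, not the released implementation: the function names and the simple plug-in MI estimator on discretized data are assumptions for illustration.

```python
# ARACNE sketch: score gene pairs by mutual information, then apply
# the data processing inequality (DPI) to every fully connected
# triplet, dropping its weakest edge as a likely indirect interaction.
import math
from collections import Counter
from itertools import combinations

def mutual_info(x, y):
    """MI (in nats) between two equal-length discrete profiles."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    mi = 0.0
    for (a, b), c in pxy.items():
        pab = c / n
        mi += pab * math.log(pab / ((px[a] / n) * (py[b] / n)))
    return mi

def aracne(profiles, threshold=0.0):
    """profiles: dict gene -> list of discretized expression values."""
    genes = list(profiles)
    mi = {}
    for g1, g2 in combinations(genes, 2):
        m = mutual_info(profiles[g1], profiles[g2])
        if m > threshold:
            mi[frozenset((g1, g2))] = m
    removed = set()
    for g1, g2, g3 in combinations(genes, 3):
        edges = [frozenset(p) for p in ((g1, g2), (g2, g3), (g1, g3))]
        if all(e in mi for e in edges):
            removed.add(min(edges, key=lambda e: mi[e]))
    return {e: m for e, m in mi.items() if e not in removed}
```

On a triplet where two profiles are identical and a third is a noisy copy, the identical pair keeps the strongest edge and DPI removes one of the weaker links.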
===geWorkbench===&lt;br /&gt;
* '''Description''': geWorkbench is a Java application that provides users with an integrated suite of genomics tools. It is built on an open-source, extensible architecture that promotes interoperability and simplifies the development of new as well as the incorporation of pre-existing components. The resulting system provides seamless access to a multitude of both local and remote data and computational services through an integrated environment that offers a unified user experience. Over 50 data analysis and visualization components have been developed for the framework, covering a wide range of genomics domains including gene expression, sequence, structure and network data.&lt;br /&gt;
* '''Data Input''': Gene expression data (Affy, GenePix, RMA), Sequence (FASTA), Structure (PDB).&lt;br /&gt;
* '''Data Output''': Analysis results (multiple formats).&lt;br /&gt;
* '''Implementation Language''': Java&lt;br /&gt;
* '''Version, Date, Stage''': 1.0.5, 3/23/07, stable production release&lt;br /&gt;
* '''Authors''':  A. Califano, A. Floratos, M. Kustagi, K. Smith, J. Watkinson, M. Hall, K. Keshav, X. Zhang, K. Kushal, B. Jagla, E. Daly, M. VanGinhoven, P. Morozov.&lt;br /&gt;
* '''Platforms Tested''': Windows XP, Linux, Mac OS 10.x.&lt;br /&gt;
* '''License''': Free. &lt;br /&gt;
* '''Keywords''': Analysis suite, gene expression analysis, sequence analysis, network reconstruction, structure prediction, visualization.&lt;br /&gt;
* '''URL''': http://www.geworkbench.org&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''': Atomic --&amp;gt; SoftwareFunction --&amp;gt; Genomic &amp;amp; Phenotypic Analysis, Atomic --&amp;gt; SoftwareFunction --&amp;gt; Interaction Modeling, Atomic --&amp;gt; SoftwareFunction --&amp;gt; Protein Modeling and Classification, Atomic --&amp;gt; SoftwareFunction --&amp;gt; Software Engineering and Development Tool --&amp;gt; Integration --&amp;gt; Resource Integration Components, Atomic --&amp;gt; SoftwareFunction --&amp;gt; Software Engineering and Development Tool --&amp;gt; Integration --&amp;gt; Grid Computing Resources, Atomic --&amp;gt; SoftwareFunction --&amp;gt; Visualization&lt;/div&gt;</summary>
		<author><name>Aris</name></author>
		
	</entry>
	<entry>
		<id>https://www.na-mic.org/w/index.php?title=SDIWG:NCBC_Software_Classification_MAGNet_Examples&amp;diff=10269</id>
		<title>SDIWG:NCBC Software Classification MAGNet Examples</title>
		<link rel="alternate" type="text/html" href="https://www.na-mic.org/w/index.php?title=SDIWG:NCBC_Software_Classification_MAGNet_Examples&amp;diff=10269"/>
		<updated>2007-05-13T21:18:30Z</updated>

		<summary type="html">&lt;p&gt;Aris: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;===DelPhi===&lt;br /&gt;
* '''Description''': DelPhi provides numerical solutions to the Poisson-Boltzmann equation (both linear and nonlinear form) for molecules of arbitrary shape and charge distribution. The current version is fast (the best relaxation parameter is estimated at run time), accurate (calculation of the electrostatic free energy is less dependent on the resolution of the lattice) and can handle extremely high lattice dimensions. It also includes flexible features for assigning different dielectric constants to different regions of space and treating systems containing mixed salt solutions.&lt;br /&gt;
* '''Data Input''': A coordinate file of a molecule, or equivalent data for geometrical objects and/or charge distributions&lt;br /&gt;
* '''Data Output''': electrostatic potential in and around the system&lt;br /&gt;
* '''Implementation Language''': Fortran and C&lt;br /&gt;
* '''Version, Date, Stage''': Stable public release&lt;br /&gt;
* '''Authors''':  E.Alexov,  R.Fine, M.K.Gilson, A.Nicholls, W.Rocchia, K.Sharp, and B. Honig.&lt;br /&gt;
* '''Platforms Tested''': Unix-SGI IRIX, linux, PC (requires Fortran and C compilers), AIX IBM version and Mac.&lt;br /&gt;
* '''License''': Freely available to academia; pay model for commercial users. &lt;br /&gt;
* '''Keywords''': Finite Difference Poisson-Boltzmann Solver&lt;br /&gt;
* '''URL''': http://trantor.bioc.columbia.edu/delphi&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''': Atomic --&amp;gt; SoftwareFunction --&amp;gt; Protein Modeling and Classification --&amp;gt; Numerical Calculation of Electrostatic Potential&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
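DelPhi's lattice-based approach rests on finite-difference relaxation. The one-dimensional toy below only illustrates that relaxation idea on a plain Poisson equation with zero boundaries; it is not DelPhi's algorithm, grid handling, or treatment of the Boltzmann term, and the function name and units are assumptions for illustration.

```python
# 1D toy of finite-difference relaxation for d2(phi)/dx2 = -rho/eps:
# sweep the lattice, repeatedly replacing each interior value with the
# average of its neighbors plus the local source term (Gauss-Seidel).
def solve_poisson_1d(rho, eps=1.0, h=1.0, iters=20000):
    """Relax to the potential phi with phi=0 at both boundaries."""
    n = len(rho)
    phi = [0.0] * n
    for _ in range(iters):
        for i in range(1, n - 1):
            phi[i] = 0.5 * (phi[i - 1] + phi[i + 1] + h * h * rho[i] / eps)
    return phi
```

For a single point source in the middle of the lattice, the converged potential is symmetric and peaks at the source, as expected.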
===GRASP===&lt;br /&gt;
* '''Description''': A molecular visualization and analysis program. It is particularly useful for the display and manipulation of the surfaces of molecules and their electrostatic properties.&lt;br /&gt;
* '''Data Input''': PDB files, potential maps from DelPhi&lt;br /&gt;
* '''Data Output''': molecular graphics.&lt;br /&gt;
* '''Implementation Language''': Fortran&lt;br /&gt;
* '''Version, Date, Stage''': v1.3.6, stable public release.&lt;br /&gt;
* '''Authors''': Anthony Nicholls and Barry Honig.&lt;br /&gt;
* '''Platforms Tested''': SGI machines running IRIX 5.x and 6.x (INDY, INDIGO including Impact, Octane, and O2 systems).&lt;br /&gt;
* '''License''':  Freely available to academia.&lt;br /&gt;
* '''Keywords''': molecular visualization&lt;br /&gt;
* '''URL''': http://trantor.bioc.columbia.edu/grasp&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''': Atomic --&amp;gt; SoftwareFunction --&amp;gt; Protein Modeling and Classification --&amp;gt; Molecular Visualization Package&lt;br /&gt;
&lt;br /&gt;
===Nest===&lt;br /&gt;
* '''Description''': Models protein structure based on a sequence-template alignment. The current server works only for modeling with a single template. Part of the JACKAL package, which can be downloaded.&lt;br /&gt;
* '''Data Input''': pir and PDB files&lt;br /&gt;
* '''Data Output''': &lt;br /&gt;
* '''Implementation Language''': C++&lt;br /&gt;
* '''Version, Date, Stage''': Stable public release.&lt;br /&gt;
* '''Authors''': Xiang, Z. and Honig, B.&lt;br /&gt;
* '''Platforms Tested''': platform independent (web based tool)&lt;br /&gt;
* '''License''': Freely available to academia.&lt;br /&gt;
* '''Keywords''': modeling, protein structure, sequence-template alignment.&lt;br /&gt;
* '''URL''': http://honiglab.cpmc.columbia.edu/cgi-bin/jackal/nest.cgi&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''': Atomic --&amp;gt; SoftwareFunction --&amp;gt; Protein Modeling and Classification --&amp;gt; Homology Modeling&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===JACKAL===&lt;br /&gt;
* '''Description''': JACKAL is a collection of programs designed for the modeling and analysis of protein structures. Its core program is Nest, a versatile homology modeling package. JACKAL has the following capabilities: 1) comparative modeling based on single, composite, or multiple templates; 2) side-chain prediction; 3) modeling of residue mutations, insertions, or deletions; 4) loop prediction; 5) structure refinement; 6) reconstruction of missing protein atoms; 7) reconstruction of missing protein residues; 8) prediction of hydrogen atoms; 9) fast calculation of solvent accessible surface area; 10) structure superimposition.&lt;br /&gt;
* '''Data Input''': &lt;br /&gt;
* '''Data Output''': &lt;br /&gt;
* '''Implementation Language''': C++&lt;br /&gt;
* '''Version, Date, Stage''': Version 1.5, Oct 20, 2002, stable public release.&lt;br /&gt;
* '''Authors''': Z. Xiang and B. Honig&lt;br /&gt;
* '''Platforms Tested''': SGI IRIX 6.5, Intel Linux, and Sun Solaris&lt;br /&gt;
* '''License''': Freely available to academia.&lt;br /&gt;
* '''Keywords''': Protein Structure Modeling&lt;br /&gt;
* '''URL''': http://trantor.bioc.columbia.edu/programs/jackal&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''': Atomic --&amp;gt; SoftwareFunction --&amp;gt; Protein Modeling and Classification --&amp;gt; Prediction of Side-chain Conformations&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===GRASP2===&lt;br /&gt;
* '''Description''': GRASP2 is an updated version of the GRASP program for macromolecular structure and surface visualization. It contains a large number of new features and scientific tools: an enhanced GUI; structure alignment and domain database scanning; a Gaussian surface generator and new surface coloring schemes; sequence visualization and alignment; and direct printing at full printer resolution. Completed work can be stored in &amp;quot;project files&amp;quot;; among the many objects that can be stored in a project file are views of the structure, defined subsets, and surfaces.&lt;br /&gt;
* '''Data Input''': PDB files, potential maps from DelPhi, sequence alignments.&lt;br /&gt;
* '''Data Output''': molecular graphics, structural alignments.&lt;br /&gt;
* '''Implementation Language''': C++&lt;br /&gt;
* '''Version, Date, Stage''': Stable public release&lt;br /&gt;
* '''Authors''': Donald Petrey and Barry Honig.&lt;br /&gt;
* '''Platforms Tested''': Windows, Linux&lt;br /&gt;
* '''License''': Freely available to academia.&lt;br /&gt;
* '''Keywords''': molecular visualization&lt;br /&gt;
* '''URL''': http://trantor.bioc.columbia.edu/grasp2&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''': SoftwareFunction --&amp;gt; Protein Modeling and Classification --&amp;gt; Molecular Visualization Package&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===PrISM===&lt;br /&gt;
* '''Description''': PrISM is an integrated computational system providing tools for protein sequence and structure analysis and modeling.&lt;br /&gt;
* '''Data Input''': &lt;br /&gt;
* '''Data Output''':&lt;br /&gt;
* '''Implementation Language''': Fortran&lt;br /&gt;
* '''Version, Date, Stage''': Stable public release&lt;br /&gt;
* '''Authors''': Wang, L, Yang, A. S. &amp;amp; Honig, B. &lt;br /&gt;
* '''Platforms Tested''': SGI IRIX, Intel Linux&lt;br /&gt;
* '''License''': Freely available to academia.&lt;br /&gt;
* '''Keywords''': protein analysis/modeling &lt;br /&gt;
* '''URL''': http://trantor.bioc.columbia.edu/programs/PrISM/&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''': Atomic --&amp;gt; SoftwareFunction --&amp;gt; Protein Modeling and Classification --&amp;gt; Homology Modeling&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Protein-DNA interface alignment===&lt;br /&gt;
* '''Description''': The protein-DNA alignment software allows one to align the interfacial amino acids from two protein-DNA complexes based on the geometric relationship of each amino acid to its local DNA. &lt;br /&gt;
* '''Data Input''': two PDB files that both contain protein-DNA complexes&lt;br /&gt;
* '''Data Output''': The programs will output the aligned residues and their corresponding residue-residue similarity scores, s(i,j).&lt;br /&gt;
* '''Implementation Language''': C++ and Perl&lt;br /&gt;
* '''Version, Date, Stage''': Stable public release.&lt;br /&gt;
* '''Authors''': Siggers, T.W., Silkov, A &amp;amp; Honig, B. &lt;br /&gt;
* '''Platforms Tested''': Linux&lt;br /&gt;
* '''License''': Freely available to academia &lt;br /&gt;
* '''Keywords''': protein-DNA interface &lt;br /&gt;
* '''URL''': http://trantor.bioc.columbia.edu/programs/intfc_aln&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''': Atomic --&amp;gt; SoftwareFunction --&amp;gt; Protein Modeling and Classification --&amp;gt; Prediction of Side-chain Conformations&lt;br /&gt;
&lt;br /&gt;
===SURFace===&lt;br /&gt;
* '''Description''': SURFace algorithms are programs that calculate solvent accessible surface area and curvature corrected solvent accessible surface area &lt;br /&gt;
* '''Data Input''': &lt;br /&gt;
* '''Data Output''': &lt;br /&gt;
* '''Implementation Language''': &lt;br /&gt;
* '''Version, Date, Stage''': Stable public release.&lt;br /&gt;
* '''Authors''': Nicholls, A., Sharp, K., Sridharan, S. and Honig, B.  &lt;br /&gt;
* '''Platforms Tested''': SGI&lt;br /&gt;
* '''License''': Freely available to academia. &lt;br /&gt;
* '''Keywords''': solvent accessible surface area  &lt;br /&gt;
* '''URL''': http://trantor.bioc.columbia.edu/surf/&lt;br /&gt;
* '''Organization''': MAGNet &lt;br /&gt;
* '''NCBC Ontology Classification''': Atomic --&amp;gt; SoftwareFunction --&amp;gt; Protein Modeling and Classification --&amp;gt; Calculation of Solvent Accessible Area&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Target Explorer===&lt;br /&gt;
* '''Description''': Target Explorer automates the prediction of complex regulatory elements for a specified set of transcription factors in the Drosophila melanogaster genome. It offers a user-friendly, self-explanatory web interface that allows the user to: 1. create a customized library of TF binding site matrices based on user-defined sets of training sequences; 2. search for new clusters of binding sites for a specified set of TFs; 3. extract annotation for potential target genes.&lt;br /&gt;
* '''Data Input''': genomic sequences&lt;br /&gt;
* '''Data Output''': clusters of known binding sites &lt;br /&gt;
* '''Implementation Language''': Perl (CGI)&lt;br /&gt;
* '''Version, Date, Stage''': Stable public release. &lt;br /&gt;
* '''Authors''': Sosinsky A, Bonin CP, Mann RS, Honig B.   &lt;br /&gt;
* '''Platforms Tested''': platform independent (web based tool)&lt;br /&gt;
* '''License''': Freely available to academia. &lt;br /&gt;
* '''Keywords''': prediction of binding sites for transcription factors  &lt;br /&gt;
* '''URL''': http://trantor.bioc.columbia.edu/Target_Explorer/&lt;br /&gt;
* '''Organization''': MAGNet  &lt;br /&gt;
* '''NCBC Ontology Classification''': Atomic --&amp;gt; SoftwareFunction --&amp;gt; Genomic &amp;amp; Phenotypic Analysis--&amp;gt; Sequence Annotation&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===MEDUSA and Gorgon===&lt;br /&gt;
* '''Description''': MEDUSA is an algorithm for learning predictive models of transcriptional gene regulation from gene expression and promoter sequence data. Using a statistical learning approach based on boosting, MEDUSA learns cis-regulatory motifs, condition-specific regulators, and regulatory programs that predict the differential expression of target genes. The regulatory program is specified as an alternating decision tree (ADT). The Java implementation of MEDUSA allows a number of visualizations of the regulatory program and other inferred regulatory information, implemented in the accompanying Gorgon tool, including hits of significant and condition-specific motifs along the promoter sequences of target genes and regulatory network figures viewable in Cytoscape.&lt;br /&gt;
* '''Data Input''': Discretized (up/down/baseline) gene expression data in plain text format, promoter sequences in FASTA format, list of candidate transcriptional regulators and signal transducers in plain text format.&lt;br /&gt;
* '''Data Output''': Regulatory program represented as a Java serialized object file readable by Gorgon and as a human readable XML file.  Gorgon currently generates views of learned PSSMs, positional hits along promoter sequences, and views of the ADT as HTML files, and generates network figures as Cytoscape format files.&lt;br /&gt;
* '''Implementation Language''': Java (prototyped in MATLAB) &lt;br /&gt;
* '''Version, Date, Stage''': Version 2.0, July 2006, pre-release beta version; Version 1.0 (MATLAB), April 2005, stable public release  &lt;br /&gt;
* '''Authors''': David Quigley, Manuel Middendorf, Steve Lianoglou, Anshul Kundaje, Yoav Freund, Chris Wiggins, Christina Leslie   &lt;br /&gt;
* '''Platforms Tested''': Windows, Linux, Mac OS X&lt;br /&gt;
* '''License''': Open source &lt;br /&gt;
* '''Keywords''':  &lt;br /&gt;
* '''URL''': http://www.cs.columbia.edu/compbio/medusa (MATLAB), http://compbio.sytes.net:8090/medusa (Java beta version)&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''':&lt;br /&gt;
&lt;br /&gt;
===String kernel package===&lt;br /&gt;
* '''Description''': The string kernel package contains implementations of the mismatch and profile string kernels for use with support vector machine (SVM) classifiers for protein sequence classification. Both kernels compute similarity between protein sequences based on common occurrences of k-length subsequences (&amp;quot;k-mers&amp;quot;) counted with substitutions. Kernel functions for protein sequence data enable the training of SVMs for a range of prediction problems, in particular protein structural class prediction and remote homology detection. A version of the Spider MATLAB machine learning package is also bundled with the code, which allows users to train SVMs and evaluate performance on test sets with the packaged software.&lt;br /&gt;
* '''Data Input''': The mismatch kernel requires sequence data in FASTA format. The profile string kernel uses probabilistic profiles, such as those produced by PSI-BLAST, in place of the original sequences. The Spider SVM implementation requires both the kernel matrix and a label file of binary or multi-class labels for the training data; this data must be loaded into MATLAB variables before using the Spider routines.&lt;br /&gt;
* '''Data Output''': The kernel code produces a kernel matrix for the input data in tab-delimited text format. The Spider package trains SVMs and stores the learned classifier and the results of applying the classifier to test data as MATLAB objects.&lt;br /&gt;
* '''Implementation Language''': String kernel code is implemented in C. Spider is a set of object-oriented MATLAB routines. &lt;br /&gt;
* '''Version, Date, Stage''': Version 1.2, September 2004, stable public release &lt;br /&gt;
* '''Authors''': Eleazar Eskin, Rui Kuang, Eugene Ie, Ke Wang, Jason Weston, Bill Noble, Christina Leslie  &lt;br /&gt;
* '''Platforms Tested''': Windows, Linux&lt;br /&gt;
* '''License''': Open source &lt;br /&gt;
* '''Keywords''':  &lt;br /&gt;
* '''URL''': http://www.cs.columbia.edu/compbio/string-kernels&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''':&lt;br /&gt;
&lt;br /&gt;
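The simplest member of the kernel family described above is the k-mer spectrum kernel, which the mismatch kernel generalizes by also counting k-mers that match up to m substitutions. A minimal sketch of the m = 0 case (the function name is an assumption; the packaged C code is far more efficient):

```python
# Spectrum kernel: the dot product of the k-mer count vectors of two
# sequences, i.e. a sum over shared k-mers weighted by their counts.
from collections import Counter

def spectrum_kernel(seq_a, seq_b, k=3):
    def kmer_counts(s):
        return Counter(s[i:i + k] for i in range(len(s) - k + 1))
    ca, cb = kmer_counts(seq_a), kmer_counts(seq_b)
    return sum(ca[kmer] * cb[kmer] for kmer in ca)
```

Computing this value for every pair of training sequences yields the kernel matrix that an SVM package such as Spider consumes.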
===MatrixREDUCE===&lt;br /&gt;
* '''Description''': Regulation of gene expression by a transcription factor requires physical interaction between the factor and the DNA, which can be described by a statistical mechanical model. Based on this model, the MatrixREDUCE algorithm uses genome-wide occupancy data for a transcription factor (e.g. ChIP-chip or mRNA expression data) and associated nucleotide sequences to discover the sequence-specific binding affinity of the transcription factor. The sequence specificity of the transcription factor's DNA-binding domain is modeled using a position-specific affinity matrix (PSAM), representing the change in binding affinity (Kd) whenever a specific position within a reference binding sequence is mutated. The PSAM can be transformed into an affinity logo for visualization using the utility program AffinityLogo, and a MatrixREDUCE run can be summarized in an easy-to-navigate webpage using HTMLSummary.&lt;br /&gt;
* '''Data Input''': sequence file in FASTA format; and expression data file in tab-delimited text format.&lt;br /&gt;
* '''Data Output''': PSAMs in numeric and graphical format, parameters of the fitted model, and an HTML summary page.&lt;br /&gt;
* '''Implementation Language''': ANSI C, making use of Numerical Recipes routines.  &lt;br /&gt;
* '''Version, Date, Stage''': Version 1.0, July 10, 2006, extensively tested in lab.  &lt;br /&gt;
* '''Authors''': Barrett Foat, Xiang-Jun Lu, Harmen J. Bussemaker  &lt;br /&gt;
* '''Platforms Tested''': Linux, Cygwin (Windows), Mac OS X&lt;br /&gt;
* '''License''':  &lt;br /&gt;
* '''Keywords''': position-specific affinity matrix, binding affinity, cis-regulatory element, expression data, ChIP-chip, transcription factor &lt;br /&gt;
* '''URL''': http://www.bussemakerlab.org/software/MatrixREDUCE&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''':&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
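The PSAM model described above scores a sequence multiplicatively: each matrix entry is a relative affinity ratio, a window's affinity is the product over its positions, and the sequence-level score sums over all windows. A small sketch of that scoring step only (the fitting procedure, reverse-strand handling, and the function name are outside this toy):

```python
# Score a sequence under a position-specific affinity matrix (PSAM):
# window affinity = product of per-position relative affinities,
# sequence score = sum of window affinities over all windows.
def psam_score(seq, psam):
    """psam: list of dicts, one per position, base -> relative affinity."""
    w = len(psam)
    total = 0.0
    for i in range(len(seq) - w + 1):
        a = 1.0
        for j in range(w):
            a *= psam[j].get(seq[i + j], 0.0)
        total += a
    return total
```

MatrixREDUCE fits the matrix entries so that these scores, computed over promoter sequences, best explain the measured occupancy or expression data.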
===T-profiler===&lt;br /&gt;
* '''Description''': T-profiler is a web-based tool that uses the t-test to score changes in the average activity of pre-defined groups of genes. The gene groups are defined based on Gene Ontology categorization, ChIP-chip experiments, upstream matches to a consensus transcription factor binding motif, and location on the same chromosome, respectively. If desired, an iterative procedure can be used to select a single, optimal representative from sets of overlapping gene groups. A jack-knife procedure is used to make calculations more robust against outliers. T-profiler makes it possible to interpret microarray data in a way that is both intuitive and statistically rigorous, without the need to combine experiments or choose parameters.   &lt;br /&gt;
* '''Data Input''': Currently, gene expression data from Saccharomyces cerevisiae and Candida albicans are supported. &lt;br /&gt;
* '''Data Output''':&lt;br /&gt;
* '''Implementation Language''': T-profiler is written in PHP; data is managed by a MySQL database server&lt;br /&gt;
* '''Version, Date, Stage''':   &lt;br /&gt;
* '''Authors''': André Boorsma, Barrett C. Foat, Daniel Vis, Frans Klis, Harmen J. Bussemaker  &lt;br /&gt;
* '''Platforms Tested''': Web-based application&lt;br /&gt;
* '''License''':  &lt;br /&gt;
* '''Keywords''': gene expression, transcriptome,  ChIP-chip,  Gene Ontology &lt;br /&gt;
* '''URL''': http://www.t-profiler.org&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''':&lt;br /&gt;
&lt;br /&gt;
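The group-level score that T-profiler computes is, in essence, a two-sample t statistic comparing the mean expression change of a gene group against all remaining genes. A sketch with a pooled-variance t statistic, under the assumption that this matches the published formulation (the function name is illustrative):

```python
# Pooled-variance two-sample t statistic: mean change of genes in a
# group versus the mean change of all other genes on the array.
import math

def group_t(values_in_group, values_rest):
    n1, n2 = len(values_in_group), len(values_rest)
    m1 = sum(values_in_group) / n1
    m2 = sum(values_rest) / n2
    ss1 = sum((v - m1) ** 2 for v in values_in_group)
    ss2 = sum((v - m2) ** 2 for v in values_rest)
    pooled_var = (ss1 + ss2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
```

Because the statistic uses all genes in a single experiment, no replicate experiments or tuning parameters are needed, which is the property the description emphasizes.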
===TranscriptionDetector===&lt;br /&gt;
* '''Description''': A tool for finding probes that measure significantly expressed loci in a genomic array experiment. Given expression data from a tiling array experiment, TranscriptionDetector estimates the likelihood that a probe is detecting transcription from the locus in which it resides. Probabilities are assigned using a background signal intensity distribution derived from a set of negative control probes. This tool is useful for the functional annotation of genomes, as it allows for the discovery of novel transcriptional units independently of any genomic annotation.&lt;br /&gt;
* '''Data Input''': Expression data (GEO or other platforms) and designation of which probes represent negative controls and which are data probes. &lt;br /&gt;
* '''Data Output''': A text file with a list of probes  corresponding to significantly expressed loci. &lt;br /&gt;
* '''Implementation Language''': ANSI C, making use of GSL.  &lt;br /&gt;
* '''Version, Date, Stage''':   &lt;br /&gt;
* '''Authors''': Xiang-Jun Lu, Gabor Halasz, Marinus F. van Batenburg  &lt;br /&gt;
* '''Platforms Tested''': Linux, Cygwin (Windows), Mac OS X&lt;br /&gt;
* '''License''':  &lt;br /&gt;
* '''Keywords''': tiling arrays, expression, transcriptome &lt;br /&gt;
* '''URL''': http://www.bussemakerlab.org/software/TranscriptionDetector/&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''':&lt;br /&gt;
&lt;br /&gt;
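The negative-control approach described above amounts to an empirical upper-tail p-value: a probe's p-value is the fraction of negative-control intensities at or above its signal. A minimal sketch of that idea (function names and the fixed cutoff are illustrative assumptions, not the tool's actual statistics, which include multiple-replicate handling):

```python
# Empirical p-values against a negative-control background: rank each
# probe's signal within the sorted control intensities.
import bisect

def empirical_pvalues(probe_signals, control_signals):
    """probe_signals: dict probe -> intensity. Returns probe -> p-value,
    the fraction of control intensities at or above the signal."""
    controls = sorted(control_signals)
    n = len(controls)
    return {probe: (n - bisect.bisect_left(controls, s)) / n
            for probe, s in probe_signals.items()}

def expressed_probes(probe_signals, control_signals, alpha=0.05):
    pvals = empirical_pvalues(probe_signals, control_signals)
    return [p for p, pv in pvals.items() if alpha > pv]
```

Probes whose p-value falls below the chosen threshold are reported as significantly expressed loci, matching the tool's text-file output described above.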
===PhenoGO===&lt;br /&gt;
* '''Description''': PhenoGO adds phenotypic contextual information to existing associations between gene products and Gene Ontology (GO) terms as specified in GO Annotations (GOA). PhenoGO utilizes an existing Natural Language Processing (NLP) system called BioMedLEE and an existing knowledge-based phenotype organizer system (PhenOS), in conjunction with MeSH indexing and established biomedical ontologies. The system also encodes the context as identifiers from different biomedical ontologies, including the UMLS, Cell Ontology, Mouse Anatomy, NCBI Taxonomy, GO, and the Mammalian Phenotype Ontology. PhenoGO was evaluated for coding anatomical and cellular information and for assigning the coded phenotypes to the correct GOA; it achieved a precision of 91% and a recall of 92%, demonstrating that the PhenoGO NLP system can accurately attach a large number of anatomical and cellular codes to GO annotations. The PhenoGO database may be accessed at www.phenogo.org.&lt;br /&gt;
* '''Data Input''': Gene Ontology Annotations Files and Medline Abstracts  &lt;br /&gt;
* '''Data Output''': XML file and www.phenogo.org Web Portal&lt;br /&gt;
* '''Implementation Language''': A variety of modules: the web portal is written in Java with MySQL; the computational terminology component (PhenOS) is written in Perl scripts that query tables in IBM DB2; the natural language processing component is written in Prolog.&lt;br /&gt;
* '''Version, Date, Stage''': Version 2, Feb 2006  &lt;br /&gt;
* '''Authors''': Yves Lussier and Carol Friedman are the principal investigators. The programmers are Jianrong Li, Lee Sam, and Tara Borlawsky   &lt;br /&gt;
* '''Platforms Tested''': n/a&lt;br /&gt;
* '''License''':  n/a&lt;br /&gt;
* '''Keywords''': Phenotypic integration, computational phenotypes &lt;br /&gt;
* '''URL''': www.phenogo.org&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''':&lt;br /&gt;
&lt;br /&gt;
===MINDY===&lt;br /&gt;
* '''Description''': Given a transcription factor of interest, MINDY uses a large set of gene expression profile data to identify potential post-transcriptional modulators of the transcription factor's activity. MINDY is based on a three-way statistical interaction model that captures the post-transcriptional regulatory event where the ability of a transcription factor to activate/repress its target genes is monotonically controlled by a potential modulator gene. &lt;br /&gt;
* '''Data Input''': Gene expression data in the EXP format, and a user-specified transcription factor of interest&lt;br /&gt;
* '''Data Output''': Lists of the putative modulators and target genes of the transcription factor, and the modulatory interactions involving them&lt;br /&gt;
* '''Implementation Language''': C++, MATLAB, and Java&lt;br /&gt;
* '''Version, Date, Stage''': Stable release, April 2007  &lt;br /&gt;
* '''Authors''': Kai Wang, Ilya Nemenman, Adam Margolin, Riccardo Dalla-Favera, Andrea Califano&lt;br /&gt;
* '''Platforms Tested''': Linux, Cygwin&lt;br /&gt;
* '''License''':  n/a&lt;br /&gt;
* '''Keywords''': gene expression, transcriptional interaction, modulator &lt;br /&gt;
* '''URL''': n/a&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''': Atomic --&amp;gt; SoftwareFunction --&amp;gt; Genomic &amp;amp; Phenotypic Analysis --&amp;gt; Regulatory/Signaling network reconstruction&lt;br /&gt;
&lt;br /&gt;
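The three-way interaction that the MINDY entry above describes can be illustrated with conditional mutual information: split the samples by the candidate modulator's expression and compare the TF-target mutual information between the two halves. This is an illustrative toy, not the MINDY implementation; the real algorithm uses calibrated statistics and significance testing, and the function names and median split are assumptions.

```python
# MINDY-style modulation score: MI(TF, target) in modulator-high
# samples minus MI(TF, target) in modulator-low samples.
import math
from collections import Counter

def mutual_info(x, y):
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(c / n * math.log((c / n) / ((px[a] / n) * (py[b] / n)))
               for (a, b), c in pxy.items())

def modulation_score(tf, target, modulator):
    """All arguments: equal-length lists of discretized expression."""
    order = sorted(range(len(modulator)), key=lambda i: modulator[i])
    half = len(order) // 2
    low, high = order[:half], order[half:]
    mi_low = mutual_info([tf[i] for i in low], [target[i] for i in low])
    mi_high = mutual_info([tf[i] for i in high], [target[i] for i in high])
    return mi_high - mi_low
```

A strongly positive score suggests the TF-target dependence only appears when the modulator is highly expressed, the monotone control the description refers to.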
===B Cell Interactome===&lt;br /&gt;
* '''Description''': The B cell interactome (BCI) is a network of protein-protein, protein-DNA, and modulatory interactions in human B cells. The network contains known interactions (reported in public databases) and interactions predicted by a Bayesian evidence integration framework, which combines a variety of generic and context-specific experimental clues about protein-protein and protein-DNA interactions (such as a large collection of B cell expression profiles) with inferences from reverse engineering algorithms such as GeneWays and ARACNE. Modulatory interactions are predicted by MINDY, an algorithm for the prediction of modulators of transcriptional interactions.&lt;br /&gt;
* '''Data Input''': n/a&lt;br /&gt;
* '''Data Output''': Text file of binary interactions, each associated with a probability.&lt;br /&gt;
* '''Implementation Language''': Perl&lt;br /&gt;
* '''Version, Date, Stage''': Version 2, March 2007&lt;br /&gt;
* '''Authors''':  Lefebvre C, Lim WK, Basso K, Dalla Favera R, and Califano A.&lt;br /&gt;
* '''Platforms Tested''': n/a&lt;br /&gt;
* '''License''':  &lt;br /&gt;
* '''Keywords''': Naive Bayes, Mixed-Interaction Network, human B cells.&lt;br /&gt;
* '''URL''': http://amdec-bioinfo.cu-genome.org/html/BCellInteractome.html&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''': Atomic --&amp;gt; SoftwareFunction --&amp;gt; Interaction Modeling&lt;br /&gt;
&lt;br /&gt;
===ARACNE===&lt;br /&gt;
* '''Description''': ARACNE is an algorithm for inferring gene regulatory networks from a set of microarray experiments. The method uses mutual information to identify genes that are co-expressed and then applies the data processing inequality to filter out interactions that are likely to be indirect.&lt;br /&gt;
* '''Data Input''': Text file containing measurements from a set of microarray experiments.&lt;br /&gt;
* '''Data Output''':  Text file containing predicted interactions.&lt;br /&gt;
* '''Implementation Language''': C++, Java&lt;br /&gt;
* '''Version, Date, Stage''': Version 1, June, 2006&lt;br /&gt;
* '''Authors''':  Margolin AA, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Dalla Favera R, Califano A.&lt;br /&gt;
* '''Platforms Tested''': Windows, Linux&lt;br /&gt;
* '''License''':  Open source&lt;br /&gt;
* '''Keywords''': Reverse engineering, mutual information, genetic networks, microarray&lt;br /&gt;
* '''URL''': http://amdec-bioinfo.cu-genome.org/html/ARACNE.htm&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''': Atomic --&amp;gt; SoftwareFunction --&amp;gt; Genomic &amp;amp; Phenotypic Analysis --&amp;gt; Regulatory/Signaling network reconstruction&lt;/div&gt;</summary>
		<author><name>Aris</name></author>
		
	</entry>
	<entry>
		<id>https://www.na-mic.org/w/index.php?title=SDIWG:NCBC_Software_Classification_MAGNet_Examples&amp;diff=10252</id>
		<title>SDIWG:NCBC Software Classification MAGNet Examples</title>
		<link rel="alternate" type="text/html" href="https://www.na-mic.org/w/index.php?title=SDIWG:NCBC_Software_Classification_MAGNet_Examples&amp;diff=10252"/>
		<updated>2007-05-11T22:26:52Z</updated>

		<summary type="html">&lt;p&gt;Aris: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;===DelPhi===&lt;br /&gt;
* '''Description''': DelPhi provides numerical solutions to the Poisson-Boltzmann equation (both linear and nonlinear form) for molecules of arbitrary shape and charge distribution. The current version is fast (the best relaxation parameter is estimated at run time), accurate (calculation of the electrostatic free energy is less dependent on the resolution of the lattice) and can handle extremely high lattice dimensions. It also includes flexible features for assigning different dielectric constants to different regions of space and treating systems containing mixed salt solutions.&lt;br /&gt;
* '''Data Input''': DelPhi takes as input a coordinate file of a molecule, or equivalent data for geometrical objects and/or charge distributions&lt;br /&gt;
* '''Data Output''': electrostatic potential in and around the system&lt;br /&gt;
* '''Implementation Language''': Fortran and C&lt;br /&gt;
* '''Version, Date, Stage''': Stable public release&lt;br /&gt;
* '''Authors''':  E.Alexov,  R.Fine, M.K.Gilson, A.Nicholls, W.Rocchia, K.Sharp, and B. Honig.&lt;br /&gt;
* '''Platforms Tested''': Unix (SGI IRIX), Linux, PC (requires Fortran and C compilers), IBM AIX, and Mac.&lt;br /&gt;
* '''License''': Freely available to academia; pay model for commercial users. &lt;br /&gt;
* '''Keywords''': Finite Difference Poisson-Boltzmann Solver&lt;br /&gt;
* '''URL''': http://trantor.bioc.columbia.edu/delphi&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''': Atomic --&amp;gt; SoftwareFunction --&amp;gt; Protein Modeling and Classification --&amp;gt; Numerical Calculation of Electrostatic Potential&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===GRASP===&lt;br /&gt;
* '''Description''': A molecular visualization and analysis program. It is particularly useful for the display and manipulation of the surfaces of molecules and their electrostatic properties.&lt;br /&gt;
* '''Data Input''': PDB files, potential maps from DelPhi&lt;br /&gt;
* '''Data Output''': molecular graphics.&lt;br /&gt;
* '''Implementation Language''': Fortran&lt;br /&gt;
* '''Version, Date, Stage''': v1.3.6, stable public release.&lt;br /&gt;
* '''Authors''': Anthony Nicholls and Barry Honig.&lt;br /&gt;
* '''Platforms Tested''': SGI machines: irix 5.x and 6.x (INDYs, INDIGOs including Impact, Octane and O2) systems.&lt;br /&gt;
* '''License''':  Freely available to academia.&lt;br /&gt;
* '''Keywords''': molecular visualization&lt;br /&gt;
* '''URL''': http://trantor.bioc.columbia.edu/grasp&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''': Atomic --&amp;gt; SoftwareFunction --&amp;gt; Protein Modeling and Classification --&amp;gt; Molecular Visualization Package&lt;br /&gt;
&lt;br /&gt;
===Nest===&lt;br /&gt;
* '''Description''': Models protein structure based on a sequence-template alignment. The current server supports modeling with a single template only. Part of the JACKAL package, which can be downloaded.&lt;br /&gt;
* '''Data Input''': pir and PDB files&lt;br /&gt;
* '''Data Output''': &lt;br /&gt;
* '''Implementation Language''': C++&lt;br /&gt;
* '''Version, Date, Stage''': Stable public release.&lt;br /&gt;
* '''Authors''': Xiang, Z. and Honig, B.&lt;br /&gt;
* '''Platforms Tested''': platform independent (web based tool)&lt;br /&gt;
* '''License''': Freely available to academia.&lt;br /&gt;
* '''Keywords''': modeling, protein structure, sequence-template alignment.&lt;br /&gt;
* '''URL''': http://honiglab.cpmc.columbia.edu/cgi-bin/jackal/nest.cgi&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''': Atomic --&amp;gt; SoftwareFunction --&amp;gt; Protein Modeling and Classification --&amp;gt; Homology Modeling&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===JACKAL===&lt;br /&gt;
* '''Description''': JACKAL is a collection of programs designed for the modeling and analysis of protein structures. Its core program is Nest, a versatile homology modeling package. JACKAL has the following capabilities: 1) comparative modeling based on single, composite or multiple templates; 2) side-chain prediction; 3) modeling residue mutation, insertion or deletion; 4) loop prediction; 5) structure refinement; 6) reconstruction of missing protein atoms; 7) reconstruction of missing protein residues; 8) prediction of hydrogen atoms; 9) fast calculation of solvent accessible surface area; 10) structure superimposition.&lt;br /&gt;
* '''Data Input''': &lt;br /&gt;
* '''Data Output''': &lt;br /&gt;
* '''Implementation Language''': C++&lt;br /&gt;
* '''Version, Date, Stage''': Version 1.5, October 20, 2002, stable public release.&lt;br /&gt;
* '''Authors''': Z. Xiang and B. Honig&lt;br /&gt;
* '''Platforms Tested''': SGI 6.5, Intel Linux and Sun Solaris&lt;br /&gt;
* '''License''': Freely available to academia.&lt;br /&gt;
* '''Keywords''': Protein Structure Modeling&lt;br /&gt;
* '''URL''': http://trantor.bioc.columbia.edu/programs/jackal&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''': Atomic --&amp;gt; SoftwareFunction --&amp;gt; Protein Modeling and Classification --&amp;gt; Prediction of Side-chain Conformations&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===GRASP2===&lt;br /&gt;
* '''Description''': GRASP2 is an updated version of the GRASP program for macromolecular structure and surface visualization. It contains a large number of new features and scientific tools: an enhanced GUI; structure alignment and domain database scanning; a Gaussian surface generator and new surface coloring schemes; sequence visualization and alignment; and direct printing to printers at full printer resolution. Completed work can be stored in &amp;quot;project files&amp;quot;; among the many objects that can be stored in a project file are views of the structure, defined subsets, and surfaces.&lt;br /&gt;
* '''Data Input''': PDB files, potential maps from DelPhi, sequence alignments.&lt;br /&gt;
* '''Data Output''': molecular graphics, structural alignments.&lt;br /&gt;
* '''Implementation Language''': C++&lt;br /&gt;
* '''Version, Date, Stage''': Stable public release&lt;br /&gt;
* '''Authors''': Donald Petrey and Barry Honig.&lt;br /&gt;
* '''Platforms Tested''': Windows, Linux&lt;br /&gt;
* '''License''': Freely available to academia.&lt;br /&gt;
* '''Keywords''': molecular visualization&lt;br /&gt;
* '''URL''': http://trantor.bioc.columbia.edu/grasp2&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''': Atomic --&amp;gt; SoftwareFunction --&amp;gt; Protein Modeling and Classification --&amp;gt; Molecular Visualization Package&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===PrISM===&lt;br /&gt;
* '''Description''': PrISM is an integrated computational system implementing tools for protein sequence and structure analysis and modeling.&lt;br /&gt;
* '''Data Input''': &lt;br /&gt;
* '''Data Output''':&lt;br /&gt;
* '''Implementation Language''': Fortran&lt;br /&gt;
* '''Version, Date, Stage''': Stable public release&lt;br /&gt;
* '''Authors''': Wang, L, Yang, A. S. &amp;amp; Honig, B. &lt;br /&gt;
* '''Platforms Tested''': SGI IRIX, Intel Linux&lt;br /&gt;
* '''License''': Freely available to academia.&lt;br /&gt;
* '''Keywords''': protein analysis/modeling &lt;br /&gt;
* '''URL''': http://trantor.bioc.columbia.edu/programs/PrISM/&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''': Atomic --&amp;gt; SoftwareFunction --&amp;gt; Protein Modeling and Classification --&amp;gt; Homology Modeling&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Protein-DNA interface alignment===&lt;br /&gt;
* '''Description''': The protein-DNA alignment software allows one to align the interfacial amino acids from two protein-DNA complexes based on the geometric relationship of each amino acid to its local DNA. &lt;br /&gt;
* '''Data Input''': two PDB files that both contain protein-DNA complexes&lt;br /&gt;
* '''Data Output''': The programs will output the aligned residues and their corresponding residue-residue similarity scores, s(i,j).&lt;br /&gt;
* '''Implementation Language''': C++ and Perl&lt;br /&gt;
* '''Version, Date, Stage''': Stable public release.&lt;br /&gt;
* '''Authors''': Siggers, T.W., Silkov, A &amp;amp; Honig, B. &lt;br /&gt;
* '''Platforms Tested''': Linux&lt;br /&gt;
* '''License''': Freely available to academia &lt;br /&gt;
* '''Keywords''': protein-DNA interface &lt;br /&gt;
* '''URL''': http://trantor.bioc.columbia.edu/programs/intfc_aln&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''': Atomic --&amp;gt; SoftwareFunction --&amp;gt; Protein Modeling and Classification --&amp;gt; Prediction of Side-chain Conformations&lt;br /&gt;
&lt;br /&gt;
===SURFace===&lt;br /&gt;
* '''Description''': SURFace algorithms are programs that calculate solvent accessible surface area and curvature corrected solvent accessible surface area &lt;br /&gt;
* '''Data Input''': &lt;br /&gt;
* '''Data Output''': &lt;br /&gt;
* '''Implementation Language''': &lt;br /&gt;
* '''Version, Date, Stage''': Stable public release.&lt;br /&gt;
* '''Authors''': Nicholls, A., Sharp, K., Sridharan, S. and Honig, B.  &lt;br /&gt;
* '''Platforms Tested''': SGI&lt;br /&gt;
* '''License''': Freely available to academia. &lt;br /&gt;
* '''Keywords''': solvent accessible surface area  &lt;br /&gt;
* '''URL''': http://trantor.bioc.columbia.edu/surf/&lt;br /&gt;
* '''Organization''': MAGNet &lt;br /&gt;
* '''NCBC Ontology Classification''': Atomic --&amp;gt; SoftwareFunction --&amp;gt; Protein Modeling and Classification --&amp;gt; Calculation of Solvent Accessible Area&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Target Explorer===&lt;br /&gt;
* '''Description''': Target Explorer automates the prediction of complex regulatory elements for a specified set of transcription factors in the Drosophila melanogaster genome. It is a comprehensive tool with a user-friendly, self-explanatory web interface that allows the user to: 1. create a customized library of TF binding site matrices based on user-defined sets of training sequences; 2. search for new clusters of binding sites for a specified set of TFs; 3. extract annotation for potential target genes.&lt;br /&gt;
* '''Data Input''': genomic sequences&lt;br /&gt;
* '''Data Output''': clusters of known binding sites &lt;br /&gt;
* '''Implementation Language''': Perl, CGI&lt;br /&gt;
* '''Version, Date, Stage''': Stable public release. &lt;br /&gt;
* '''Authors''': Sosinsky A, Bonin CP, Mann RS, Honig B.   &lt;br /&gt;
* '''Platforms Tested''': platform independent (web based tool)&lt;br /&gt;
* '''License''': Freely available to academia. &lt;br /&gt;
* '''Keywords''': prediction of binding sites for transcription factors  &lt;br /&gt;
* '''URL''': http://trantor.bioc.columbia.edu/Target_Explorer/&lt;br /&gt;
* '''Organization''': MAGNet  &lt;br /&gt;
* '''NCBC Ontology Classification''': Atomic --&amp;gt; SoftwareFunction --&amp;gt; Genomic &amp;amp; Phenotypic Analysis--&amp;gt; Sequence Annotation&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===MEDUSA and Gorgon===&lt;br /&gt;
* '''Description''': MEDUSA is an algorithm for learning predictive models of transcriptional gene regulation from gene expression and promoter sequence data. By using a statistical learning approach based on boosting, MEDUSA learns cis-regulatory motifs, condition-specific regulators, and regulatory programs that predict the differential expression of target genes. The regulatory program is specified as an alternating decision tree (ADT). The Java implementation of MEDUSA allows a number of visualizations of the regulatory program and other inferred regulatory information, implemented in the accompanying Gorgon tool, including hits of significant and condition-specific motifs along the promoter sequences of target genes and regulatory network figures viewable in Cytoscape.&lt;br /&gt;
* '''Data Input''': Discretized (up/down/baseline) gene expression data in plain text format, promoter sequences in FASTA format, list of candidate transcriptional regulators and signal transducers in plain text format.&lt;br /&gt;
* '''Data Output''': Regulatory program represented as a Java serialized object file readable by Gorgon and as a human readable XML file.  Gorgon currently generates views of learned PSSMs, positional hits along promoter sequences, and views of the ADT as HTML files, and generates network figures as Cytoscape format files.&lt;br /&gt;
* '''Implementation Language''': Java (prototyped in MATLAB) &lt;br /&gt;
* '''Version, Date, Stage''': Version 2.0, July 2006, pre-release beta version; Version 1.0 (MATLAB), April 2005, stable public release  &lt;br /&gt;
* '''Authors''': David Quigley, Manuel Middendorf, Steve Lianoglou, Anshul Kundaje, Yoav Freund, Chris Wiggins, Christina Leslie   &lt;br /&gt;
* '''Platforms Tested''': Windows, Linux, Mac OS X&lt;br /&gt;
* '''License''': Open source &lt;br /&gt;
* '''Keywords''':  &lt;br /&gt;
* '''URL''': http://www.cs.columbia.edu/compbio/medusa (MATLAB), http://compbio.sytes.net:8090/medusa (Java beta version)&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''':&lt;br /&gt;
&lt;br /&gt;
===String kernel package===&lt;br /&gt;
* '''Description''': The string kernel package contains implementations of the mismatch and profile string kernels for use with support vector machine (SVM) classifiers for protein sequence classification. Both kernels compute similarity between protein sequences based on common occurrences of k-length subsequences (&amp;quot;k-mers&amp;quot;) counted with substitutions. Kernel functions for protein sequence data enable the training of SVMs for a range of prediction problems, in particular protein structural class prediction and remote homology detection. A version of the Spider MATLAB machine learning package is also bundled with the code, which allows users to train SVMs and evaluate performance on test sets with the packaged software.&lt;br /&gt;
* '''Data Input''': The mismatch kernel requires sequence data in FASTA format. The profile string kernel uses probabilistic profiles, such as those produced by PSI-BLAST, in place of the original sequences. The Spider SVM implementation requires both the kernel matrix and a label file of binary or multi-class labels for the training data; this data must be loaded into MATLAB variables before using the Spider routines.&lt;br /&gt;
* '''Data Output''': The kernel code produces a kernel matrix for the input data in tab-delimited text format. The Spider package trains SVMs and stores the learned classifier and the results of applying the classifier to test data as MATLAB objects.&lt;br /&gt;
* '''Implementation Language''': String kernel code is implemented in C. Spider is a set of object-oriented MATLAB routines. &lt;br /&gt;
* '''Version, Date, Stage''': Version 1.2, September 2004, stable public release &lt;br /&gt;
* '''Authors''': Eleazar Eskin, Rui Kuang, Eugene Ie, Ke Wang, Jason Weston, Bill Noble, Christina Leslie  &lt;br /&gt;
* '''Platforms Tested''': Windows, Linux&lt;br /&gt;
* '''License''': Open source &lt;br /&gt;
* '''Keywords''':  &lt;br /&gt;
* '''URL''': http://www.cs.columbia.edu/compbio/string-kernels&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''':&lt;br /&gt;
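The k-mer-with-substitutions counting that both kernels build on can be sketched as follows. This is a minimal illustration using explicit neighbor enumeration, not the package's C implementation (which uses more efficient data structures); the alphabet and the k and m values are arbitrary choices:&lt;br /&gt;

```python
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids

def neighbors(kmer, m):
    """All strings over ALPHABET within Hamming distance m of kmer
    (sketch: handles m = 0 or 1 via single-substitution enumeration)."""
    out = {kmer}
    if m >= 1:
        for i in range(len(kmer)):
            for a in ALPHABET:
                out.add(kmer[:i] + a + kmer[i + 1:])
    return out

def mismatch_features(seq, k=3, m=1):
    """Count each k-mer of seq toward every k-mer within m mismatches."""
    counts = Counter()
    for i in range(len(seq) - k + 1):
        for nb in neighbors(seq[i:i + k], m):
            counts[nb] += 1
    return counts

def mismatch_kernel(s, t, k=3, m=1):
    """Kernel value: dot product of the two feature count vectors."""
    fs, ft = mismatch_features(s, k, m), mismatch_features(t, k, m)
    return sum(v * ft[key] for key, v in fs.items())
```

The resulting kernel matrix over a set of sequences is what would be handed to an SVM trainer such as the bundled Spider package.&lt;br /&gt;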
&lt;br /&gt;
===MatrixREDUCE===&lt;br /&gt;
* '''Description''': Regulation of gene expression by a transcription factor requires physical interaction between the factor and the DNA, which can be described by a statistical mechanical model. Based on this model, the MatrixREDUCE algorithm uses genome-wide occupancy data for a transcription factor (e.g. ChIP-chip or mRNA expression data) and associated nucleotide sequences to discover the sequence-specific binding affinity of the transcription factor. The sequence specificity of the transcription factor's DNA-binding domain is modeled using a position-specific affinity matrix (PSAM), representing the change in binding affinity (Kd) whenever a specific position within a reference binding sequence is mutated. The PSAM can be transformed into an affinity logo for visualization using the utility program AffinityLogo, and a MatrixREDUCE run can be summarized in an easy-to-navigate webpage using HTMLSummary.&lt;br /&gt;
* '''Data Input''': sequence file in FASTA format; and expression data file in tab-delimited text format.&lt;br /&gt;
* '''Data Output''': PSAMs in numeric and graphical format, parameters of the fitted model, and an HTML summary page.&lt;br /&gt;
* '''Implementation Language''': ANSI C, making use of Numerical Recipes routines.  &lt;br /&gt;
* '''Version, Date, Stage''': Version 1.0, July 10, 2006, extensively tested in lab.  &lt;br /&gt;
* '''Authors''': Barrett Foat, Xiang-Jun Lu, Harmen J. Bussemaker  &lt;br /&gt;
* '''Platforms Tested''': Linux, Cygwin (Windows), Mac OS X&lt;br /&gt;
* '''License''':  &lt;br /&gt;
* '''Keywords''': position-specific affinity matrix, binding affinity, cis-regulatory element, expression data, ChIP-chip, transcription factor &lt;br /&gt;
* '''URL''': http://www.bussemakerlab.org/software/MatrixREDUCE&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''':&lt;br /&gt;
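Under the multiplicative PSAM model, a window's relative affinity is the product of per-position factors, and a sequence's predicted occupancy is the sum over sliding windows. A minimal sketch with a made-up 3-bp PSAM (MatrixREDUCE itself fits the PSAM entries to the genome-wide occupancy data rather than taking them as given):&lt;br /&gt;

```python
BASES = "ACGT"
# Hypothetical 3-position PSAM, for illustration only (rows follow BASES):
# entry [b][j] is the relative affinity contribution of base b at position j,
# with 1.0 at the reference base and values below 1 penalizing mutations.
PSAM = [
    [1.0, 0.1, 0.2],  # A
    [0.2, 1.0, 0.1],  # C
    [0.1, 0.3, 1.0],  # G
    [0.3, 0.1, 0.1],  # T
]

def window_affinity(window):
    """Relative binding affinity of one window: the product of the
    per-position PSAM factors (the model is multiplicative)."""
    affinity = 1.0
    for j, base in enumerate(window):
        affinity *= PSAM[BASES.index(base)][j]
    return affinity

def sequence_occupancy(seq, width=3):
    """Predicted total occupancy: affinities summed over sliding windows."""
    return sum(window_affinity(seq[i:i + width])
               for i in range(len(seq) - width + 1))
```

The reference sequence (here ACG) has affinity 1.0 by construction; every mutation multiplies in a factor below 1.&lt;br /&gt;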
&lt;br /&gt;
&lt;br /&gt;
===T-profiler===&lt;br /&gt;
* '''Description''': T-profiler is a web-based tool that uses the t-test to score changes in the average activity of pre-defined groups of genes. The gene groups are defined based on Gene Ontology categorization, ChIP-chip experiments, upstream matches to a consensus transcription factor binding motif, and location on the same chromosome, respectively. If desired, an iterative procedure can be used to select a single, optimal representative from sets of overlapping gene groups. A jack-knife procedure is used to make calculations more robust against outliers. T-profiler makes it possible to interpret microarray data in a way that is both intuitive and statistically rigorous, without the need to combine experiments or choose parameters.   &lt;br /&gt;
* '''Data Input''': Currently, gene expression data from Saccharomyces cerevisiae and Candida albicans are supported. &lt;br /&gt;
* '''Data Output''':&lt;br /&gt;
* '''Implementation Language''': T-profiler is written in PHP; data is managed by a MySQL database server&lt;br /&gt;
* '''Version, Date, Stage''':   &lt;br /&gt;
* '''Authors''': André Boorsma, Barrett C. Foat, Daniel Vis, Frans Klis, Harmen J. Bussemaker  &lt;br /&gt;
* '''Platforms Tested''': Web-based application&lt;br /&gt;
* '''License''':  &lt;br /&gt;
* '''Keywords''': gene expression, transcriptome,  ChIP-chip,  Gene Ontology &lt;br /&gt;
* '''URL''': http://www.t-profiler.org&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''':&lt;br /&gt;
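The group scoring described above reduces to a two-sample t statistic comparing a gene group's mean expression change against that of all other genes. A minimal pooled-variance sketch, omitting the jack-knife robustification and iterative group selection mentioned in the description:&lt;br /&gt;

```python
import math
from statistics import mean, variance

def t_profiler_score(group_vals, rest_vals):
    """Pooled-variance two-sample t statistic comparing a gene group's
    mean expression change to that of all remaining genes."""
    n1, n2 = len(group_vals), len(rest_vals)
    m1, m2 = mean(group_vals), mean(rest_vals)
    pooled = ((n1 - 1) * variance(group_vals) +
              (n2 - 1) * variance(rest_vals)) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(pooled * (1 / n1 + 1 / n2))
```

A positive score means the group is up on average relative to the rest of the genome; swapping the two arguments flips the sign.&lt;br /&gt;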
&lt;br /&gt;
===TranscriptionDetector===&lt;br /&gt;
* '''Description''': A tool for finding probes measuring significantly expressed loci in a genomic tiling array experiment. Given expression data from a tiling array experiment, TranscriptionDetector estimates the likelihood that a probe is detecting transcription from the locus in which it resides. Probabilities are assigned by making use of a background signal intensity distribution from a set of negative control probes. This tool is useful for the functional annotation of genomes, as it allows for the discovery of novel transcriptional units independently of any genomic annotation.&lt;br /&gt;
* '''Data Input''': Expression data (GEO or other platforms) and designation of which probes represent negative controls and which are data probes. &lt;br /&gt;
* '''Data Output''': A text file with a list of probes  corresponding to significantly expressed loci. &lt;br /&gt;
* '''Implementation Language''': ANSI C, making use of GSL.  &lt;br /&gt;
* '''Version, Date, Stage''':   &lt;br /&gt;
* '''Authors''': Xiang-Jun Lu, Gabor Halasz, Marinus F. van Batenburg  &lt;br /&gt;
* '''Platforms Tested''': Linux, Cygwin (Windows), Mac OS X&lt;br /&gt;
* '''License''':  &lt;br /&gt;
* '''Keywords''': tiling arrays, expression, transcriptome &lt;br /&gt;
* '''URL''': http://www.bussemakerlab.org/software/TranscriptionDetector/&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''':&lt;br /&gt;
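One simple way to assign such probabilities is an empirical p-value against the negative-control background, sketched below; TranscriptionDetector's exact statistical procedure may differ:&lt;br /&gt;

```python
def empirical_pvalue(probe_intensity, negative_controls):
    """Empirical p-value: the fraction of negative-control intensities
    at least as large as the data probe's intensity. Small values mean
    the probe's signal is unlikely to come from the background alone."""
    hits = sum(1 for x in negative_controls if x >= probe_intensity)
    # Add-one correction keeps the estimate away from exactly zero.
    return (hits + 1) / (len(negative_controls) + 1)
```

Probes whose p-value falls below a chosen significance threshold would then be reported as significantly expressed loci.&lt;br /&gt;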
&lt;br /&gt;
===PhenoGO===&lt;br /&gt;
* '''Description''': PhenoGO adds phenotypic contextual information to existing associations between gene products and Gene Ontology (GO) terms as specified in GO Annotations (GOA). PhenoGO utilizes an existing Natural Language Processing (NLP) system, BioMedLEE, and an existing knowledge-based phenotype organizer system, PhenOS, in conjunction with MeSH indexing and established biomedical ontologies. The system also encodes the context to identifiers associated with different biomedical ontologies, including the UMLS, Cell Ontology, Mouse Anatomy, NCBI taxonomy, GO, and Mammalian Phenotype Ontology. In addition, PhenoGO was evaluated for coding of anatomical and cellular information and assigning the coded phenotypes to the correct GOA; the results show that PhenoGO has a precision of 91% and recall of 92%, demonstrating that the PhenoGO NLP system can accurately encode a large number of anatomical and cellular ontologies to GO annotations. The PhenoGO database may be accessed at www.phenogo.org.&lt;br /&gt;
* '''Data Input''': Gene Ontology Annotations Files and Medline Abstracts  &lt;br /&gt;
* '''Data Output''': XML file and www.phenogo.org Web Portal&lt;br /&gt;
* '''Implementation Language''': A variety of modules: the web portal is in Java and MySQL; the computational terminology component (PhenOS) is written in Perl scripts that query tables in IBM DB2; the natural language processing component is written in PROLOG.&lt;br /&gt;
* '''Version, Date, Stage''': Version 2, Feb 2006  &lt;br /&gt;
* '''Authors''': Yves Lussier and Carol Friedman are the principal investigators. The programmers are Jianrong Li, Lee Sam, and Tara Borlawsky   &lt;br /&gt;
* '''Platforms Tested''': n/a&lt;br /&gt;
* '''License''':  n/a&lt;br /&gt;
* '''Keywords''': Phenotypic integration, computational phenotypes &lt;br /&gt;
* '''URL''': www.phenogo.org&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''':&lt;br /&gt;
&lt;br /&gt;
===MINDY===&lt;br /&gt;
* '''Description''': Given a transcription factor of interest, MINDY uses a large set of gene expression profile data to identify potential post-transcriptional modulators of the transcription factor's activity. MINDY is based on a three-way statistical interaction model that captures the post-transcriptional regulatory event where the ability of a transcription factor to activate/repress its target genes is monotonically controlled by a potential modulator gene. &lt;br /&gt;
* '''Data Input''': Gene expression data in the EXP format, and a user-specified transcription factor of interest&lt;br /&gt;
* '''Data Output''': Lists of the putative modulators and target genes of the transcription factor, and the modulatory interactions involving them&lt;br /&gt;
* '''Implementation Language''': C++, MATLAB, and Java&lt;br /&gt;
* '''Version, Date, Stage''': Stable release, April 2007  &lt;br /&gt;
* '''Authors''': Kai Wang, Ilya Nemenman, Adam Margolin, Riccardo Dalla-Favera, Andrea Califano&lt;br /&gt;
* '''Platforms Tested''': Linux, Cygwin&lt;br /&gt;
* '''License''':  n/a&lt;br /&gt;
* '''Keywords''': gene expression, transcriptional interaction, modulator &lt;br /&gt;
* '''URL''': n/a&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''': Atomic --&amp;gt; SoftwareFunction --&amp;gt; Genomic &amp;amp; Phenotypic Analysis --&amp;gt; Regulatory/Signaling network reconstruction&lt;br /&gt;
&lt;br /&gt;
===B Cell Interactome===&lt;br /&gt;
* '''Description''': The B cell interactome (BCI) is a network of protein-protein, protein-DNA and modulatory interactions in human B cells. The network contains known interactions (reported in public databases) and interactions predicted by a Bayesian evidence integration framework, which combines a variety of generic and context-specific experimental clues about protein-protein and protein-DNA interactions - such as a large collection of B cell expression profiles - with inferences from reverse engineering algorithms such as GeneWays and ARACNE. Modulatory interactions are predicted by MINDY, an algorithm for the prediction of modulators of transcriptional interactions.&lt;br /&gt;
* '''Data Input''': n/a&lt;br /&gt;
* '''Data Output''': Text file of binary interactions, each associated with a probability.&lt;br /&gt;
* '''Implementation Language''': Perl&lt;br /&gt;
* '''Version, Date, Stage''': Version 2, March 2007&lt;br /&gt;
* '''Authors''':  Lefebvre C, Lim WK, Basso K, Dalla Favera R, and Califano A.&lt;br /&gt;
* '''Platforms Tested''': n/a&lt;br /&gt;
* '''License''':  &lt;br /&gt;
* '''Keywords''': Naive Bayes, Mixed-Interaction Network, human B cells.&lt;br /&gt;
* '''URL''': http://amdec-bioinfo.cu-genome.org/html/BCellInteractome.html&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''': Atomic --&amp;gt; SoftwareFunction --&amp;gt; Interaction Modeling&lt;/div&gt;</summary>
		<author><name>Aris</name></author>
		
	</entry>
	<entry>
		<id>https://www.na-mic.org/w/index.php?title=SDIWG:NCBC_Software_Classification_MAGNet_Examples&amp;diff=10206</id>
		<title>SDIWG:NCBC Software Classification MAGNet Examples</title>
		<link rel="alternate" type="text/html" href="https://www.na-mic.org/w/index.php?title=SDIWG:NCBC_Software_Classification_MAGNet_Examples&amp;diff=10206"/>
		<updated>2007-05-11T04:22:10Z</updated>

		<summary type="html">&lt;p&gt;Aris: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;===DelPhi===&lt;br /&gt;
* '''Description''': DelPhi provides numerical solutions to the Poisson-Boltzmann equation (both linear and nonlinear form) for molecules of arbitrary shape and charge distribution. The current version is fast (the best relaxation parameter is estimated at run time), accurate (calculation of the electrostatic free energy is less dependent on the resolution of the lattice) and can handle extremely high lattice dimensions. It also includes flexible features for assigning different dielectric constants to different regions of space and treating systems containing mixed salt solutions.&lt;br /&gt;
* '''Data Input''': DelPhi takes as input a coordinate file of a molecule, or equivalent data for geometrical objects and/or charge distributions&lt;br /&gt;
* '''Data Output''': electrostatic potential in and around the system&lt;br /&gt;
* '''Implementation Language''': Fortran and C&lt;br /&gt;
* '''Version, Date, Stage''': Stable public release&lt;br /&gt;
* '''Authors''':  E.Alexov,  R.Fine, M.K.Gilson, A.Nicholls, W.Rocchia, K.Sharp, and B. Honig.&lt;br /&gt;
* '''Platforms Tested''': Unix (SGI IRIX), Linux, PC (requires Fortran and C compilers), IBM AIX, and Mac.&lt;br /&gt;
* '''License''': Freely available to academia; pay model for commercial users. &lt;br /&gt;
* '''Keywords''': Finite Difference Poisson-Boltzmann Solver&lt;br /&gt;
* '''URL''': http://trantor.bioc.columbia.edu/delphi&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''': Atomic --&amp;gt; SoftwareFunction --&amp;gt; Protein Modeling and Classification --&amp;gt; Numerical Calculation of Electrostatic Potential&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===GRASP===&lt;br /&gt;
* '''Description''': A molecular visualization and analysis program. It is particularly useful for the display and manipulation of the surfaces of molecules and their electrostatic properties.&lt;br /&gt;
* '''Data Input''': PDB files, potential maps from DelPhi&lt;br /&gt;
* '''Data Output''': molecular graphics.&lt;br /&gt;
* '''Implementation Language''': Fortran&lt;br /&gt;
* '''Version, Date, Stage''': v1.3.6, stable public release.&lt;br /&gt;
* '''Authors''': Anthony Nicholls and Barry Honig.&lt;br /&gt;
* '''Platforms Tested''': SGI machines: irix 5.x and 6.x (INDYs, INDIGOs including Impact, Octane and O2) systems.&lt;br /&gt;
* '''License''':  Freely available to academia.&lt;br /&gt;
* '''Keywords''': molecular visualization&lt;br /&gt;
* '''URL''': http://trantor.bioc.columbia.edu/grasp&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''': Atomic --&amp;gt; SoftwareFunction --&amp;gt; Protein Modeling and Classification --&amp;gt; Molecular Visualization Package&lt;br /&gt;
&lt;br /&gt;
===Nest===&lt;br /&gt;
* '''Description''': Models protein structure based on a sequence-template alignment. The current server supports modeling with a single template only. Part of the JACKAL package, which can be downloaded.&lt;br /&gt;
* '''Data Input''': pir and PDB files&lt;br /&gt;
* '''Data Output''': &lt;br /&gt;
* '''Implementation Language''': C++&lt;br /&gt;
* '''Version, Date, Stage''': Stable public release.&lt;br /&gt;
* '''Authors''': Xiang, Z. and Honig, B.&lt;br /&gt;
* '''Platforms Tested''': platform independent (web based tool)&lt;br /&gt;
* '''License''': Freely available to academia.&lt;br /&gt;
* '''Keywords''': modeling, protein structure, sequence-template alignment.&lt;br /&gt;
* '''URL''': http://honiglab.cpmc.columbia.edu/cgi-bin/jackal/nest.cgi&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''': Atomic --&amp;gt; SoftwareFunction --&amp;gt; Protein Modeling and Classification --&amp;gt; Homology Modeling&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===JACKAL===&lt;br /&gt;
* '''Description''': Jackal is a collection of programs designed for the modeling and analysis of protein structures. Its core program is nest, a versatile homology modeling package. JACKAL has the following capabilities: 1) comparative modeling based on single, composite, or multiple templates; 2) side-chain prediction; 3) modeling of residue mutation, insertion, or deletion; 4) loop prediction; 5) structure refinement; 6) reconstruction of missing protein atoms; 7) reconstruction of missing protein residues; 8) prediction of hydrogen atoms; 9) fast calculation of solvent accessible surface area; 10) structure superimposition.&lt;br /&gt;
* '''Data Input''': &lt;br /&gt;
* '''Data Output''': &lt;br /&gt;
* '''Implementation Language''': C++&lt;br /&gt;
* '''Version, Date, Stage''': Version 1.5, Oct 20, 2002, stable public release.&lt;br /&gt;
* '''Authors''': Z. Xiang and B. Honig&lt;br /&gt;
* '''Platforms Tested''': SGI IRIX 6.5, Intel Linux, and Sun Solaris&lt;br /&gt;
* '''License''': Freely available to academia.&lt;br /&gt;
* '''Keywords''': Protein Structure Modeling&lt;br /&gt;
* '''URL''': http://trantor.bioc.columbia.edu/programs/jackal&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''': Atomic --&amp;gt; SoftwareFunction --&amp;gt; Protein Modeling and Classification --&amp;gt; Prediction of Side-chain Conformations&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===GRASP2===&lt;br /&gt;
* '''Description''': GRASP2 is an updated version of the GRASP program for macromolecular structure and surface visualization. It contains a large number of new features and scientific tools: an enhanced GUI; structure alignment and domain database scanning; a Gaussian surface generator and new surface coloring schemes; sequence visualization and alignment; storage of completed work in &amp;quot;project files&amp;quot;, which can hold views of the structure, defined subsets, and surfaces; and direct printing at full printer resolution.&lt;br /&gt;
* '''Data Input''': PDB files, potential maps from DelPhi, sequence alignments.&lt;br /&gt;
* '''Data Output''': molecular graphics, structural alignments.&lt;br /&gt;
* '''Implementation Language''': C++&lt;br /&gt;
* '''Version, Date, Stage''': Stable public release&lt;br /&gt;
* '''Authors''': Donald Petrey and Barry Honig.&lt;br /&gt;
* '''Platforms Tested''': Windows, Linux&lt;br /&gt;
* '''License''': Freely available to academia.&lt;br /&gt;
* '''Keywords''': molecular visualization&lt;br /&gt;
* '''URL''': http://trantor.bioc.columbia.edu/grasp2&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''': SoftwareFunction --&amp;gt; Protein Modeling and Classification --&amp;gt; Molecular Visualization Package&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===PrISM===&lt;br /&gt;
* '''Description''': PrISM is an integrated computational system where computational tools are implemented for protein sequence and structure analysis and modeling.&lt;br /&gt;
* '''Data Input''': &lt;br /&gt;
* '''Data Output''':&lt;br /&gt;
* '''Implementation Language''': Fortran&lt;br /&gt;
* '''Version, Date, Stage''': Stable public release&lt;br /&gt;
* '''Authors''': Wang, L, Yang, A. S. &amp;amp; Honig, B. &lt;br /&gt;
* '''Platforms Tested''': SGI IRIX, Intel Linux&lt;br /&gt;
* '''License''': Freely available to academia.&lt;br /&gt;
* '''Keywords''': protein analysis/modeling &lt;br /&gt;
* '''URL''': http://trantor.bioc.columbia.edu/programs/PrISM/&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''': Atomic --&amp;gt; SoftwareFunction --&amp;gt; Protein Modeling and Classification --&amp;gt; Homology Modeling&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Protein-DNA interface alignment===&lt;br /&gt;
* '''Description''': The protein-DNA alignment software allows one to align the interfacial amino acids from two protein-DNA complexes based on the geometric relationship of each amino acid to its local DNA. &lt;br /&gt;
* '''Data Input''': two PDB files that both contain protein-DNA complexes&lt;br /&gt;
* '''Data Output''': The programs will output the aligned residues and their corresponding residue-residue similarity scores, s(i,j).&lt;br /&gt;
* '''Implementation Language''': C++ and Perl&lt;br /&gt;
* '''Version, Date, Stage''': Stable public release.&lt;br /&gt;
* '''Authors''': Siggers, T.W., Silkov, A. &amp;amp; Honig, B.&lt;br /&gt;
* '''Platforms Tested''': Linux&lt;br /&gt;
* '''License''': Freely available to academia &lt;br /&gt;
* '''Keywords''': protein-DNA interface &lt;br /&gt;
* '''URL''': http://trantor.bioc.columbia.edu/programs/intfc_aln&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''': Atomic --&amp;gt; SoftwareFunction --&amp;gt; Protein Modeling and Classification --&amp;gt; Prediction of Side-chain Conformations&lt;br /&gt;
&lt;br /&gt;
===SURFace===&lt;br /&gt;
* '''Description''': SURFace is a set of programs that calculate solvent accessible surface area and curvature-corrected solvent accessible surface area.&lt;br /&gt;
* '''Data Input''': &lt;br /&gt;
* '''Data Output''': &lt;br /&gt;
* '''Implementation Language''': &lt;br /&gt;
* '''Version, Date, Stage''': Stable public release.&lt;br /&gt;
* '''Authors''': Nicholls, A., Sharp, K., Sridharan, S. and Honig, B.  &lt;br /&gt;
* '''Platforms Tested''': SGI&lt;br /&gt;
* '''License''': Freely available to academia. &lt;br /&gt;
* '''Keywords''': solvent accessible surface area  &lt;br /&gt;
* '''URL''': http://trantor.bioc.columbia.edu/surf/&lt;br /&gt;
* '''Organization''': MAGNet &lt;br /&gt;
* '''NCBC Ontology Classification''': Atomic --&amp;gt; SoftwareFunction --&amp;gt; Protein Modeling and Classification --&amp;gt; Calculation of Solvent Accessible Area&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Target Explorer===&lt;br /&gt;
* '''Description''': Automated prediction of complex regulatory elements for a specified set of transcription factors in the Drosophila melanogaster genome. Target Explorer is a comprehensive tool with a user-friendly, self-explanatory web interface that allows the user to: 1. create a customized library of TF binding site matrices based on user-defined sets of training sequences; 2. search for new clusters of binding sites for a specified set of TFs; 3. extract annotation for potential target genes.&lt;br /&gt;
* '''Data Input''': genomic sequences&lt;br /&gt;
* '''Data Output''': clusters of known binding sites &lt;br /&gt;
* '''Implementation Language''': perl, cgi&lt;br /&gt;
* '''Version, Date, Stage''': Stable public release. &lt;br /&gt;
* '''Authors''': Sosinsky A, Bonin CP, Mann RS, Honig B.   &lt;br /&gt;
* '''Platforms Tested''': platform independent (web based tool)&lt;br /&gt;
* '''License''': Freely available to academia. &lt;br /&gt;
* '''Keywords''': prediction of binding sites for transcription factors  &lt;br /&gt;
* '''URL''': http://trantor.bioc.columbia.edu/Target_Explorer/&lt;br /&gt;
* '''Organization''': MAGNet  &lt;br /&gt;
* '''NCBC Ontology Classification''': Atomic --&amp;gt; SoftwareFunction --&amp;gt; Genomic &amp;amp; Phenotypic Analysis--&amp;gt; Sequence Annotation&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===MEDUSA and Gorgon===&lt;br /&gt;
* '''Description''': MEDUSA is an algorithm for learning predictive models of transcriptional gene regulation from gene expression and promoter sequence data. By using a statistical learning approach based on boosting, MEDUSA learns cis-regulatory motifs, condition-specific regulators, and regulatory programs that predict the differential expression of target genes. The regulatory program is specified as an alternating decision tree (ADT). The Java implementation of MEDUSA allows a number of visualizations of the regulatory program and other inferred regulatory information, implemented in the accompanying Gorgon tool, including hits of significant and condition-specific motifs along the promoter sequences of target genes and regulatory network figures viewable in Cytoscape.&lt;br /&gt;
* '''Data Input''': Discretized (up/down/baseline) gene expression data in plain text format, promoter sequences in FASTA format, list of candidate transcriptional regulators and signal transducers in plain text format.&lt;br /&gt;
* '''Data Output''': Regulatory program represented as a Java serialized object file readable by Gorgon and as a human readable XML file.  Gorgon currently generates views of learned PSSMs, positional hits along promoter sequences, and views of the ADT as HTML files, and generates network figures as Cytoscape format files.&lt;br /&gt;
* '''Implementation Language''': Java (prototyped in MATLAB) &lt;br /&gt;
* '''Version, Date, Stage''': Version 2.0, July 2006, pre-release beta version; Version 1.0 (MATLAB), April 2005, stable public release  &lt;br /&gt;
* '''Authors''': David Quigley, Manuel Middendorf, Steve Lianoglou, Anshul Kundaje, Yoav Freund, Chris Wiggins, Christina Leslie   &lt;br /&gt;
* '''Platforms Tested''': Windows, Linux, Mac OS X&lt;br /&gt;
* '''License''': Open source &lt;br /&gt;
* '''Keywords''':  &lt;br /&gt;
* '''URL''': http://www.cs.columbia.edu/compbio/medusa (MATLAB), http://compbio.sytes.net:8090/medusa (Java beta version)&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''':&lt;br /&gt;
&lt;br /&gt;
===String kernel package===&lt;br /&gt;
* '''Description''': The string kernel package contains implementations of the mismatch and profile string kernels for use with support vector machine (SVM) classifiers for protein sequence classification. Both kernels compute similarity between protein sequences based on common occurrences of k-length subsequences (&amp;quot;k-mers&amp;quot;) counted with substitutions. Kernel functions for protein sequence data enable the training of SVMs for a range of prediction problems, in particular protein structural class prediction and remote homology detection. A version of the Spider MATLAB machine learning package is also bundled with the code, which allows users to train SVMs and evaluate performance on test sets with the packaged software.&lt;br /&gt;
* '''Data Input''': The mismatch kernel requires sequence data in FASTA format. The profile string kernel uses probabilistic profiles, such as those produced by PSI-BLAST, in place of the original sequences. The Spider SVM implementation requires both the kernel matrix and a label file of binary or multi-class labels for the training data; this data must be loaded into MATLAB variables before using the Spider routines.&lt;br /&gt;
* '''Data Output''': The kernel code produces a kernel matrix for the input data in tab-delimited text format. The Spider package trains SVMs and stores the learned classifier and the results of applying the classifier to test data as MATLAB objects.&lt;br /&gt;
* '''Implementation Language''': String kernel code is implemented in C. Spider is a set of object-oriented MATLAB routines. &lt;br /&gt;
* '''Version, Date, Stage''': Version 1.2, September 2004, stable public release &lt;br /&gt;
* '''Authors''': Eleazar Eskin, Rui Kuang, Eugene Ie, Ke Wang, Jason Weston, Bill Noble, Christina Leslie  &lt;br /&gt;
* '''Platforms Tested''': Windows, Linux&lt;br /&gt;
* '''License''': Open source &lt;br /&gt;
* '''Keywords''':  &lt;br /&gt;
* '''URL''': http://www.cs.columbia.edu/compbio/string-kernels&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''':&lt;br /&gt;
&lt;br /&gt;
===MatrixREDUCE===&lt;br /&gt;
* '''Description''': Regulation of gene expression by a transcription factor requires physical interaction between the factor and the DNA, which can be described by a statistical mechanical model. Based on this model, the MatrixREDUCE algorithm uses genome-wide occupancy data for a transcription factor (e.g., ChIP-chip or mRNA expression data) and associated nucleotide sequences to discover the sequence-specific binding affinity of the transcription factor. The sequence specificity of the transcription factor's DNA-binding domain is modeled using a position-specific affinity matrix (PSAM), representing the change in binding affinity (Kd) whenever a specific position within a reference binding sequence is mutated. The PSAM can be transformed into an affinity logo for visualization using the utility program AffinityLogo, and a MatrixREDUCE run can be summarized in an easy-to-navigate webpage using HTMLSummary.&lt;br /&gt;
* '''Data Input''': sequence file in FASTA format; and expression data file in tab-delimited text format.&lt;br /&gt;
* '''Data Output''': PSAMs in numeric and graphical format, parameters of the fitted model, and an HTML summary page.&lt;br /&gt;
* '''Implementation Language''': ANSI C, making use of Numerical Recipes routines.  &lt;br /&gt;
* '''Version, Date, Stage''': Version 1.0, July 10, 2006, extensively tested in lab.  &lt;br /&gt;
* '''Authors''': Barrett Foat, Xiang-Jun Lu, Harmen J. Bussemaker  &lt;br /&gt;
* '''Platforms Tested''': Linux, Cygwin (Windows), Mac OS X&lt;br /&gt;
* '''License''':  &lt;br /&gt;
* '''Keywords''': position-specific affinity matrix, binding affinity, cis-regulatory element, expression data, ChIP-chip, transcription factor &lt;br /&gt;
* '''URL''': http://www.bussemakerlab.org/software/MatrixREDUCE&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''':&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===T-profiler===&lt;br /&gt;
* '''Description''': T-profiler is a web-based tool that uses the t-test to score changes in the average activity of pre-defined groups of genes. The gene groups are defined based on Gene Ontology categorization, ChIP-chip experiments, upstream matches to a consensus transcription factor binding motif, and location on the same chromosome, respectively. If desired, an iterative procedure can be used to select a single, optimal representative from sets of overlapping gene groups. A jack-knife procedure is used to make calculations more robust against outliers. T-profiler makes it possible to interpret microarray data in a way that is both intuitive and statistically rigorous, without the need to combine experiments or choose parameters.   &lt;br /&gt;
* '''Data Input''': Currently, gene expression data from Saccharomyces cerevisiae and Candida albicans are supported. &lt;br /&gt;
* '''Data Output''':&lt;br /&gt;
* '''Implementation Language''': T-profiler is written in PHP; data is managed by a MySQL database server&lt;br /&gt;
* '''Version, Date, Stage''':   &lt;br /&gt;
* '''Authors''': André Boorsma, Barrett C. Foat, Daniel Vis, Frans Klis, Harmen J. Bussemaker  &lt;br /&gt;
* '''Platforms Tested''': Web-based application&lt;br /&gt;
* '''License''':  &lt;br /&gt;
* '''Keywords''': gene expression, transcriptome,  ChIP-chip,  Gene Ontology &lt;br /&gt;
* '''URL''': http://www.t-profiler.org&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''':&lt;br /&gt;
&lt;br /&gt;
===TranscriptionDetector===&lt;br /&gt;
* '''Description''': A tool for finding probes measuring significantly expressed loci in a genomic array experiment. Given expression data from a tiling array experiment, TranscriptionDetector estimates the likelihood that a probe is detecting transcription from the locus in which it resides. Probabilities are assigned using a background signal intensity distribution from a set of negative control probes. This tool is useful for the functional annotation of genomes, as it allows for the discovery of novel transcriptional units independently of any genomic annotation.&lt;br /&gt;
* '''Data Input''': Expression data (GEO or other platforms) and designation of which probes represent negative controls and which are data probes. &lt;br /&gt;
* '''Data Output''': A text file with a list of probes  corresponding to significantly expressed loci. &lt;br /&gt;
* '''Implementation Language''': ANSI C, making use of GSL.  &lt;br /&gt;
* '''Version, Date, Stage''':   &lt;br /&gt;
* '''Authors''': Xiang-Jun Lu, Gabor Halasz, Marinus F. van Batenburg  &lt;br /&gt;
* '''Platforms Tested''': Linux, Cygwin (Windows), Mac OS X&lt;br /&gt;
* '''License''':  &lt;br /&gt;
* '''Keywords''': tiling arrays, expression, transcriptome &lt;br /&gt;
* '''URL''': http://www.bussemakerlab.org/software/TranscriptionDetector/&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''':&lt;br /&gt;
&lt;br /&gt;
===PhenoGO===&lt;br /&gt;
* '''Description''': PhenoGO adds phenotypic contextual information to existing associations between gene products and Gene Ontology (GO) terms as specified in GO Annotations (GOA). PhenoGO utilizes an existing Natural Language Processing (NLP) system, called BioMedLEE, and an existing knowledge-based phenotype organizer system (PhenOS), in conjunction with MeSH indexing and established biomedical ontologies. The system also maps the context to identifiers in different biomedical ontologies, including the UMLS, Cell Ontology, Mouse Anatomy, NCBI taxonomy, GO, and the Mammalian Phenotype Ontology. PhenoGO was evaluated for coding of anatomical and cellular information and for assigning the coded phenotypes to the correct GOA; it achieved a precision of 91% and a recall of 92%, demonstrating that the PhenoGO NLP system can accurately link a large number of anatomical and cellular ontology terms to GO annotations. The PhenoGO database may be accessed at www.phenogo.org.&lt;br /&gt;
* '''Data Input''': Gene Ontology Annotations Files and Medline Abstracts  &lt;br /&gt;
* '''Data Output''': XML file and www.phenogo.org Web Portal&lt;br /&gt;
* '''Implementation Language''': A variety of modules: the web portal is in Java and MySQL, the computational terminology component (PhenOS) is written in Perl scripts that query tables in IBM DB2, and the natural language processing component is written in PROLOG.&lt;br /&gt;
* '''Version, Date, Stage''': Version 2, Feb 2006  &lt;br /&gt;
* '''Authors''': Yves Lussier and Carol Friedman are the principal investigators. The programmers are Jianrong Li, Lee Sam, and Tara Borlawsky   &lt;br /&gt;
* '''Platforms Tested''': n/a&lt;br /&gt;
* '''License''':  n/a&lt;br /&gt;
* '''Keywords''': Phenotypic integration, computational phenotypes &lt;br /&gt;
* '''URL''': www.phenogo.org&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''':&lt;/div&gt;</summary>
		<author><name>Aris</name></author>
		
	</entry>
	<entry>
		<id>https://www.na-mic.org/w/index.php?title=SDIWG:NCBC_Software_Classification_MAGNet_Examples&amp;diff=10205</id>
		<title>SDIWG:NCBC Software Classification MAGNet Examples</title>
		<link rel="alternate" type="text/html" href="https://www.na-mic.org/w/index.php?title=SDIWG:NCBC_Software_Classification_MAGNet_Examples&amp;diff=10205"/>
		<updated>2007-05-11T03:38:21Z</updated>

		<summary type="html">&lt;p&gt;Aris: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;===DelPhi===&lt;br /&gt;
* '''Description''': DelPhi provides numerical solutions to the Poisson-Boltzmann equation (both linear and nonlinear form) for molecules of arbitrary shape and charge distribution. The current version is fast (the best relaxation parameter is estimated at run time), accurate (calculation of the electrostatic free energy is less dependent on the resolution of the lattice) and can handle extremely high lattice dimensions. It also includes flexible features for assigning different dielectric constants to different regions of space and treating systems containing mixed salt solutions.&lt;br /&gt;
* '''Data Input''': DelPhi takes as input a coordinate file for a molecule, or equivalent data for geometrical objects and/or charge distributions&lt;br /&gt;
* '''Data Output''': electrostatic potential in and around the system&lt;br /&gt;
* '''Implementation Language''': Fortran and C&lt;br /&gt;
* '''Version, Date, Stage''': Stable public release&lt;br /&gt;
* '''Authors''':  E.Alexov,  R.Fine, M.K.Gilson, A.Nicholls, W.Rocchia, K.Sharp, and B. Honig.&lt;br /&gt;
* '''Platforms Tested''': SGI IRIX, Linux, PC (requires Fortran and C compilers), IBM AIX, and Mac.&lt;br /&gt;
* '''License''': Freely available to academia; pay model for commercial users. &lt;br /&gt;
* '''Keywords''': Finite Difference Poisson-Boltzmann Solver&lt;br /&gt;
* '''URL''': http://trantor.bioc.columbia.edu/delphi&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''':&lt;br /&gt;
&lt;br /&gt;
===GRASP===&lt;br /&gt;
* '''Description''': A molecular visualization and analysis program. It is particularly useful for the display and manipulation of the surfaces of molecules and their electrostatic properties.&lt;br /&gt;
* '''Data Input''': PDB files, potential maps from DelPhi&lt;br /&gt;
* '''Data Output''': molecular graphics.&lt;br /&gt;
* '''Implementation Language''': Fortran&lt;br /&gt;
* '''Version, Date, Stage''': v1.3.6, stable public release.&lt;br /&gt;
* '''Authors''': Anthony Nicholls and Barry Honig.&lt;br /&gt;
* '''Platforms Tested''': SGI machines running IRIX 5.x and 6.x (INDY, INDIGO including Impact, Octane, and O2 systems).&lt;br /&gt;
* '''License''':  Freely available to academia.&lt;br /&gt;
* '''Keywords''': molecular visualization&lt;br /&gt;
* '''URL''': http://trantor.bioc.columbia.edu/grasp&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''':&lt;br /&gt;
&lt;br /&gt;
===Nest===&lt;br /&gt;
* '''Description''': Models protein structure based on a sequence-template alignment. The current server supports modeling with a single template only. Nest is part of the downloadable Jackal package.&lt;br /&gt;
* '''Data Input''': pir and PDB files&lt;br /&gt;
* '''Data Output''': &lt;br /&gt;
* '''Implementation Language''': C++&lt;br /&gt;
* '''Version, Date, Stage''': Stable public release.&lt;br /&gt;
* '''Authors''': Xiang, Z. and Honig, B.&lt;br /&gt;
* '''Platforms Tested''': platform independent (web based tool)&lt;br /&gt;
* '''License''': Freely available to academia.&lt;br /&gt;
* '''Keywords''': modeling, protein structure, sequence-template alignment.&lt;br /&gt;
* '''URL''': http://honiglab.cpmc.columbia.edu/cgi-bin/jackal/nest.cgi&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''':&lt;br /&gt;
&lt;br /&gt;
===JACKAL===&lt;br /&gt;
* '''Description''': Jackal is a collection of programs designed for the modeling and analysis of protein structures. Its core program is nest, a versatile homology modeling package. JACKAL has the following capabilities: 1) comparative modeling based on single, composite, or multiple templates; 2) side-chain prediction; 3) modeling of residue mutation, insertion, or deletion; 4) loop prediction; 5) structure refinement; 6) reconstruction of missing protein atoms; 7) reconstruction of missing protein residues; 8) prediction of hydrogen atoms; 9) fast calculation of solvent accessible surface area; 10) structure superimposition.&lt;br /&gt;
* '''Data Input''': &lt;br /&gt;
* '''Data Output''': &lt;br /&gt;
* '''Implementation Language''': C++&lt;br /&gt;
* '''Version, Date, Stage''': Version 1.5, Oct 20, 2002, stable public release.&lt;br /&gt;
* '''Authors''': Z. Xiang and B. Honig&lt;br /&gt;
* '''Platforms Tested''': SGI IRIX 6.5, Intel Linux, and Sun Solaris&lt;br /&gt;
* '''License''': Freely available to academia.&lt;br /&gt;
* '''Keywords''': Protein Structure Modeling&lt;br /&gt;
* '''URL''': http://trantor.bioc.columbia.edu/programs/jackal&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''':&lt;br /&gt;
&lt;br /&gt;
===GRASP2===&lt;br /&gt;
* '''Description''': GRASP2 is an updated version of the GRASP program for macromolecular structure and surface visualization. It contains a large number of new features and scientific tools: an enhanced GUI; structure alignment and domain database scanning; a Gaussian surface generator and new surface coloring schemes; sequence visualization and alignment; storage of completed work in &amp;quot;project files&amp;quot;, which can hold views of the structure, defined subsets, and surfaces; and direct printing at full printer resolution.&lt;br /&gt;
* '''Data Input''': PDB files, potential maps from DelPhi, sequence alignments.&lt;br /&gt;
* '''Data Output''': molecular graphics, structural alignments.&lt;br /&gt;
* '''Implementation Language''': C++&lt;br /&gt;
* '''Version, Date, Stage''': Stable public release&lt;br /&gt;
* '''Authors''': Donald Petrey and Barry Honig.&lt;br /&gt;
* '''Platforms Tested''': Windows, Linux&lt;br /&gt;
* '''License''': Freely available to academia.&lt;br /&gt;
* '''Keywords''': molecular visualization&lt;br /&gt;
* '''URL''': http://trantor.bioc.columbia.edu/grasp2&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''':&lt;br /&gt;
&lt;br /&gt;
===PrISM===&lt;br /&gt;
* '''Description''': PrISM is an integrated computational system where computational tools are implemented for protein sequence and structure analysis and modeling.&lt;br /&gt;
* '''Data Input''': &lt;br /&gt;
* '''Data Output''':&lt;br /&gt;
* '''Implementation Language''': Fortran&lt;br /&gt;
* '''Version, Date, Stage''': Stable public release&lt;br /&gt;
* '''Authors''': Wang, L, Yang, A. S. &amp;amp; Honig, B. &lt;br /&gt;
* '''Platforms Tested''': SGI IRIX, Intel Linux&lt;br /&gt;
* '''License''': Freely available to academia.&lt;br /&gt;
* '''Keywords''': protein analysis/modeling &lt;br /&gt;
* '''URL''': http://trantor.bioc.columbia.edu/programs/PrISM/&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''':&lt;br /&gt;
&lt;br /&gt;
===Protein-DNA interface alignment===&lt;br /&gt;
* '''Description''': The protein-DNA alignment software allows one to align the interfacial amino acids from two protein-DNA complexes based on the geometric relationship of each amino acid to its local DNA. &lt;br /&gt;
* '''Data Input''': two PDB files that both contain protein-DNA complexes&lt;br /&gt;
* '''Data Output''': The programs will output the aligned residues and their corresponding residue-residue similarity scores, s(i,j).&lt;br /&gt;
* '''Implementation Language''': C++ and Perl&lt;br /&gt;
* '''Version, Date, Stage''': Stable public release.&lt;br /&gt;
* '''Authors''': Siggers, T.W., Silkov, A. &amp;amp; Honig, B.&lt;br /&gt;
* '''Platforms Tested''': Linux&lt;br /&gt;
* '''License''': Freely available to academia &lt;br /&gt;
* '''Keywords''': protein-DNA interface &lt;br /&gt;
* '''URL''': http://trantor.bioc.columbia.edu/programs/intfc_aln&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''':&lt;br /&gt;
&lt;br /&gt;
===SURFace===&lt;br /&gt;
* '''Description''': SURFace is a set of programs that calculate solvent accessible surface area and curvature-corrected solvent accessible surface area.&lt;br /&gt;
* '''Data Input''': &lt;br /&gt;
* '''Data Output''': &lt;br /&gt;
* '''Implementation Language''': &lt;br /&gt;
* '''Version, Date, Stage''': Stable public release.&lt;br /&gt;
* '''Authors''': Nicholls, A., Sharp, K., Sridharan, S. and Honig, B.  &lt;br /&gt;
* '''Platforms Tested''': SGI&lt;br /&gt;
* '''License''': Freely available to academia. &lt;br /&gt;
* '''Keywords''': solvent accessible surface area  &lt;br /&gt;
* '''URL''': http://trantor.bioc.columbia.edu/surf/&lt;br /&gt;
* '''Organization''': MAGNet &lt;br /&gt;
* '''NCBC Ontology Classification''':&lt;br /&gt;
&lt;br /&gt;
===Target Explorer===&lt;br /&gt;
* '''Description''': Automated prediction of complex regulatory elements for a specified set of transcription factors in the Drosophila melanogaster genome. Target Explorer is a comprehensive tool with a user-friendly, self-explanatory web interface that allows the user to: 1. create a customized library of TF binding site matrices based on user-defined sets of training sequences; 2. search for new clusters of binding sites for a specified set of TFs; 3. extract annotation for potential target genes.&lt;br /&gt;
* '''Data Input''': genomic sequences&lt;br /&gt;
* '''Data Output''': clusters of known binding sites &lt;br /&gt;
* '''Implementation Language''': perl, cgi&lt;br /&gt;
* '''Version, Date, Stage''': Stable public release. &lt;br /&gt;
* '''Authors''': Sosinsky A, Bonin CP, Mann RS, Honig B.   &lt;br /&gt;
* '''Platforms Tested''': platform independent (web based tool)&lt;br /&gt;
* '''License''': Freely available to academia. &lt;br /&gt;
* '''Keywords''': prediction of binding sites for transcription factors  &lt;br /&gt;
* '''URL''': http://trantor.bioc.columbia.edu/Target_Explorer/&lt;br /&gt;
* '''Organization''': MAGNet  &lt;br /&gt;
* '''NCBC Ontology Classification''':&lt;br /&gt;
&lt;br /&gt;
===MEDUSA and Gorgon===&lt;br /&gt;
* '''Description''': MEDUSA is an algorithm for learning predictive models of transcriptional gene regulation from gene expression and promoter sequence data. By using a statistical learning approach based on boosting, MEDUSA learns cis-regulatory motifs, condition-specific regulators, and regulatory programs that predict the differential expression of target genes. The regulatory program is specified as an alternating decision tree (ADT). The Java implementation of MEDUSA allows a number of visualizations of the regulatory program and other inferred regulatory information, implemented in the accompanying Gorgon tool, including hits of significant and condition-specific motifs along the promoter sequences of target genes and regulatory network figures viewable in Cytoscape.&lt;br /&gt;
* '''Data Input''': Discretized (up/down/baseline) gene expression data in plain text format, promoter sequences in FASTA format, list of candidate transcriptional regulators and signal transducers in plain text format.&lt;br /&gt;
* '''Data Output''': Regulatory program represented as a Java serialized object file readable by Gorgon and as a human readable XML file.  Gorgon currently generates views of learned PSSMs, positional hits along promoter sequences, and views of the ADT as HTML files, and generates network figures as Cytoscape format files.&lt;br /&gt;
* '''Implementation Language''': Java (prototyped in MATLAB) &lt;br /&gt;
* '''Version, Date, Stage''': Version 2.0, July 2006, pre-release beta version; Version 1.0 (MATLAB), April 2005, stable public release  &lt;br /&gt;
* '''Authors''': David Quigley, Manuel Middendorf, Steve Lianoglou, Anshul Kundaje, Yoav Freund, Chris Wiggins, Christina Leslie   &lt;br /&gt;
* '''Platforms Tested''': Windows, Linux, Mac OS X&lt;br /&gt;
* '''License''': Open source &lt;br /&gt;
* '''Keywords''':  &lt;br /&gt;
* '''URL''': http://www.cs.columbia.edu/compbio/medusa (MATLAB), http://compbio.sytes.net:8090/medusa (Java beta version)&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''':&lt;br /&gt;
&lt;br /&gt;
===String kernel package===&lt;br /&gt;
* '''Description''': The string kernel package contains implementations of the mismatch and profile string kernels for use with support vector machine (SVM) classifiers for protein sequence classification. Both kernels compute similarity between protein sequences based on common occurrences of k-length subsequences (&amp;quot;k-mers&amp;quot;) counted with substitutions. Kernel functions for protein sequence data enable the training of SVMs for a range of prediction problems, in particular protein structural class prediction and remote homology detection. A version of the Spider MATLAB machine learning package is also bundled with the code, which allows users to train SVMs and evaluate performance on test sets with the packaged software.&lt;br /&gt;
* '''Data Input''': The mismatch kernel requires sequence data in FASTA format. The profile string kernel uses probabilistic profiles, such as those produced by PSI-BLAST, in place of the original sequences.  The Spider SVM implementation requires both the kernel matrix and a label file of binary or multi-class labels for the training data; this data must be loaded into MATLAB variables before using Spider routing.&lt;br /&gt;
* '''Data Output''':The kernel code produces a kernel matrix for the input data in tab-delimited text format.  The Spider package trains SVMs and stores the learns classifier and results from applying the classifier on test data as MATLAB objects.&lt;br /&gt;
* '''Implementation Language''': String kernel code is implemented in C. Spider is a set of object-oriented MATLAB routines. &lt;br /&gt;
* '''Version, Date, Stage''': Version 1.2, September 2004, stable public release &lt;br /&gt;
* '''Authors''': Eleazar Eskin, Rui Kuang, Eugene Ie, Ke Wang, Jason Weston, Bill Noble, Christina Leslie  &lt;br /&gt;
* '''Platforms Tested''': Windows, Linux&lt;br /&gt;
* '''License''': Open source &lt;br /&gt;
* '''Keywords''':  &lt;br /&gt;
* '''URL''': http://www.cs.columbia.edu/compbio/string-kernels&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''':&lt;br /&gt;
&lt;br /&gt;
===MatrixREDUCE===&lt;br /&gt;
* '''Description''': Regulation of gene expression by a transcription factor requires physical interaction between the factor and the DNA, which can be described by a statistical mechanical model. Based on this model, the MatrixREDUCE algorithm uses genome-wide occupancy data for a transcription factor (e.g., ChIP-chip or mRNA expression data) and associated nucleotide sequences to discover the sequence-specific binding affinity of the transcription factor. The sequence specificity of the transcription factor's DNA-binding domain is modeled using a position-specific affinity matrix (PSAM), representing the change in the binding affinity (Kd) whenever a specific position within a reference binding sequence is mutated. The PSAM can be transformed into an affinity logo for visualization using the utility program AffinityLogo, and a MatrixREDUCE run can be summarized in an easy-to-navigate webpage using HTMLSummary. &lt;br /&gt;
* '''Data Input''': A sequence file in FASTA format and an expression data file in tab-delimited text format.&lt;br /&gt;
* '''Data Output''': PSAMs in numeric and graphical format, parameters of the fitted model, and an HTML summary page.&lt;br /&gt;
* '''Implementation Language''': ANSI C, making use of Numerical Recipes routines.  &lt;br /&gt;
* '''Version, Date, Stage''': Version 1.0, July 10, 2006, extensively tested in lab.  &lt;br /&gt;
* '''Authors''': Barrett Foat, Xiang-Jun Lu, Harmen J. Bussemaker  &lt;br /&gt;
* '''Platforms Tested''': Linux, Cygwin (Windows), Mac OS X&lt;br /&gt;
* '''License''':  &lt;br /&gt;
* '''Keywords''': position-specific affinity matrix, binding affinity, cis-regulatory element, expression data, ChIP-chip, transcription factor &lt;br /&gt;
* '''URL''': http://www.bussemakerlab.org/software/MatrixREDUCE&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''':&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===T-profiler===&lt;br /&gt;
* '''Description''': T-profiler is a web-based tool that uses the t-test to score changes in the average activity of pre-defined groups of genes. The gene groups are defined based on Gene Ontology categorization, ChIP-chip experiments, upstream matches to a consensus transcription factor binding motif, or location on the same chromosome. If desired, an iterative procedure can be used to select a single, optimal representative from sets of overlapping gene groups. A jack-knife procedure is used to make the calculations more robust against outliers. T-profiler makes it possible to interpret microarray data in a way that is both intuitive and statistically rigorous, without the need to combine experiments or choose parameters.   &lt;br /&gt;
* '''Data Input''': Currently, gene expression data from Saccharomyces cerevisiae and Candida albicans are supported. &lt;br /&gt;
* '''Data Output''':&lt;br /&gt;
* '''Implementation Language''': T-profiler is written in PHP; data is managed by a MySQL database server.  &lt;br /&gt;
* '''Version, Date, Stage''':   &lt;br /&gt;
* '''Authors''': André Boorsma, Barrett C. Foat, Daniel Vis, Frans Klis, Harmen J. Bussemaker  &lt;br /&gt;
* '''Platforms Tested''': Web-based application&lt;br /&gt;
* '''License''':  &lt;br /&gt;
* '''Keywords''': gene expression, transcriptome,  ChIP-chip,  Gene Ontology &lt;br /&gt;
* '''URL''': http://www.t-profiler.org&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''':&lt;br /&gt;
&lt;br /&gt;
===TranscriptionDetector===&lt;br /&gt;
* '''Description''': A tool for finding probes that measure significantly expressed loci in a genomic array experiment. Given expression data from a tiling array experiment, TranscriptionDetector estimates the likelihood that a probe is detecting transcription from the locus in which it resides. Probabilities are assigned using a background signal intensity distribution derived from a set of negative control probes. This tool is useful for the functional annotation of genomes, as it allows for the discovery of novel transcriptional units independently of any genomic annotation.    &lt;br /&gt;
* '''Data Input''': Expression data (GEO or other platforms) and designation of which probes represent negative controls and which are data probes. &lt;br /&gt;
* '''Data Output''': A text file with a list of probes corresponding to significantly expressed loci. &lt;br /&gt;
* '''Implementation Language''': ANSI C, making use of GSL.  &lt;br /&gt;
* '''Version, Date, Stage''':   &lt;br /&gt;
* '''Authors''': Xiang-Jun Lu, Gabor Halasz, Marinus F. van Batenburg  &lt;br /&gt;
* '''Platforms Tested''': Linux, Cygwin (Windows), Mac OS X&lt;br /&gt;
* '''License''':  &lt;br /&gt;
* '''Keywords''': tiling arrays, expression, transcriptome &lt;br /&gt;
* '''URL''': http://www.bussemakerlab.org/software/TranscriptionDetector/&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''':&lt;br /&gt;
&lt;br /&gt;
===PhenoGO===&lt;br /&gt;
* '''Description''': PhenoGO adds phenotypic contextual information to existing associations between gene products and Gene Ontology (GO) terms as specified in GO Annotations (GOA). PhenoGO utilizes an existing Natural Language Processing (NLP) system, called BioMedLEE, and an existing knowledge-based phenotype organizer system (PhenOS), in conjunction with MeSH indexing and established biomedical ontologies. The system also encodes the context using identifiers from different biomedical ontologies, including the UMLS, Cell Ontology, Mouse Anatomy, NCBI taxonomy, GO, and Mammalian Phenotype Ontology. In addition, PhenoGO was evaluated for coding of anatomical and cellular information and for assigning the coded phenotypes to the correct GOA; the results show that PhenoGO has a precision of 91% and a recall of 92%, demonstrating that the PhenoGO NLP system can accurately map a large number of anatomical and cellular terms to GO annotations. The PhenoGO database may be accessed at www.phenogo.org.    &lt;br /&gt;
* '''Data Input''': Gene Ontology Annotations Files and Medline Abstracts  &lt;br /&gt;
* '''Data Output''': XML file and www.phenogo.org Web Portal&lt;br /&gt;
* '''Implementation Language''': A variety of modules: the web portal is written in Java with MySQL; the computational terminology component (PhenOS) is written in Perl scripts that query tables in IBM DB2; the natural language processing component is written in PROLOG.  &lt;br /&gt;
* '''Version, Date, Stage''': Version 2, Feb 2006  &lt;br /&gt;
* '''Authors''': Yves Lussier and Carol Friedman are the principal investigators. The programmers are Jianrong Li, Lee Sam, and Tara Borlawsky   &lt;br /&gt;
* '''Platforms Tested''': n/a&lt;br /&gt;
* '''License''':  n/a&lt;br /&gt;
* '''Keywords''': Phenotypic integration, computational phenotypes &lt;br /&gt;
* '''URL''': www.phenogo.org&lt;br /&gt;
* '''Organization''': MAGNet&lt;br /&gt;
* '''NCBC Ontology Classification''':&lt;/div&gt;</summary>
		<author><name>Aris</name></author>
		
	</entry>
	<entry>
		<id>https://www.na-mic.org/w/index.php?title=SDIWG:NCBC_Software_Classification_MAGNet_Examples&amp;diff=10165</id>
		<title>SDIWG:NCBC Software Classification MAGNet Examples</title>
		<link rel="alternate" type="text/html" href="https://www.na-mic.org/w/index.php?title=SDIWG:NCBC_Software_Classification_MAGNet_Examples&amp;diff=10165"/>
		<updated>2007-05-10T14:47:44Z</updated>

		<summary type="html">&lt;p&gt;Aris: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;===DelPhi===&lt;br /&gt;
* Description: DelPhi provides numerical solutions to the Poisson-Boltzmann equation (both linear and nonlinear form) for molecules of arbitrary shape and charge distribution. The current version is fast (the best relaxation parameter is estimated at run time), accurate (calculation of the electrostatic free energy is less dependent on the resolution of the lattice) and can handle extremely high lattice dimensions. It also includes flexible features for assigning different dielectric constants to different regions of space and treating systems containing mixed salt solutions.&lt;br /&gt;
* Input (parameters &amp;amp; Data Types): DelPhi takes as input a coordinate file of a molecule, or equivalent data for geometrical objects and/or charge distributions&lt;br /&gt;
* Output Data (parameters &amp;amp; Data Types): electrostatic potential in and around the system&lt;br /&gt;
* Implementation Language: Fortran and C&lt;br /&gt;
* Version, Date, Stage: Stable public release&lt;br /&gt;
* Authors:  E.Alexov,  R.Fine, M.K.Gilson, A.Nicholls, W.Rocchia, K.Sharp, and B. Honig.&lt;br /&gt;
* Platforms Tested: Unix (SGI IRIX), Linux, PC (requires Fortran and C compilers), IBM AIX, and Mac.&lt;br /&gt;
* License: Freely available to academia; pay model for commercial users. &lt;br /&gt;
* Keywords: Finite Difference Poisson-Boltzmann Solver&lt;br /&gt;
* URL: http://trantor.bioc.columbia.edu/delphi&lt;/div&gt;</summary>
		<author><name>Aris</name></author>
		
	</entry>
	<entry>
		<id>https://www.na-mic.org/w/index.php?title=SDIWG:Meeting_Minutes_20070216&amp;diff=7918</id>
		<title>SDIWG:Meeting Minutes 20070216</title>
		<link rel="alternate" type="text/html" href="https://www.na-mic.org/w/index.php?title=SDIWG:Meeting_Minutes_20070216&amp;diff=7918"/>
		<updated>2007-02-19T20:48:19Z</updated>

		<summary type="html">&lt;p&gt;Aris: /* Action Items */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;'''Agenda: NCBC Joint Working Group Meeting'''&lt;br /&gt;
&lt;br /&gt;
[[SDIWG:Software_and_Data_Integration_Working_Group|Top page of SDIWG web site]]&lt;br /&gt;
&lt;br /&gt;
===Notes===&lt;br /&gt;
''Friday, February 16, 2007: 2:30 -- 3:30 PM EST''&amp;lt;br&amp;gt;&lt;br /&gt;
''Please contact Peter Lyster for information lysterp@mail.nih.gov''&amp;lt;br&amp;gt;&lt;br /&gt;
''[http://na-mic.org/Wiki/index.php/SDIWG:Software_Engineering_Body_of_Knowledge_Across_NCBC_Biocomputing_Centers SDIWG Meetings Page]''&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== TCon Agenda ===&lt;br /&gt;
&lt;br /&gt;
* Roll Call/Note-taker (Floratos) (5 min)&lt;br /&gt;
&lt;br /&gt;
* Minutes from most recent SDIWG tcon [[SDIWG:Meeting_Minutes_20061117|Nov. 17, 2006]] (5 min)&lt;br /&gt;
&lt;br /&gt;
* Progress in three working groups&lt;br /&gt;
** '''[[SDIWG:_NCBC_Scientific_Ontologies|Scientific Ontologies]]''' (Zak Kohane, Mark Musen, and Suzi Lewis, 15 min)&lt;br /&gt;
*** TBD&lt;br /&gt;
** '''[[SDIWG:_NCBC_Resource_Yellow_Pages_and_Software_Ontologies|Yellow Pages and Resourceome]]''' (Ivo Dinov, Daniel Rubin, Bill Lorensen, Jonathan Dugan, 15 min)&lt;br /&gt;
*** Past Progress&lt;br /&gt;
**** [http://www.loni.ucla.edu/twiki/bin/view/CCB/NCBC_ToolIntegration_XMLtology_2006_OtherArchives Other Attempts to provide archives of SW and tool resources]&lt;br /&gt;
**** Proposed set of '''[http://www.loni.ucla.edu/twiki/bin/view/CCB/NCBC_ToolIntegration_XMLtology_Meeting_071806_RequiredFields Required/Minimal]''' Resource Description Fields&lt;br /&gt;
**** [http://www.loni.ucla.edu/twiki/bin/view/CCB/NCBC_ToolIntegration_XMLtology_Meeting_071806_ToolSpecificFields '''Optional''' and '''Resource-specific''' description fields]&lt;br /&gt;
**** [http://bioontology.org/projects/ontologies/SoftwareOntology/ Current Ontological Description of the NCBC Tool Yellow-Pages]&lt;br /&gt;
**** [http://www.loni.ucla.edu/twiki/bin/view/CCB/NCBC_ToolIntegration_XMLtology_Meeting_103106_ResourceDef Types of CompBio Resources]?&lt;br /&gt;
**** [[SDIWG:NCBC_Software_Classification | Center described 3-5 resources]]?&lt;br /&gt;
*** Status of NCBC '''''[http://www.loni.ucla.edu/twiki/bin/view/CCB/NCBC_ToolIntegration_iTools iTools]'''''&lt;br /&gt;
**** V1 (v 0.1, 10/12/06)&lt;br /&gt;
**** V4 (v 0.4, 02/13/07)&lt;br /&gt;
*** Suggested Action Items for the YPR WG:&lt;br /&gt;
**** Each Center should ensure that they have [[SDIWG:NCBC_Software_Classification|6-12 SW Resources described according to these templates]]&lt;br /&gt;
**** Test, comment and revise iTools. How does it compare with other similar efforts?&lt;br /&gt;
** '''[[SDIWG:_NCBC_Systems_Biology|Applications of Systems Biology, Modeling, and Analysis]]''' (Aris Floratos, Brian Athey, Russ Altman, 15 min)&lt;br /&gt;
*** TBD&lt;br /&gt;
&lt;br /&gt;
=== Minutes ===&lt;br /&gt;
&lt;br /&gt;
Note taker: (Aris Floratos)&lt;br /&gt;
&lt;br /&gt;
* Review of 11/17/06 minutes&lt;br /&gt;
** '''[[SDIWG:_NCBC_Resource_Yellow_Pages_and_Software_Ontologies|Yellow Pages and Resourceome]]'''&lt;br /&gt;
** '''[[SDIWG:_NCBC_Scientific_Ontologies|Scientific Ontologies]]'''&lt;br /&gt;
** '''[[SDIWG:_NCBC_Systems_Biology|Applications of Systems Biology, Modeling, and Analysis]]'''&lt;br /&gt;
&lt;br /&gt;
Zak Kohane outlined the process via which the Scientific Ontologies SIG reviewed a select set of ontologies and vocabularies. A summary list, along with a broad assessment of overall &amp;quot;quality&amp;quot;, has been compiled by Suzie Lewis and is available at www.berkeleybop.org/sowg/table.cgi. In addition to being an inventory of the vocabularies and ontologies that the NCBCs use internally, the SIG hopes that this list will be a useful starting point for non-experts seeking to identify ontologies/terminologies appropriate for use in their work. Zak recognized that there is a gap in the set of vocabularies reviewed, namely in the area of imaging and neuroscience, and he solicited help in identifying the main players in these domains. Daniel Rubin offered to help in this direction.&lt;br /&gt;
&lt;br /&gt;
In response to questions/suggestions from Gil Omenn and Brian Athey, it was agreed that the Scientific Ontologies SIG will also produce representative use cases, outlining how ontologies/terminologies are used in practice.&lt;br /&gt;
&lt;br /&gt;
Ivo Dinov reviewed the process that led to the definition of the minimum resource descriptors and Dan Rubin's software classification ontology and proceeded to discuss the iTools framework (www.loni.ucla.edu/twiki/bin/view/CCB/NCBC_ToolIntegration_iTools) and demonstrate the Java prototype front end. The demo outlined several supported search modes as well as the ability of the tool to &amp;quot;scrape&amp;quot; updates to tool description information from Web pages that follow the format of the Resource descriptions on the SDIWG wiki.&lt;br /&gt;
&lt;br /&gt;
Suzie Lewis observed that the scraping approach used by iTools could also be used to process the listing of terminologies produced by the Ontologies SIG, allowing them to be presented/searched via the iTools application.&lt;br /&gt;
&lt;br /&gt;
A discussion followed about the proper manner in which to release and start using the iTools framework. Suggestions were made that NIH should centrally host the database and portal, so as to offer the appropriate branding and a central entry point for the general public to access the NCBC resources. One idea was to develop an NCBC-specific web site (e.g., ncbc.org or nih.ncbc.org) for that purpose. Peter Lyster said that there have been internal discussions at NIH about this topic and that some ideas (such as having a central entry point to the NCBC resource listing from NCBI or some other institute) have been vetted.&lt;br /&gt;
&lt;br /&gt;
Brian Athey and Aris Floratos commented on the upcoming March telecon of the Systems Biology SIG and the idea of developing a suite of software tools, comprising resources from multiple NCBCs, to be leveraged in the context of various DBPs. The question came up of how to achieve a good balance between tool development and tool adoption, and how to advertise and promote the various tools, particularly since the adoption of these tools by the community will, to a large extent, determine the impact and usefulness of the NCBC program as a whole and its potential for renewed funding. Peter Lyster commented that there is no concerted effort at this time within NIH to promote the NCBC tools; rather, there are a number of ongoing initiatives, such as caBIG and BIRN, proceeding in parallel. One course of action might be to start by having iTools hosted at CCB and then move gradually toward bringing the repository within NIH as an updated NCBC page on the BISTI site (or elsewhere) is created. Peter also said that he will ask John Whitmarsh to get back to the PIs regarding the current plans for the program evaluation.&lt;br /&gt;
&lt;br /&gt;
=== Attendees ===&lt;br /&gt;
&lt;br /&gt;
* Sherman, Lyster, Dinov (CCB/UCLA), Lorensen, Michael Montegut (NCBO project mgr), Weymouth, Floratos, Jennie Larkin (NHLBI), Zak &amp;amp; Susanne, Jenkins (NLM), Vivien Bonai, Athey, Kennedy, Mike Mendis, Karen Skinner/NIDA, Musen, Chris Mungall (NCBO), German Cavelier (NIMH), Suzanna Lewis&lt;br /&gt;
* Note-taker: [Suggested order of note takers for future meetings: Floratos (this one); Rubin; Jags; Dinov; Chueh; Sherman; Schroeder]&lt;br /&gt;
&lt;br /&gt;
=== Action Items ===&lt;br /&gt;
* Dan Rubin to point out for the Scientific Ontologies SIG the key imaging/neuroscience ontologies.&lt;br /&gt;
* Ontologies SIG to produce use cases describing practical uses of ontologies/vocabularies.&lt;br /&gt;
* Peter Lyster (or/and others?) to bring up as an agenda item to the NCBC project team meeting the issue of hosting the Yellow Pages database centrally within NIH.&lt;br /&gt;
* Peter Lyster to ask John Whitmarsh to communicate to the PIs the status of plans for the NCBC evaluation program.&lt;/div&gt;</summary>
		<author><name>Aris</name></author>
		
	</entry>
	<entry>
		<id>https://www.na-mic.org/w/index.php?title=SDIWG:Meeting_Minutes_20070216&amp;diff=7917</id>
		<title>SDIWG:Meeting Minutes 20070216</title>
		<link rel="alternate" type="text/html" href="https://www.na-mic.org/w/index.php?title=SDIWG:Meeting_Minutes_20070216&amp;diff=7917"/>
		<updated>2007-02-19T20:47:47Z</updated>

		<summary type="html">&lt;p&gt;Aris: /* Action Items */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;'''Agenda: NCBC Joint Working Group Meeting'''&lt;br /&gt;
&lt;br /&gt;
[[SDIWG:Software_and_Data_Integration_Working_Group|Top page of SDIWG web site]]&lt;br /&gt;
&lt;br /&gt;
===Notes===&lt;br /&gt;
''Friday, February 16, 2007: 2:30 -- 3:30 PM EST''&amp;lt;br&amp;gt;&lt;br /&gt;
''Please contact Peter Lyster for information lysterp@mail.nih.gov''&amp;lt;br&amp;gt;&lt;br /&gt;
''[http://na-mic.org/Wiki/index.php/SDIWG:Software_Engineering_Body_of_Knowledge_Across_NCBC_Biocomputing_Centers SDIWG Meetings Page]''&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== TCon Agenda ===&lt;br /&gt;
&lt;br /&gt;
* Roll Call/Note-taker (Floratos) (5 min)&lt;br /&gt;
&lt;br /&gt;
* Minutes from most recent SDIWG tcon [[SDIWG:Meeting_Minutes_20061117|Nov. 17, 2006]] (5 min)&lt;br /&gt;
&lt;br /&gt;
* Progress in three working groups&lt;br /&gt;
** '''[[SDIWG:_NCBC_Scientific_Ontologies|Scientific Ontologies]]''' (Zak Kohane, Mark Musen, and Suzi Lewis, 15 min)&lt;br /&gt;
*** TBD&lt;br /&gt;
** '''[[SDIWG:_NCBC_Resource_Yellow_Pages_and_Software_Ontologies|Yellow Pages and Resourceome]]''' (Ivo Dinov, Daniel Rubin, Bill Lorensen, Jonathan Dugan, 15 min)&lt;br /&gt;
*** Past Progress&lt;br /&gt;
**** [http://www.loni.ucla.edu/twiki/bin/view/CCB/NCBC_ToolIntegration_XMLtology_2006_OtherArchives Other Attempts to provide archives of SW and tool resources]&lt;br /&gt;
**** Proposed set of '''[http://www.loni.ucla.edu/twiki/bin/view/CCB/NCBC_ToolIntegration_XMLtology_Meeting_071806_RequiredFields Required/Minimal]''' Resource Description Fields&lt;br /&gt;
**** [http://www.loni.ucla.edu/twiki/bin/view/CCB/NCBC_ToolIntegration_XMLtology_Meeting_071806_ToolSpecificFields '''Optional''' and '''Resource-specific''' description fields]&lt;br /&gt;
**** [http://bioontology.org/projects/ontologies/SoftwareOntology/ Current Ontological Description of the NCBC Tool Yellow-Pages]&lt;br /&gt;
**** [http://www.loni.ucla.edu/twiki/bin/view/CCB/NCBC_ToolIntegration_XMLtology_Meeting_103106_ResourceDef Types of CompBio Resources]?&lt;br /&gt;
**** [[SDIWG:NCBC_Software_Classification | Center described 3-5 resources]]?&lt;br /&gt;
*** Status of NCBC '''''[http://www.loni.ucla.edu/twiki/bin/view/CCB/NCBC_ToolIntegration_iTools iTools]'''''&lt;br /&gt;
**** V1 (v 0.1, 10/12/06)&lt;br /&gt;
**** V4 (v 0.4, 02/13/07)&lt;br /&gt;
*** Suggested Action Items for the YPR WG:&lt;br /&gt;
**** Each Center should ensure that they have [[SDIWG:NCBC_Software_Classification|6-12 SW Resources described according to these templates]]&lt;br /&gt;
**** Test, comment and revise iTools. How does it compare with other similar efforts?&lt;br /&gt;
** '''[[SDIWG:_NCBC_Systems_Biology|Applications of Systems Biology, Modeling, and Analysis]]''' (Aris Floratos, Brian Athey, Russ Altman, 15 min)&lt;br /&gt;
*** TBD&lt;br /&gt;
&lt;br /&gt;
=== Minutes ===&lt;br /&gt;
&lt;br /&gt;
Note taker: (Aris Floratos)&lt;br /&gt;
&lt;br /&gt;
* Review of 11/17/06 minutes&lt;br /&gt;
** '''[[SDIWG:_NCBC_Resource_Yellow_Pages_and_Software_Ontologies|Yellow Pages and Resourceome]]'''&lt;br /&gt;
** '''[[SDIWG:_NCBC_Scientific_Ontologies|Scientific Ontologies]]'''&lt;br /&gt;
** '''[[SDIWG:_NCBC_Systems_Biology|Applications of Systems Biology, Modeling, and Analysis]]'''&lt;br /&gt;
&lt;br /&gt;
Zak Kohane outlined the process via which the Scientific Ontologies SIG reviewed a select set of ontologies and vocabularies. A summary list, along with a broad assessment of overall &amp;quot;quality&amp;quot;, has been compiled by Suzie Lewis and is available at www.berkeleybop.org/sowg/table.cgi. In addition to being an inventory of the vocabularies and ontologies that the NCBCs use internally, the SIG hopes that this list will be a useful starting point for non-experts seeking to identify ontologies/terminologies appropriate for use in their work. Zak recognized that there is a gap in the set of vocabularies reviewed, namely in the area of imaging and neuroscience, and he solicited help in identifying the main players in these domains. Daniel Rubin offered to help in this direction.&lt;br /&gt;
&lt;br /&gt;
In response to questions/suggestions from Gil Omenn and Brian Athey, it was agreed that the Scientific Ontologies SIG will also produce representative use cases, outlining how ontologies/terminologies are used in practice.&lt;br /&gt;
&lt;br /&gt;
Ivo Dinov reviewed the process that led to the definition of the minimum resource descriptors and Dan Rubin's software classification ontology and proceeded to discuss the iTools framework (www.loni.ucla.edu/twiki/bin/view/CCB/NCBC_ToolIntegration_iTools) and demonstrate the Java prototype front end. The demo outlined several supported search modes as well as the ability of the tool to &amp;quot;scrape&amp;quot; updates to tool description information from Web pages that follow the format of the Resource descriptions on the SDIWG wiki.&lt;br /&gt;
&lt;br /&gt;
Suzie Lewis observed that the scraping approach used by iTools could also be used to process the listing of terminologies produced by the Ontologies SIG, allowing them to be presented/searched via the iTools application.&lt;br /&gt;
&lt;br /&gt;
A discussion followed about the proper manner in which to release and start using the iTools framework. Suggestions were made that NIH should centrally host the database and portal, so as to offer the appropriate branding and a central entry point for the general public to access the NCBC resources. One idea was to develop an NCBC-specific web site (e.g., ncbc.org or nih.ncbc.org) for that purpose. Peter Lyster said that there have been internal discussions at NIH about this topic and that some ideas (such as having a central entry point to the NCBC resource listing from NCBI or some other institute) have been vetted.&lt;br /&gt;
&lt;br /&gt;
Brian Athey and Aris Floratos commented on the upcoming March telecon of the Systems Biology SIG and the idea of developing a suite of software tools, comprising resources from multiple NCBCs, to be leveraged in the context of various DBPs. The question came up of how to achieve a good balance between tool development and tool adoption, and how to advertise and promote the various tools, particularly since the adoption of these tools by the community will, to a large extent, determine the impact and usefulness of the NCBC program as a whole and its potential for renewed funding. Peter Lyster commented that there is no concerted effort at this time within NIH to promote the NCBC tools; rather, there are a number of ongoing initiatives, such as caBIG and BIRN, proceeding in parallel. One course of action might be to start by having iTools hosted at CCB and then move gradually toward bringing the repository within NIH as an updated NCBC page on the BISTI site (or elsewhere) is created. Peter also said that he will ask John Whitmarsh to get back to the PIs regarding the current plans for the program evaluation.&lt;br /&gt;
&lt;br /&gt;
=== Attendees ===&lt;br /&gt;
&lt;br /&gt;
* Sherman, Lyster, Dinov (CCB/UCLA), Lorensen, Michael Montegut (NCBO project mgr), Weymouth, Floratos, Jennie Larkin (NHLBI), Zak &amp;amp; Susanne, Jenkins (NLM), Vivien Bonai, Athey, Kennedy, Mike Mendis, Karen Skinner/NIDA, Musen, Chris Mungall (NCBO), German Cavelier (NIMH), Suzanna Lewis&lt;br /&gt;
* Note-taker: [Suggested order of note takers for future meetings: Floratos (this one); Rubin; Jags; Dinov; Chueh; Sherman; Schroeder]&lt;br /&gt;
&lt;br /&gt;
=== Action Items ===&lt;br /&gt;
* Dan Rubin to point out for the Ontology SIG the key imaging/neuroscience ontologies.&lt;br /&gt;
* Ontology SIG to produce use cases describing practical uses of ontologies/vocabularies.&lt;br /&gt;
* Peter Lyster (or/and others?) to bring up as an agenda item to the NCBC project team meeting the issue of hosting the Yellow Pages database centrally within NIH.&lt;br /&gt;
* Peter Lyster to ask John Whitmarsh to communicate to the PIs the status of plans for the NCBC evaluation program.&lt;/div&gt;</summary>
		<author><name>Aris</name></author>
		
	</entry>
	<entry>
		<id>https://www.na-mic.org/w/index.php?title=SDIWG:Meeting_Minutes_20070216&amp;diff=7916</id>
		<title>SDIWG:Meeting Minutes 20070216</title>
		<link rel="alternate" type="text/html" href="https://www.na-mic.org/w/index.php?title=SDIWG:Meeting_Minutes_20070216&amp;diff=7916"/>
		<updated>2007-02-19T20:47:04Z</updated>

		<summary type="html">&lt;p&gt;Aris: /* Minutes */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;'''Agenda: NCBC Joint Working Group Meeting'''&lt;br /&gt;
&lt;br /&gt;
[[SDIWG:Software_and_Data_Integration_Working_Group|Top page of SDIWG web site]]&lt;br /&gt;
&lt;br /&gt;
===Notes===&lt;br /&gt;
''Friday, February 16, 2007: 2:30 -- 3:30 PM EST''&amp;lt;br&amp;gt;&lt;br /&gt;
''Please contact Peter Lyster for information lysterp@mail.nih.gov''&amp;lt;br&amp;gt;&lt;br /&gt;
''[http://na-mic.org/Wiki/index.php/SDIWG:Software_Engineering_Body_of_Knowledge_Across_NCBC_Biocomputing_Centers SDIWG Meetings Page]''&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== TCon Agenda ===&lt;br /&gt;
&lt;br /&gt;
* Roll Call/Note-taker (Floratos) (5 min)&lt;br /&gt;
&lt;br /&gt;
* Minutes from most recent SDIWG tcon [[SDIWG:Meeting_Minutes_20061117|Nov. 17, 2006]] (5 min)&lt;br /&gt;
&lt;br /&gt;
* Progress in three working groups&lt;br /&gt;
** '''[[SDIWG:_NCBC_Scientific_Ontologies|Scientific Ontologies]]''' (Zak Kohane, Mark Musen, and Suzi Lewis, 15 min)&lt;br /&gt;
*** TBD&lt;br /&gt;
** '''[[SDIWG:_NCBC_Resource_Yellow_Pages_and_Software_Ontologies|Yellow Pages and Resourceome]]''' (Ivo Dinov, Daniel Rubin, Bill Lorensen, Jonathan Dugan, 15 min)&lt;br /&gt;
*** Past Progress&lt;br /&gt;
**** [http://www.loni.ucla.edu/twiki/bin/view/CCB/NCBC_ToolIntegration_XMLtology_2006_OtherArchives Other Attempts to provide archives of SW and tool resources]&lt;br /&gt;
**** Proposed set of '''[http://www.loni.ucla.edu/twiki/bin/view/CCB/NCBC_ToolIntegration_XMLtology_Meeting_071806_RequiredFields Required/Minimal]''' Resource Description Fields&lt;br /&gt;
**** [http://www.loni.ucla.edu/twiki/bin/view/CCB/NCBC_ToolIntegration_XMLtology_Meeting_071806_ToolSpecificFields '''Optional''' and '''Resource-specific''' description fields]&lt;br /&gt;
**** [http://bioontology.org/projects/ontologies/SoftwareOntology/ Current Ontological Description of the NCBC Tool Yellow-Pages]&lt;br /&gt;
**** [http://www.loni.ucla.edu/twiki/bin/view/CCB/NCBC_ToolIntegration_XMLtology_Meeting_103106_ResourceDef Types of CompBio Resources]?&lt;br /&gt;
**** [[SDIWG:NCBC_Software_Classification | Center described 3-5 resources]]?&lt;br /&gt;
*** Status of NCBC '''''[http://www.loni.ucla.edu/twiki/bin/view/CCB/NCBC_ToolIntegration_iTools iTools]'''''&lt;br /&gt;
**** V1 (v 0.1, 10/12/06)&lt;br /&gt;
**** V4 (v 0.4, 02/13/07)&lt;br /&gt;
*** Suggested Action Items for the YPR WG:&lt;br /&gt;
**** Each Center should ensure that they have [[SDIWG:NCBC_Software_Classification|6-12 SW Resources described according to these templates]]&lt;br /&gt;
**** Test, comment and revise iTools. How does it compare with other similar efforts?&lt;br /&gt;
** '''[[SDIWG:_NCBC_Systems_Biology|Applications of Systems Biology, Modeling, and Analysis]]''' (Aris Floratos, Brian Athey, Russ Altman, 15 min)&lt;br /&gt;
*** TBD&lt;br /&gt;
&lt;br /&gt;
=== Minutes ===&lt;br /&gt;
&lt;br /&gt;
Note taker: (Aris Floratos)&lt;br /&gt;
&lt;br /&gt;
* Review of 11/17/06 minutes&lt;br /&gt;
** '''[[SDIWG:_NCBC_Resource_Yellow_Pages_and_Software_Ontologies|Yellow Pages and Resourceome]]'''&lt;br /&gt;
** '''[[SDIWG:_NCBC_Scientific_Ontologies|Scientific Ontologies]]'''&lt;br /&gt;
** '''[[SDIWG:_NCBC_Systems_Biology|Applications of Systems Biology, Modeling, and Analysis]]'''&lt;br /&gt;
&lt;br /&gt;
Zak Kohane outlined the process via which the Scientific Ontologies SIG reviewed a select set of ontologies and vocabularies. A summary list, along with a broad assessment of overall &amp;quot;quality&amp;quot;, has been compiled by Suzie Lewis and is available at www.berkeleybop.org/sowg/table.cgi. In addition to being an inventory of the vocabularies and ontologies that the NCBCs use internally, the SIG hopes that this list will be a useful starting point for non-experts seeking to identify ontologies/terminologies appropriate for use in their work. Zak recognized that there is a gap in the set of vocabularies reviewed, namely in the area of imaging and neuroscience, and he solicited help in identifying the main players in these domains. Daniel Rubin offered to help in this direction.&lt;br /&gt;
&lt;br /&gt;
In response to questions/suggestions from Gil Omenn and Brian Athey, it was agreed that the Scientific Ontologies SIG will also produce representative use cases, outlining how ontologies/terminologies are used in practice.&lt;br /&gt;
&lt;br /&gt;
Ivo Dinov reviewed the process that led to the definition of the minimum resource descriptors and Dan Rubin's software classification ontology and proceeded to discuss the iTools framework (www.loni.ucla.edu/twiki/bin/view/CCB/NCBC_ToolIntegration_iTools) and demonstrate the Java prototype front end. The demo outlined several supported search modes as well as the ability of the tool to &amp;quot;scrape&amp;quot; updates to tool description information from Web pages that follow the format of the Resource descriptions on the SDIWG wiki.&lt;br /&gt;
&lt;br /&gt;
Suzie Lewis observed that the scraping approach used by iTools could also be applied to the listing of terminologies produced by the Ontologies SIG, allowing them to be presented and searched via the iTools application.&lt;br /&gt;
&lt;br /&gt;
A discussion followed about the proper manner in which to release and start using the iTools framework. Suggestions were made that NIH should centrally host the database and portal, so as to offer the appropriate branding and a central entry point for the general public to access the NCBC resources. One idea was to develop an NCBC-specific web site (e.g., ncbc.org or nih.ncbc.org) for that purpose. Peter Lyster said that there have been internal discussions within NIH about this topic and that some ideas (such as having a central entry point to the NCBC resource listing from NCBI or some other institute) have been vetted.&lt;br /&gt;
&lt;br /&gt;
Brian Athey and Aris Floratos commented on the upcoming March telecon of the Systems Biology SIG and the idea of developing a suite of software tools comprising resources from multiple NCBCs, which will be leveraged in the context of various DBPs. The question came up of how to achieve a good balance between tool development and tool adoption, and how to advertise and promote the various tools, in particular since the adoption of these tools by the community will, to a large extent, determine the impact and usefulness of the NCBC program as a whole and its potential for renewed funding. Peter Lyster commented that there is no concerted effort at this time within NIH to promote the NCBC tools; rather, there are a number of ongoing initiatives like caBIG, BIRN, etc. that are proceeding in parallel. One course of action might be to start with having iTools hosted at CCB and then move gradually towards bringing the repository within NIH as an updated NCBC page on the BISTI site (or elsewhere) is created. Peter also said that he will ask John Whitmarsh to get back to the PIs regarding the current plans for the program evaluation.&lt;br /&gt;
&lt;br /&gt;
=== Attendees ===&lt;br /&gt;
&lt;br /&gt;
* Sherman, Lyster, Dinov (CCB/UCLA), Lorensen, Michael Montegut (NCBO project mgr), Weymouth, Floratos, Jennie Larkin (NHLBI), Zak &amp;amp; Susanne, Jenkins (NLM), Vivien Bonai, Athey, Kennedy, Mike Mendis, Karen Skinner/NIDA, Musen, Chris Mungall (NCBO), German Cavelier (NIMH), Suzanna Lewis&lt;br /&gt;
* Note-taker: [Suggested order of note takers for future meetings: Floratos (this one); Rubin; Jags; Dinov; Chueh; Sherman; Schroeder]&lt;br /&gt;
&lt;br /&gt;
=== Action Items ===&lt;/div&gt;</summary>
		<author><name>Aris</name></author>
		
	</entry>
</feed>