Difference between revisions of "Problems and Solutions on SPL Machine Blog"

From NAMIC Wiki
 
  
Since the install of Fedora 7 we have been having some problems, namely with SCIRun recognizing the OpenGL drivers.
 
 
For example, when we use the tools on the previous page, SCIRun gets pointed at the correct drivers, as shown here:
 
 
 
spl_tm64_1:/workspace/mjolley/Modeling/trunk/SCIRun/bin% ldd scirun | grep GL
 
        libGL.so.1 => /usr/lib64/nvidia/libGL.so.1 (0x0000003f8e200000)
 
        libGLU.so.1 => /usr/lib64/libGLU.so.1 (0x000000360c200000)
 
        libGLcore.so.1 => /usr/lib64/nvidia/libGLcore.so.1 (0x0000003f80e00000)
 
 
 
 
 
When these drivers are recognized, SCIRun runs as expected. However, each time SCIRun is run it reverts to the wrong drivers, as shown here:
 
 
 
spl_tm64_1:/workspace/mjolley/Modeling/trunk/SCIRun/bin% ldd scirun | grep GL
 
        libGL.so.1 => /usr/lib64/libGL.so.1 (0x0000003f84800000)
 
        libGLU.so.1 => /usr/lib64/libGLU.so.1 (0x000000360c200000)
 
 
 
After this, any OpenGL-dependent modules crash upon opening. If you repeat the "unsetenv" step and run Dav's script again, you get back to:
 
 
 
 
 
spl_tm64_1:/workspace/mjolley/Modeling/trunk/SCIRun/bin% ldd scirun | grep GL
 
        libGL.so.1 => /usr/lib64/nvidia/libGL.so.1 (0x0000003f8e200000)
 
        libGLU.so.1 => /usr/lib64/libGLU.so.1 (0x000000360c200000)
 
        libGLcore.so.1 => /usr/lib64/nvidia/libGLcore.so.1 (0x0000003f80e00000)
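
Rather than eyeballing the ldd listings each time, the check can be scripted. The following is a minimal sketch, assuming the ldd output format shown above and that the good driver always lives under a directory named nvidia; gl_driver_kind is a hypothetical helper name, not part of any existing script.

```shell
# Hypothetical helper: classify which libGL.so.1 an `ldd` listing resolves to.
# Reads ldd output on stdin and prints one word:
#   nvidia  - libGL.so.1 resolves to a path containing /nvidia/
#   other   - libGL.so.1 resolves somewhere else (e.g. the Mesa copy)
#   missing - no libGL.so.1 line was found at all
gl_driver_kind() {
    awk '
        $1 == "libGL.so.1" {
            found = 1
            if ($3 ~ /\/nvidia\//) print "nvidia"; else print "other"
            exit
        }
        END { if (!found) print "missing" }
    '
}

# Intended use (commented out because it needs the scirun binary):
# ldd scirun | gl_driver_kind
```

Fed the first listing above it prints nvidia, and fed the second it prints other, so it can be called before and after each run to catch the moment the resolution flips.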
 
 
 
To summarize:
 
 
 
 
 
unset LD_LIBRARY_PATH
 
 
 
ldd scirun | grep GL
 
 
 
<correct GL is reported>
 
 
 
run script
 
 
 
ldd scirun | grep GL
 
 
 
<wrong GL is reported>
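
Since the result depends on LD_LIBRARY_PATH, it may help to see which directories on that path carry their own libGL.so.1. This is only a sketch: gl_dirs_on_path is a made-up helper name, and it assumes the dynamic linker consults the LD_LIBRARY_PATH entries in order, so the first directory printed is the likely winner.

```shell
# Hypothetical helper: print every directory on a colon-separated library
# path that contains a libGL.so.1, in search order.
# Usage: gl_dirs_on_path            (inspects $LD_LIBRARY_PATH)
#        gl_dirs_on_path "/a:/b"    (inspects the given path string)
gl_dirs_on_path() (
    # The function body is a subshell, so the IFS and globbing changes
    # below do not leak into the calling shell.
    path_value=${1-${LD_LIBRARY_PATH-}}
    set -f          # do not glob-expand path entries
    IFS=:
    for dir in $path_value; do
        if [ -n "$dir" ] && [ -e "$dir/libGL.so.1" ]; then
            printf '%s\n' "$dir"
        fi
    done
)
```

Calling it once before and once after "run script" would show whether the script adds a directory holding the Mesa libGL ahead of the nvidia one.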
 
 
 
 
 
 
 
So I put

unset LD_LIBRARY_PATH

into the first line of the script (run in bash) and into my .bashrc, but I still see the same behavior: it switches back from the good OpenGL setup to the Mesa drivers after initially being pointed to the correct ones.
 
 
 
This still didn't seem to do it, so we changed the following function in the script from:
 
 
 
create_scirun_script() {
 
    echo "scirun -E ${NETWORK} --logfile ALL.log" >/tmp/script-fe.sh
 
    chmod 0770 /tmp/script-fe.sh
 
}
 
 
 
 
 
to
 
 
 
create_scirun_script() {
 
    echo "unset LD_LIBRARY_PATH" > /tmp/script-fe.sh
 
    echo "scirun -E ${NETWORK} --logfile ALL.log" >> /tmp/script-fe.sh
 
    chmod 0770 /tmp/script-fe.sh
 
}
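
An alternative to writing the unset line into the generated file is to strip the variable for just the one command. A minimal sketch, assuming GNU env with its -u option is available on the SPL machines; run_without_ld_path is a hypothetical name and is not part of run_all.sh.

```shell
# Hypothetical helper: run a command with LD_LIBRARY_PATH removed from its
# environment only; the calling shell keeps its own value untouched.
run_without_ld_path() {
    env -u LD_LIBRARY_PATH "$@"
}

# Intended use (commented out because it needs scirun on PATH and NETWORK set):
# run_without_ld_path scirun -E "${NETWORK}" --logfile ALL.log
```

The effect on the launched process should be the same as the unset line above; the difference is that nothing in the parent shell changes, which makes before-and-after comparisons with ldd easier to reason about.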
 
 
 
With this change, the switching back to the Mesa drivers appears to have stopped, but the script still always hangs at JoinField within the first three bundles run in the net by the script. I went back and confirmed that this set of .bdl files runs without errors on my Ubuntu machine and in the manual nets on the SPL machine, so it must be an environment variable. On the SPL machines it typically hangs in the first, second, or third run in Scripts, always at the same position. I am not sure what is up, as there is no good error message:
 
 
 
[[Image:VNCscreenshot.png]]
 
 
 
 
 
This is the output in the command line script window:
 
 
 
spl_tm64_1:/workspace/mjolley% bash
 
bash-3.2$ export PATH=$PATH:/workspace/mjolley/Modeling/trunk/SCIRun/bin
 
bash-3.2$ ./SCIRun_Scripts/run_all.sh /projects/cardio/Clinical-HClean/ /projects/cardio/Clinical-HClean/SCIRun_Nets/Script2_FEM/Script2-FE-refine-elec-dilate-5-100x100x150-all-cases-permut-matrix.srn Results/Clinical
 
ls: cannot access /projects/cardio/Clinical-HClean//Electrodes_Plus_Torso/2ybdls//Four/*.bdl: No such file or directory
 
cat: /tmp/idx1: No such file or directory
 
cat: /tmp/idx2: No such file or directory
 
cat: /tmp/idx3: No such file or directory
 
cat: /tmp/idx4: No such file or directory
 
ls: cannot access /projects/cardio/Clinical-HClean//Electrodes_Plus_Torso/10ybdls//Four/*.bdl: No such file or directory
 
ldd: ./scirun: No such file or directory
 
Parsed .scirunrc... /home/mjolley/.scirunrc
 
Loading Tcl,Tk,tk, Itcl,Itk,Blt,Widgets
 
loading scirun network file: /projects/cardio/Clinical-HClean/SCIRun_Nets/Script2_FEM/Script2-FE-refine-elec-dilate-5-100x100x150-all-cases-permut-matrix.srn
 
scirun> loading file: SCIRun_Scripts/Permutations/P1-500-0.mat
 
loading file: /projects/cardio/Clinical-HClean//Electrodes_Plus_Torso/10ybdls//One/10y-Left-abd-can+10cm-right-parasternal-T4-top.bdl
 
Compiling: ArrayObjectFieldCreateAlgoT<GenericField<TetVolMesh<TetLinearLgn<Point> > ,ConstantBasis<double >,vector<double > > >
 
Compiling: ArrayObjectFieldCreateAlgoT<GenericField<TetVolMesh<TetLinearLgn<Point> > ,ConstantBasis<double >,vector<double > > >
 
Compiling: ArrayObjectFieldCreateAlgoT<GenericField<TetVolMesh<TetLinearLgn<Point> > ,ConstantBasis<double >,vector<double > > >
 
Compiling: ArrayObjectFieldCreateAlgoT<GenericField<TetVolMesh<TetLinearLgn<Point> > ,ConstantBasis<double >,vector<double > > >
 
Compiling: ArrayObjectFieldCreateAlgoT<GenericField<TetVolMesh<TetLinearLgn<Point> > ,ConstantBasis<double >,vector<double > > >
 
Compiling: ArrayObjectFieldDataScalarAlgoT<GenericField<TetVolMesh<TetLinearLgn<Point> > ,ConstantBasis<double> ,vector<double> > ,TetVolMesh<TetLinearLgn<Point> > ::Cell>
 
Compiling: ArrayObjectFieldDataScalarAlgoT<GenericField<TetVolMesh<TetLinearLgn<Point> > ,ConstantBasis<double> ,vector<double> > ,TetVolMesh<TetLinearLgn<Point> > ::Cell>
 
Compiling: Compiling: ArrayObjectFieldDataScalarAlgoTMergeFieldsAlgoT<GenericField<TetVolMesh<TetLinearLgn<Point> > ,ConstantBasis<double> ,vector<double> > ,TetVolMesh<TetLinearLgn<Point> > ::Cell<>
 
GenericField<TetVolMesh<TetLinearLgn<Point> > ,NoDataBasis<double> ,vector<double> > >
 
Compiling: ArrayObjectFieldLocationElemAlgoT<GenericField<TetVolMesh<TetLinearLgn<Point> > ,ConstantBasis<double> ,vector<double> > >
 
Compiling: ArrayObjectFieldDataScalarAlgoT<GenericField<TetVolMesh<TetLinearLgn<Point> > ,ConstantBasis<double> ,vector<double> > ,TetVolMesh<TetLinearLgn<Point> > ::Cell>
 
Compiling: ArrayObjectFieldLocationElemAlgoT<GenericField<TetVolMesh<TetLinearLgn<Point> > ,ConstantBasis<double> ,vector<double> > >
 
Compiling: ArrayObjectFieldDataScalarAlgoT<GenericField<TetVolMesh<TetLinearLgn<Point> > ,ConstantBasis<double> ,vector<double> > ,TetVolMesh<TetLinearLgn<Point> > ::Cell>
 
Compiling: ArrayObjectFieldLocationElemAlgoT<GenericField<TetVolMesh<TetLinearLgn<Point> > ,ConstantBasis<double> ,vector<double> > >
 
Compiling: ArrayObjectFieldLocationElemAlgoT<GenericField<TetVolMesh<TetLinearLgn<Point> > ,ConstantBasis<double> ,vector<double> > >
 
Compiling: ArrayObjectFieldLocationElemAlgoT<GenericField<TetVolMesh<TetLinearLgn<Point> > ,ConstantBasis<double> ,vector<double> > >
 
Compiling: ArrayObjectFieldElemVolumeAlgoT<GenericField<TetVolMesh<TetLinearLgn<Point> > ,ConstantBasis<double> ,vector<double> > ,TetVolMesh<TetLinearLgn<Point> > ::Cell>
 
Compiling: ArrayObjectFieldElemVolumeAlgoT<GenericField<TetVolMesh<TetLinearLgn<Point> > ,ConstantBasis<double> ,vector<double> > ,TetVolMesh<TetLinearLgn<Point> > ::Cell>
 
Compiling: ArrayObjectFieldElemVolumeAlgoT<GenericField<TetVolMesh<TetLinearLgn<Point> > ,ConstantBasis<double> ,vector<double> > ,TetVolMesh<TetLinearLgn<Point> > ::Cell>
 
Compiling: ArrayObjectFieldElemVolumeAlgoT<GenericField<TetVolMesh<TetLinearLgn<Point> > ,ConstantBasis<double> ,vector<double> > ,TetVolMesh<TetLinearLgn<Point> > ::Cell>
 
Compiling: ArrayObjectFieldElemVolumeAlgoT<GenericField<TetVolMesh<TetLinearLgn<Point> > ,ConstantBasis<double> ,vector<double> > ,TetVolMesh<TetLinearLgn<Point> > ::Cell>
 
Compiling: ALGOArrayEngine_1586526061_LCELID_FS<double>
 
Compiling: ALGOArrayEngine_1586525968_LCELID_FS<double>
 
Compiling: ALGOArrayEngine_1586526092_LCELID_FS<double>
 
Compiling: ALGOArrayEngine_1586525999_LCELID_FS<double>
 
 
 
Then SCIRun hangs...
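
Because the hang produces no error, a batch run just sits there. One way to at least detect it is to put a time limit on each run. This is a sketch only: it assumes the GNU coreutils timeout command is installed (it is newer than Fedora 7, so it may not exist on older installs), and run_with_limit is a hypothetical helper name.

```shell
# Hypothetical helper: run a command under a time limit and report a hang.
# Usage: run_with_limit LIMIT COMMAND [ARGS...]
# LIMIT is anything GNU timeout accepts, e.g. 600 or 10m.
run_with_limit() {
    limit=$1
    shift
    timeout "$limit" "$@"
    status=$?
    # GNU timeout exits with 124 when the limit was reached.
    if [ "$status" -eq 124 ]; then
        echo "timed out after $limit: $*" >&2
    fi
    return "$status"
}

# Intended use (commented out because it needs scirun and NETWORK):
# run_with_limit 30m scirun -E "${NETWORK}" --logfile ALL.log
```

This does not fix the hang; it only lets the script log which bundle stalled and move on to the next one.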
 
 
 
 
 
 
 
 
 
--[[User:Mjolley|Mjolley]] 11:51, 21 August 2007 (EDT)
 

Latest revision as of 19:14, 28 January 2008


Current Problems on Debugging for SCIRun on SPL Machines


Jan 08: There are currently no SPL-specific problems with SCIRun on SPL machines. There is a more general bug in SCIRun related to questionably thread-safe code, specifically the DLOpen calls, which are primarily in the dynamic compilation portion of SCIRun. These bugs show up often on the fat nodes. Jeroen is working to eliminate dynamic compilation, and with it these bugs, which manifest more frequently and randomly with large networks and multicore machines that "stress" the thread safety of the code. If you are using SCIRun and running into these bugs, please let him know.