Alan Aspuru-Guzik
(Sent from my mobile phone and might contain typos. Thanks for
understanding.)
Begin forwarded message:
From: Christopher Walker <cwalker(a)fas.harvard.edu>
Date: September 4, 2009 17:10:38 EDT
To: James Cuff <james_cuff(a)harvard.edu>
Cc: "Suvendra N. Dutta" <suvendra_dutta(a)harvard.edu>,
 rcops <rcops-list(a)lists.fas.harvard.edu>,
 Alan Aspuru-Guzik <aspuru(a)chemistry.harvard.edu>,
 Roel Sanchez <rsanchez(a)fas.harvard.edu>
Subject: Re: [Rcops-list] Critical slowdown of Oddyssey.
Hi everyone,
It looks like the problem here was that Q-Chem was using a directory
on /n/Aspuru, a relatively slow filesystem, as scratch space. Roel
changed the scratch directory to /scratch/rsanchez, a faster local
filesystem, and his jobs were much quicker.
A number of the Aspuru-Guzik group members have QCSCRATCH set to a
directory on the /n/Aspuru/ filesystem -- would you like us to
contact these users and let them know that Q-Chem will run much
faster if they use /scratch instead?
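
For anyone who wants to check their own setup, here is a minimal sketch of the kind of check described above. It only looks at the QCSCRATCH environment variable and reports whether it points somewhere under /n/Aspuru. The helper names are hypothetical, and the /scratch/<username> layout is an assumption generalized from the /scratch/rsanchez example; it is not an official Research Computing tool.

```python
import os
from pathlib import PurePosixPath

# Paths taken from this thread; adjust if your setup differs.
SLOW_PREFIX = PurePosixPath("/n/Aspuru")  # slow filesystem
FAST_BASE = PurePosixPath("/scratch")     # faster local filesystem


def is_on_slow_fs(path, slow_prefix=SLOW_PREFIX):
    """Return True if path is the slow filesystem root or lies beneath it."""
    p = PurePosixPath(path)
    return p == slow_prefix or slow_prefix in p.parents


def suggested_scratch(user, fast_base=FAST_BASE):
    """Assumed layout: one directory per user under /scratch."""
    return str(fast_base / user)


def check_qcscratch(env=None):
    """Return a suggested replacement for QCSCRATCH, or None if no change is needed.

    env defaults to os.environ; pass a dict to check other settings.
    """
    env = os.environ if env is None else env
    current = env.get("QCSCRATCH")
    if current is None or not is_on_slow_fs(current):
        return None
    return suggested_scratch(env.get("USER", "unknown"))


if __name__ == "__main__":
    suggestion = check_qcscratch()
    if suggestion is None:
        print("QCSCRATCH is unset or not on /n/Aspuru; nothing to change.")
    else:
        print(f"QCSCRATCH is on /n/Aspuru; consider using {suggestion} instead.")
```

The script does not change anything. It only prints a suggestion, so each user still has to update QCSCRATCH in their own job scripts or shell startup files.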
Best,
Chris
On Fri, 4 Sep 2009, James Cuff wrote:
>
> Hi all,
>
> We have located the problem: the filesystem /n/Aspuru is being
> hammered, which is causing all the qchem.exe codes to hang in I/O
> wait state. Roel, if you could give Chris Walker a call, he can
> work with you to migrate your code to another area. As it is right
> now these jobs will never complete. I'm also working with Brian to
> get to the bottom of what is wrong with /n/Aspuru - we think this
> is the smoking gun right now.
>
> Chris can also help get these critical jobs pushed through for you
> - I know you are under the gun for a deadline. The rest of the
> cluster is fine - mainly this one slow filesystem.
>
> Best,
>
> j.
>
> --
> dr. james cuff, director of research computing & chief technology architect
> harvard university - faculty of arts and sciences information technology
> rm 135, the science center, one oxford street, cambridge. ma. 02138.
> tel: +1 617 384 7647 | www: http://rc.fas.harvard.edu
>
> ________________________________________
> From: rcops-list-bounces(a)lists.fas.harvard.edu
> [rcops-list-bounces(a)lists.fas.harvard.edu] On Behalf Of
> Suvendra Nath Dutta [suvendra_dutta(a)harvard.edu]
> Sent: Friday, September 04, 2009 12:36 PM
> To: rcops
> Subject: [Rcops-list] FW: Critical slowdown of Oddyssey.
>
> ------ Forwarded Message
> From: Alan Aspuru-Guzik <aspuru(a)chemistry.harvard.edu>
> Date: Fri, 4 Sep 2009 11:22:39 -0400
> To: Jerry Lotto <lotto(a)chemistry.harvard.edu>, Suvendra Nath Dutta
> <suvendra_dutta(a)harvard.edu>
> Subject: Critical slowdown of Oddyssey.
>
> Dear Jerry and Suvendra,
>
> You are probably aware, but right now there is a critical slowdown
> of Odyssey. Simple single-point quantum chemistry calculations that
> take 20 minutes are taking roughly 10 hours. These are the jobs of
> rsanchez. Roel confirms that other kinds of jobs (even the GPGPU
> queue) seem to be in the same boat. This started to happen within
> the last 4 days, and is affecting the nnin, lsdiv, etc. queues. Any
> leads on the reasons for that? IO?
>
> In any case, we are worried as we need to finish a set of
> calculations ASAP for a paper, and this is throwing us off schedule.
>
> Thank you,
>
> Alan
>
>
>
>
> Alán Aspuru-Guzik | Assistant Professor
> Harvard University | Department of Chemistry and Chemical Biology
> 12 Oxford Street, Room M113 | Cambridge, MA 02138
> (617)-384-8188 | http://aspuru.chem.harvard.edu
>
>
>
> ------ End of Forwarded Message
>
> _______________________________________________
> Rcops-list mailing list
> Rcops-list(a)lists.fas.harvard.edu
> http://lists.fas.harvard.edu/mailman/listinfo/rcops-list