## Tuesday, October 30, 2007

### Profiling Ruby/C applications

As mentioned in a previous post, the current function for reading plain text datafiles (Dvector::fancy_read) is slow. Really slow. So I decided to switch to a C implementation keeping the same functionality.

I quickly did a rough translation of the function into C, using basically the same mechanics (and in particular Ruby regular expressions for parsing), and I was surprised to find that I was only gaining a factor of around three in reading speed. I was even more surprised to see that the reading is O(n²) (reading 100,000 lines is around 100 times slower than reading 1,000 lines 100 times!). So I decided to try my luck with a profiler.
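One way to pin down this kind of quadratic behaviour before reaching for a profiler is a quick timing sketch. The `read_file` function below is a trivial stand-in, not the real `Dvector.fancy_read`; substitute the actual reader to reproduce the measurement. If reading were O(n²), each tenfold increase in line count would show a roughly hundredfold increase in time.

```ruby
require 'benchmark'
require 'tempfile'

# Trivial stand-in reader (hypothetical): parses whitespace-separated
# columns of floats, one row per line. Replace with the real reader.
def read_file(path)
  File.readlines(path).map { |line| line.split.map { |x| Float(x) } }
end

# Time the reader on growing file sizes; a quadratic reader would show
# timings growing 100x for every 10x increase in line count.
[1_000, 10_000, 100_000].each do |n|
  Tempfile.create('data') do |f|
    n.times { |i| f.puts("#{i} #{2 * i}") }
    f.flush
    elapsed = Benchmark.realtime { read_file(f.path) }
    puts format('%7d lines: %.3f s', n, elapsed)
  end
end
```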

My first step was, naively, to compile the library with gcc's -pg option, but that didn't produce any output file (although that might have been because I forgot to add the switch again for linking). I attributed it to the fact that the whole program, and not only the shared library, has to be compiled with the switch. So I wrote a small C wrapper, compiled it, and ran it. It did produce a gmon.out file, but gprof was unable to extract any interesting information from it. I guess I needed a finer granularity than my own functions, and for that I would have had to compile Ruby itself with profiling support. Well. Drop it.

So I was about to give up when I thought about valgrind. Valgrind also comes with a profiling tool, callgrind. So I ended up doing the following:

~ valgrind --tool=callgrind ./fancy_read
~ callgrind_annotate callgrind.out.24425 | less


The first command runs the program under valgrind, saving data into a file called something like callgrind.out.24425. The second parses this file and displays the number of instructions spent in each of the most significant functions. Here is an extract of the output:

306,777,466  ???:__printf_fp [/lib/libc-2.6.1.so]
226,575,122  ???:0x0000000000041C20 [/lib/libc-2.6.1.so]
121,900,026  ???:0x00000000000805F0 [/usr/lib/libruby1.8.so.1.8.6]
98,208,889  ???:__strtod_internal [/lib/libc-2.6.1.so]
90,449,474  ???:0x00000000000482D0 [/lib/libc-2.6.1.so]
87,300,200  ???:vfprintf [/lib/libc-2.6.1.so]
84,197,045  ???:ruby_re_search [/usr/lib/libruby1.8.so.1.8.6]
71,866,550  ???:0x0000000000071D60 [/lib/libc-2.6.1.so]
65,387,136  ???:0x0000000000071380 [/lib/libc-2.6.1.so]
62,784,967  ???:0x000000000004A9F0'2 [/usr/lib/libruby1.8.so.1.8.6]
61,279,027  ???:0x000000000004AC80 [/usr/lib/libruby1.8.so.1.8.6]
48,357,879  ???:malloc [/lib/libc-2.6.1.so]
41,915,985  ???:free [/lib/libc-2.6.1.so]
34,200,000  ???:rb_reg_search [/usr/lib/libruby1.8.so.1.8.6]
31,432,232  ???:ruby_xmalloc [/usr/lib/libruby1.8.so.1.8.6]


This shows that most of the time is spent displaying the data. Not surprising: half of my program does just that. Then, a fair amount of time is spent in strtod; nothing to improve there. Another fair amount is spent in regular expression matching, and then a significant part of the processing time is actually spent on memory management! Dreadful! I guess there's not much more I can do.

The conclusion to all this: if you need to profile something, use valgrind. It is much more powerful than gprof!

## Sunday, October 28, 2007

### SciYAG goes on...

Well, work on SciYAG is going on. I added a view to select experiments by type, and I tried a little scale-up for the first time: I bluntly imported my ca. 2000 data files into the program. It takes a few seconds to import them, but after that, it is really fast. Some remarks:

• I'm making very heavy use of Qt's QStandardItemModel class, and it scales up really easily, even with QtRuby. If you plan to write a model/view application in QtRuby, do not subclass QAbstractItemModel! The underlying model/view architecture makes many calls into the model's functions, and calls from C++ to Ruby in QtRuby are prohibitively expensive (compared to C++-to-C++ calls). Rather, fill in a QStandardItemModel; it is likely to work much faster.
• I'm glad it scales up nicely: navigation among my 2000 data files is very smooth and neat.
• Currently, the function for reading text datafiles is written in pure Ruby, and it is apparently much too slow. I had planned a long time ago to rewrite it in C, and it looks like the time has come to do so.

## Thursday, October 25, 2007

### webgen and inclusion of manual pages

Funnily enough, I haven't spoken about webgen yet. webgen is a very powerful static website generation tool written in Ruby. It was designed around a very good plugin architecture, which makes it trivial to extend.

Most of the sites I maintain (namely SciYAG's, Tioga's, and my former web page, the latter no longer maintained) are written using webgen. I recommend you have a look at those sites and at webgen's own website; you'd be amazed at the possibilities it offers.

SciYAG's website makes heavy use of webgen's extension possibilities. I wrote plugins for things as varied as:

• including a thumbnail of an image and a link to it with a code as simple as:
{linkImage: Electrode_cleaning.png}
• automatically converting ctioga's shell script files into PDF, and converting any PDF into a PNG image with a thumbnail
• displaying a random image on every page (this really looks great, in my humble opinion)
• using SVN information to display the name of the last author
• and many more things

My last addition to this wealth of plugins is a way to include a manual page within a webgen page, so that it benefits from the CSS and from the navigation bar. You can get an idea of what it looks like there. The source code is available, as usual, in the SVN repository.

## Wednesday, October 24, 2007

### KDE, keyboards and hal

This morning, when I tried to log into KDE (using kdm), I miserably failed several times... Worse, the Ctrl+Alt+Fn keys wouldn't work anymore, so I couldn't even log in on the console... It turned out to be linked to a newer version of hal (see bug #442316). Downgrading hal and hal-info did the trick.

## Tuesday, October 23, 2007

### SciYAG is going on...

Well, even if I didn't post here, I've kept myself decently busy. The commit that just went into SciYAG's SVN repository marks a new step towards a public release: it is now possible to edit experiment types. Here's a picture of the current dialog box for editing experiment types. It is admittedly not very comfortable yet, but it works. I'll take time to improve it later, when that becomes a priority.

Of course, I'll keep you posted about my progress!

## Thursday, October 18, 2007

### Wodim, genisoimage and pipes

I'm just burning some backups, and I thought that while waiting for the burning process to finish, I could share a few of my .zshrc tricks on the topic... I use wodim and genisoimage (the Debian tools derived from cdrecord and mkisofs) to burn my disks, and here is what I have in my .zshrc to burn through a pipe:

wodim-pipe() {
    size=$(genisoimage -q -print-size "$@")
    genisoimage "$@" | wodim -tao fs=100M \
        speed=2 dev=/dev/cdrw1 driveropts=burnfree -v \
        tsize=${size}s -multi -
}

With this function, I simply use the following to burn the contents of the saves-server directory to the saves-18-10-2007 directory on the disk:

wodim-pipe -r -J -root saves-18-10-2007 -joliet-long saves-server

Hope you'll find this useful!

## Wednesday, October 17, 2007

### A first draft of SciYAG...

After some time, I finally have the first working draft of the upcoming SciYAG! Admittedly, it currently doesn't do much more than just display the entries, but that is already a start, because you can browse pretty fast... Here are the main things I'll have to focus on over the next days or weeks:

• Provide a decent interface to change experiment types and then, most importantly, experiment values (that would go under the current window)
• Provide a way to group experiments so they share the same data when that makes sense (when they really share some experimental conditions - when the exact same buffer or temperature or... were used)
• For the first version of SciYAG a year ago, I used QPainter::drawPath for painting my curves, which was fine at the time, but resulted in much hassle and too much Ruby code (much slower than C++). So I need to switch to the new 2D framework, QGraphicsScene/QGraphicsView - which wasn't available when I started back then
• Put back all the navigation features (zoom and the like) that were part of the old SciYAG at one point in the past

As you can see, I have things to keep myself busy... I'll keep you posted here!

## Sunday, October 14, 2007

### Hooray!

The commit that just went into SciYAG's SVN repository is the first one that actually gets a user interface running! There is not much yet, admittedly (just dummy entries for experiments that you can't change), but still, it shows that I've made decent progress on the architecture. I'll keep you posted about future developments!
### What I'm waiting for in the NEW queue

The NEW queue is where new Debian packages (packages that build binary packages which were not previously in the Debian archive) wait for a short review before being let in. This review is mainly about legal problems, though you can find here a list of the reasons for which a package could be rejected. It so happens that there are currently some packages I'm interested in, and too lazy to fetch and build myself:

• Of course, the first one is freecol, which I packaged myself and can't wait to see become part of the official Debian distribution (and I guess that will please the original developers as well!).
• Still about games, I'm interested in the lordsawar package, which seems to be a rather neat strategy game as well.
• Finally, I've stumbled upon homebank, which seems to be a really neat piece of personal accounting software. I'm currently using grisbi, which is nice, but somehow doesn't satisfy me. I'll post more about this if I ever come to switch to HomeBank.

As a last note, please consider that the links in the list above will eventually (and rather soon) be broken, once the packages have made it out of the NEW queue. They are only here for me to check more easily on the status of the aforementioned packages... So don't go looking for them, and don't complain!

## Friday, October 12, 2007

### The aurical fonts for LaTeX

I've just been writing some cards for the birth of my daughter, and I've been looking for nice-looking fonts for LaTeX. I'm using the texlive distribution (Debian packages). So I installed the texlive-fonts-extra package, and I had a quick look at all the PDF files therein:

for f in $(dpkg -L texlive-fonts-extra | grep pdf); do \
    [ -h "$f" ] || xpdf "$f"; done

The [ -h "$f" ] || bit is there to prevent reading the same file twice via a symlink. I found many interesting things, but not that many "fancy" fonts - apart from one package, aurical, that provides neat handwriting-like fonts.
Attached is what I obtained with the \Fontlukas command. Hope you'll find this useful !

I pretty much enjoy using Blogger for this blog. I find it rather comfortable and neat. There is however one point I don't like much: the inability to share files easily. There are many (small) files I'd like to share, such as various configuration files, bits of code, patches, and the like. After a quick look, it seems that Mediafire provides a pretty nice free service, with a sleek interface. So watch for file downloads there!

### Change of plans for SciYAG

Initially, the SciYAG program in the SciYAG project was intended as a direct competitor to the interactive part of gnuplot, while ctioga is definitely a good competitor for the "output" part (get convinced there). I had some of it working, with a command-line interface working exactly like ctioga's and some reasonable features.

However, with time, I found that what I'm really missing in my everyday scientific work is not an interactive gnuplot clone. As far as I can tell, gnuplot itself is jolly good for that, and we have a neat home-brewed program called Soas that does the job pretty well and is tailored to our needs.

So, then, what? What I really miss is a good program to organise my data. Here are the features I'm missing:

• First, I want to be able to attach meta-data to my datafiles, such as the buffer used for the experiment, the pH values, and so on. All of these should be fully customizable on a per-experiment or per-experiment-type basis
• I want to be able to browse quickly through my experiments, sorting them according to their type or the values of their meta-data
• Of course, I'm interested in actually displaying the data, possibly massively, possibly as a function of some of the meta-data
• I also want to be able to associate series of fits of different kinds with each of the datafiles, to quickly check them (visually, by displaying the fit and the data), and to plot the parameters found as a function of the meta-data
• And more!

This may sound ambitious, but with a great programming language like Ruby, it is actually fun. A great deal of the architecture is already in place. What I'm currently working on is a way to store all the data. The trick is to make it able to evolve easily, as I'll be using this program extensively for my own needs, and I don't want to re-enter any meta-data that is already in. A good architecture should also please the lazy side of me (which amounts to, say, 95%...) and have me type in only what is absolutely necessary - so most meta-data values should be shared or guessed.
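A minimal Ruby sketch of that sharing idea (the class names and API here are purely illustrative, not SciYAG's actual code): an experiment only stores the meta-data that differs from its experiment type's defaults, and everything else is inherited.

```ruby
# Purely illustrative sketch - ExperimentType and Experiment are
# hypothetical names, not SciYAG's actual classes.
class ExperimentType
  attr_reader :defaults

  def initialize(defaults = {})
    @defaults = defaults
  end
end

class Experiment
  def initialize(type, overrides = {})
    @type = type
    @overrides = overrides
  end

  # Look up a meta-data value, falling back to the experiment type's
  # defaults, so only unusual conditions ever need to be typed in.
  def [](key)
    @overrides.fetch(key) { @type.defaults[key] }
  end
end

voltammetry = ExperimentType.new('buffer' => 'phosphate', 'pH' => 7.0)
unusual_run = Experiment.new(voltammetry, 'pH' => 5.5)

puts unusual_run['pH']      # the overridden value
puts unusual_run['buffer']  # inherited from the type's defaults
```

The `fetch`-with-block lookup is the whole trick: missing keys fall through to the shared defaults, which is one way to get the "type in only what is absolutely necessary" behaviour described above.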

There is currently no release of SciYAG, though you can take a look at its SVN repository. Rest assured, I'll keep you posted about my progress in this area!

## Wednesday, October 10, 2007

### PDF and included images

A little earlier in the evening, I had to send my parents a document including pictures. Though the final version should definitely have high-quality pictures, I had to admit that the version I sent was maybe slightly too big (5MB by email is painful when the receiver doesn't have a very fast connection). After some experiments with pdftops (from xpdf) and ps2pdf (from ghostscript) that didn't give satisfying results, I tried using ghostscript directly:
~ gs -sOutputFile=biniou.pdf -sDEVICE=pdfwrite \
-dCompatibilityLevel=1.4 FairePart.pdf < /dev/null

What a surprise! The output shrank by a factor of five:
~ ls -lh FairePart.pdf biniou.pdf
-rw-r--r-- 1 vincent vincent 1.1M 2007-10-10 22:56 biniou.pdf
-rw-r--r-- 1 vincent vincent 5.7M 2007-10-10 18:14 FairePart.pdf

A quick check with pdfimages combined with identify shows that all the images kept the same resolution. The gain must come from JPEG re-compression, or something of that sort... For another try, I added the -dPDFSETTINGS=/screen option to the command line:
~ gs -sOutputFile=biniou.pdf -sDEVICE=pdfwrite \
-dPDFSETTINGS=/screen -dCompatibilityLevel=1.4 \
FairePart.pdf < /dev/null

This time, the output file is minuscule:
~ ll biniou.pdf
-rw-r--r-- 1 vincent vincent 34K 2007-10-11 00:09 biniou.pdf

The downside is that the output is pretty ugly (well, you couldn't hope for much from a size reduction by a factor of 150). Images went down from 2576x1932 to 322x241 or even smaller (depending on the physical size of the image). -dPDFSETTINGS=/ebook gave a slightly better output (for 70K), but still not good enough for my case... So I tweaked the pdfwrite parameters by hand:
~ gs -sOutputFile=biniou.pdf -sDEVICE=pdfwrite \
-dColorImageDownsampleType=/Bicubic -dColorImageResolution=300 \
-dDownsampleColorImages=true -dCompatibilityLevel=1.4 \
FairePart.pdf < /dev/null
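For repeated use, those hand-tweaked flags can be wrapped up in a tiny helper. A Ruby sketch (the function name is my invention; the gs options are exactly the ones used above):

```ruby
# Hypothetical helper assembling the downsampling command line shown
# above; it only builds the argument list, it does not run gs.
def gs_downsample_args(input, output, resolution: 300)
  ['gs',
   "-sOutputFile=#{output}",
   '-sDEVICE=pdfwrite',
   '-dColorImageDownsampleType=/Bicubic',
   "-dColorImageResolution=#{resolution}",
   '-dDownsampleColorImages=true',
   '-dCompatibilityLevel=1.4',
   input]
end

# system(*gs_downsample_args('FairePart.pdf', 'biniou.pdf')) would run it
# (redirect stdin from /dev/null as above to keep gs non-interactive);
# here we just print the command that would be executed.
puts gs_downsample_args('FairePart.pdf', 'biniou.pdf').join(' ')
```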

This gave me a pretty nice result. It also shows that my image resolution was way too high anyway - 300 dpi is probably the best I'll get when printing... and the file produced is still ridiculously small (172K)! I'm starting to realize the power of ghostscript, and I thank its authors for it!