Wednesday, 26 September 2012

Book Scanning

I have an old book which I am mining data from... The problem is that paper is a pain: there is no Ctrl+F, and unless the index includes the terms you are after (which, in this case, it doesn't) searching becomes a real chore.

The solution? Scan it.
The problem? How to scan it.

Unlike loose printed, typed or even many handwritten documents, a book can't easily be pulled apart and fed through an automatic sheet scanner, especially when it is old, out of print and quite valuable. Most book scanners (including Google's) use cameras instead. This is my setup:

A very high-tech setup.

It's all very simple: a camera, a tripod to hold the camera still, a remote shutter button to snap the pictures, lots of lamps for even illumination and a data connection to the computer so I didn't fill up the memory card too fast.


It was a pretty chunky book (801 pages) and it took a total of 489 shots (including reshoots of slightly out-of-focus pages) to capture all of it. That took nearly 1.5 hours, or about 10 seconds per photo. So what does a whole book look like?



With some magic semi-automated processing these images are all that is needed for a perfect scan. Using ImageJ I converted them to black and white, subtracted the background and cropped/rotated the pages. These are some samples:



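A minimal sketch of that per-page clean-up as an ImageJ macro might look something like this (the folder prompt, rolling-ball radius, rotation angle and crop box are all placeholder values to be tuned for a particular setup, not the exact ones I used):
dir = getDirectory("Choose the folder of page photos");
list = getFileList(dir);
setBatchMode(true);
for (i=0; i<list.length; i++) {
    open(dir + list[i]);
    run("8-bit");                                        // convert to greyscale
    run("Subtract Background...", "rolling=50 light");   // flatten uneven lighting
    run("Make Binary");                                  // threshold to black and white
    run("Rotate... ", "angle=0.5 grid=1 interpolation=Bilinear"); // straighten the page
    makeRectangle(200, 100, 2400, 3400);                 // crop away the desk and book edges
    run("Crop");
    saveAs("PNG", dir + "page_" + i);
    close();
}
setBatchMode(false);
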
These processed images can simply be fed into Adobe Acrobat or similar optical character recognition (OCR) software to translate the image of the text into machine-readable, fully-searchable text. Exactly what I need!

Software used:
ImageJ: Automated image processing

Tuesday, 17 July 2012

The Problem with Figures

One problem with normal scientific writing is the separation of the data (in figures), the technical details (in figure legends and methods sections) and the scientific conclusions (in the main text). But don't worry! Edward Tufte has the answer: sparklines.


Software used:
Microsoft Office - Graph and text production
Fonts:
Gentium Plus (main text)
Titillium Text (headings)

Monday, 2 July 2012

QR Time - the least useful clock ever

QR time - click to visit the site.

Probably the least useful clock ever, requiring a barcode reader to read the time!

Saturday, 26 May 2012

3DQR

Emart in Korea just came up with something amazing: a sundial-like sculpture whose shadows, between 12 and 1 o'clock, form a QR code you can scan to get info about special offers. I had to have a go myself!
This is a 3D rendering of a 3D shape which, when lit from the right angle, casts shadows that make a QR code encoding a link to this blog. You can see a bigger version here.


QR codes are the leading 2D barcode method for encoding information and can be scanned by many phones. A simple grid of black and white squares encodes the data.
This is the QR code that encodes a link to this blog:
Working out the 3D shape that casts shadows which look like the QR code is actually quite simple. By following three rules, each square in the QR code can be converted from black/white to a 3D height which gives the right shadowing effect:
  1. If a square in the QR code is white that square should have a height of zero.
  2. If a square in the QR code is black and also has a black square directly above it then it should have a height of zero.
  3. If a square is black and the square directly above it is white then it should have a height greater than zero. Starting from that square, work downwards counting the number of black squares before you reach a white one; that count is the height the square should be. For example, if a black square has two black squares below it and then a white one, it should have a height of 3.
This can be automated easily; this is the ImageJ macro code which does this calculation:
run("8-bit");
run("Add Slice");
for (x=0; x
for (y=1; y
setSlice(1);
v=getPixel(x, y);
if (v==255) {
w=0;
} else if (v==0) {
if (getPixel(x, y-1)==0) {
w=0;
} else {
y2=y;
while(getPixel(x, y2)==0) {
y2++;
}
w=y2-y;
}
}
setSlice(2);
setPixel(x, y, w);
}
}
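
The macro treats each pixel as one square of the QR code, so the code image needs to be at a scale of one pixel per module; the calculated heights end up as the pixel values of the second slice.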

This picture shows the heights I calculated for each square in the QR code: black corresponds to a height of zero and each brighter shade of grey corresponds to a height of 1, 2, 3, etc.:
I made a 3D model of this in Blender:
It doesn't look like much... but if you look at it from the right angle, with the right direction of lighting, the QR code pops out:
All in all pretty cool!

Software used:
ImageJ: QR code analysis
Blender: 3D modelling and rendering

Monday, 21 May 2012

Light vs. Microscopists

Light vs. Microscopists, my research comics feature in OUBS Phenotype, Trinity Term 2012. Everything you wanted to know about superresolution light microscopy in one fun package. Including a cute kitten. Check out the full issue here.



Software used:
Inkscape: Document design and layout.
ImageJ: Micrograph simulation.

Thursday, 17 May 2012

Diatomaceous Anaglyphs

I was trying to work out the least friendly, most jargon-rich blog title I could still write something interesting about... I think "Diatomaceous Anaglyphs" is a pretty good effort!

Diatomaceous means "of diatoms": diatoms are a type of single-celled organism which grows beautiful shells made of silica. In this case the shells come from diatoms that died millions of years ago, settled out of the sea into beds of shells and were compressed into a soft rock. This rock is mined and processed for use in various areas of industry and is called diatomaceous earth.

Anaglyphs are the red/blue 3D images that can be viewed through glasses with coloured cellophane instead of lenses. With the surge in 3D films at the cinema you can find more and more 3D stuff online using anaglyphs.
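
Making one is conceptually simple: the left-eye image goes into the red channel and the right-eye image into the green and blue (cyan) channels. As a rough ImageJ sketch (not the exact macro I used; it assumes two aligned 8-bit greyscale images are already open, named "left" and "right"):
selectWindow("left");
w = getWidth();
h = getHeight();
newImage("anaglyph", "RGB black", w, h, 1);
setBatchMode(true);                          // skip screen updates during the loop
for (x=0; x<w; x++) {
    for (y=0; y<h; y++) {
        selectWindow("left");
        r = getPixel(x, y);                  // left eye -> red channel
        selectWindow("right");
        c = getPixel(x, y);                  // right eye -> green and blue (cyan)
        selectWindow("anaglyph");
        setPixel(x, y, r*65536 + c*256 + c); // pack the channels into one RGB value
    }
}
setBatchMode(false);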

When I combined the two I made diatomaceous anaglyphs, which are just plain beautiful (you can also view the whole album on my Flickr):


For those of you not geeky enough to have red/blue 3D glasses this is an animated version:


Software used:
ImageJ: Image alignment and processing

Sunday, 13 May 2012

FatFonts

FatFonts are a great new concept by Miguel Nacenta, Uta Hinrichs, and Sheelagh Carpendale for presenting numerical data in an easy-to-understand way while leaving the numbers themselves readable.

The idea is that numbers with a large value appear fatter and darker than numbers with a small value, which appear thinner and lighter. This means that if you want to plot something like a height map, where each pixel has a value equal to its height, you can just write out the numbers and they build the image for you. The really clever bit is that in a multi-digit number like 8242, the 8 (8 thousands) appears biggest, the first 2 (2 hundreds) a bit smaller, the 4 (4 tens) smaller still and the last 2 (2 units) tiny.

It is always easier to explain concepts like this with a picture, so here is some data:


It is hard to look at this and see the shape of the data, but if you use a FatFonts presentation (this font is called Miguta):


You can now actually see the shape of the data, where the bigger numbers are and where the smaller numbers are. This example is of the number 8242 I mentioned above:


The 8 is biggest (8 thousands), the 2 is nested inside it (2 hundreds), then the 4 (4 tens) and finally the last 2 (2 units) is too small to see.

I have coded a couple of fonts (Miguta and 7 Segments) for Miguel Nacenta to make this technique really easy to use. Check out the FatFonts 'how to use' guide for more information.

Update: 17/05/2012
And here is an example of my FatFonts 7 Segments design in action: http://studiolakmoes.nl/datavisualisatie-met-fat-fonts/

Software used:
Inkscape: Managing Miguel's glyph designs
Fontforge: Font preparation and contextual alternates coding