Wednesday, August 5, 2009

ACTIVITY 10 Preprocessing Text



The available image is tilted, so it first has to be rotated using the mogrify() function in Scilab. Figure 1 shows an example of a cropped image and the result after rotating it by 1.21 degrees. A disadvantage of rotating the image is that the result is blurrier, with lower effective resolution, since the rotated pixels have to be resampled.

Figure 1. Rotation of the image.
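The blurring that comes with a small rotation can be reproduced with a minimal sketch. This is written in Python with NumPy as a stand-in, not the mogrify() call used for the activity; the function name rotate_bilinear, the choice of bilinear interpolation, and the sign convention of the angle are all assumptions made for illustration.

```python
import numpy as np

def rotate_bilinear(img, angle_deg):
    """Rotate a grayscale image about its center using bilinear interpolation."""
    img = np.asarray(img, dtype=float)
    rows, cols = img.shape
    theta = np.deg2rad(angle_deg)
    cy, cx = (rows - 1) / 2.0, (cols - 1) / 2.0
    yy, xx = np.mgrid[0:rows, 0:cols].astype(float)

    # Inverse mapping: where in the source does each output pixel come from?
    xs = np.cos(theta) * (xx - cx) + np.sin(theta) * (yy - cy) + cx
    ys = -np.sin(theta) * (xx - cx) + np.cos(theta) * (yy - cy) + cy
    inside = (xs >= 0) & (xs <= cols - 1) & (ys >= 0) & (ys <= rows - 1)

    # Top-left neighbor of each source coordinate and the fractional offsets.
    x0 = np.clip(np.floor(xs).astype(int), 0, cols - 2)
    y0 = np.clip(np.floor(ys).astype(int), 0, rows - 2)
    fx, fy = xs - x0, ys - y0

    # Weighted average of the four neighboring source pixels.
    out = (img[y0, x0] * (1 - fx) * (1 - fy)
           + img[y0, x0 + 1] * fx * (1 - fy)
           + img[y0 + 1, x0] * (1 - fx) * fy
           + img[y0 + 1, x0 + 1] * fx * fy)
    return np.where(inside, out, 0.0)  # pixels mapped from outside become black

# Toy image (not the activity's form): a sharp white bar on black.
sharp = np.zeros((32, 32))
sharp[14:18, 4:28] = 1.0
rotated = rotate_bilinear(sharp, 1.21)

# The sharp bar had only 0s and 1s; the rotated one has in-between grays.
gray_pixels = np.count_nonzero((rotated > 0.01) & (rotated < 0.99))
```

The in-between gray values along the edges of the bar are the blur: each output pixel is an average of up to four input pixels.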

Figure 2 shows the results of each processing step applied to the cropped image. The first image is the inverse of the original (black becomes white and vice versa). Inverting the image makes it much easier to choose the threshold value for binarization. To remove the horizontal line between the texts ‘DEMO III’ and ‘Position’, a filter mask (a vertical line with a hole at the center) was multiplied with the Fourier transform of the image. This works because a horizontal line in the image concentrates its energy along the vertical axis of the Fourier transform, while the hole keeps the low frequencies near the center that carry the overall brightness. The technique is simply linear filtering in the Fourier (frequency) domain. The resulting image is already clean, i.e., the text is well separated from the background, so no further cleaning was done. To make the text one pixel thick, the thin() function was applied to the image.

Looking at the results, the characters in the word 'Position' are not well resolved, probably because the spacing between them is so small that they appear connected. The letters also could not be extracted in their complete form because the lines forming them are unevenly bright. Making the text one pixel thick worsens the reconstruction, since the letters were neither extracted completely nor clearly separated from each other to begin with.

Figure 2. Processing of the 'DEMO III, POSITION' text in the image.
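The line-removal step can be sketched as follows. This is a NumPy illustration on a toy image, not the Scilab code used for the activity; the function name remove_horizontal_lines and the parameters half_width and hole are hypothetical, and the real mask dimensions are not given in this post.

```python
import numpy as np

def remove_horizontal_lines(img, half_width=0, hole=1):
    """Suppress horizontal lines by masking the vertical axis of the centered spectrum.

    half_width and hole are illustrative parameters (assumptions):
    half_width sets the thickness of the vertical mask line and
    hole sets how many rows around the center are left untouched.
    """
    rows, cols = img.shape
    spectrum = np.fft.fftshift(np.fft.fft2(img))  # zero frequency at the center
    cr, cc = rows // 2, cols // 2

    mask = np.ones((rows, cols))
    mask[:, cc - half_width:cc + half_width + 1] = 0.0  # the vertical line
    mask[cr - hole:cr + hole + 1,
         cc - half_width:cc + half_width + 1] = 1.0     # the hole at the center

    filtered = np.fft.ifft2(np.fft.ifftshift(spectrum * mask))
    return filtered.real

# Toy image: one full-width horizontal line plus a small "text" blob.
img = np.zeros((32, 32))
img[16, :] = 1.0
img[8:12, 10:14] = 1.0

out = remove_horizontal_lines(img)
line_after = out[16, 0]   # a pixel on the line, away from the blob
blob_after = out[9, 11]   # a pixel inside the blob
```

In this toy case the line pixel drops to a small fraction of its original value while the blob stays close to 1, so a single threshold afterwards separates the text from the line.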

Figure 3 demonstrates the processing of plain handwritten text in a cropped image of the form. Again, the image is inverted first, and then the horizontal lines are removed by linear filtering in the Fourier domain using the same filter mask. Since the binarized image is not yet clean enough, it is further processed with the morphological opening operation. The text is then made one pixel thick. The reconstruction is still not good, especially in the one-pixel-thick text: it is not possible to tell what the letters are from the results. This is mainly because the handwriting itself is indecipherable. Actually, the reconstruction is good if we compare it with the original image without considering that the strokes are supposed to be letters; treated as just lines or curves, they are reproduced well. To illustrate how the letters could have been labeled for perfectly reconstructed text, Figure 4 is shown below.

Figure 3. Processing of the handwritten texts in the image.
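The opening step used to clean the binarized image is an erosion followed by a dilation. Below is a minimal NumPy sketch of it, assuming a square structuring element of odd size; the element actually used in the activity is not stated, and the function names are illustrative.

```python
import numpy as np

def erode(binary, size=3):
    """Binary erosion with a size x size square structuring element (odd size)."""
    binary = np.asarray(binary, dtype=bool)
    pad = size // 2
    padded = np.pad(binary, pad, mode="constant", constant_values=False)
    out = np.ones(binary.shape, dtype=bool)
    # A pixel survives only if every pixel under the element is set.
    for dr in range(size):
        for dc in range(size):
            out &= padded[dr:dr + binary.shape[0], dc:dc + binary.shape[1]]
    return out

def dilate(binary, size=3):
    """Binary dilation with a size x size square structuring element (odd size)."""
    binary = np.asarray(binary, dtype=bool)
    pad = size // 2
    padded = np.pad(binary, pad, mode="constant", constant_values=False)
    out = np.zeros(binary.shape, dtype=bool)
    # A pixel is set if any pixel under the element is set.
    for dr in range(size):
        for dc in range(size):
            out |= padded[dr:dr + binary.shape[0], dc:dc + binary.shape[1]]
    return out

def opening(binary, size=3):
    """Erosion followed by dilation: removes specks smaller than the element."""
    return dilate(erode(binary, size), size)

# Toy image: a 5x5 block (kept) and an isolated noise pixel (removed).
noisy = np.zeros((12, 12), dtype=bool)
noisy[2:7, 2:7] = True
noisy[10, 10] = True
cleaned = opening(noisy)
```

Strokes thinner than the structuring element are removed along with the noise, which may be one reason faint handwriting does not survive this step well.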

Figure 4. Labeling of the texts.
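Labeling assigns a distinct integer to each connected group of white pixels, so that well-separated letters each get their own number. A minimal sketch using a breadth-first flood fill with 8-connectivity is given below; this is an illustration in Python, not the labeling routine used for Figure 4, and label_blobs is a hypothetical name.

```python
from collections import deque

import numpy as np

def label_blobs(binary):
    """Label 8-connected white regions; returns (label image, number of blobs)."""
    binary = np.asarray(binary, dtype=bool)
    rows, cols = binary.shape
    labels = np.zeros((rows, cols), dtype=int)
    count = 0
    for r, c in zip(*np.nonzero(binary)):
        if labels[r, c]:
            continue  # already reached from an earlier pixel
        count += 1
        labels[r, c] = count
        queue = deque([(r, c)])
        while queue:
            y, x = queue.popleft()
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and binary[ny, nx] and not labels[ny, nx]):
                        labels[ny, nx] = count
                        queue.append((ny, nx))
    return labels, count

# Toy image with two separate blobs standing in for two letters.
letters = np.zeros((10, 10), dtype=bool)
letters[1:3, 1:3] = True
letters[5:8, 4:6] = True
labels, n_blobs = label_blobs(letters)
```

Letters that touch, as in the word 'Position' above, would end up sharing one label, which is why clean separation matters before this step.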

For this part of the activity, we counted the number of times the word 'DESCRIPTION' appears in the image. We did this using the correlation technique we learned in Activity 5. To get a good correlation wherever the word appears, the image and the template are binarized using the same threshold value. The word must also be located at the center of the template so that the bright spots in the image of correlation values fall at the correct locations of the words. Since there are three instances of 'DESCRIPTION' in the image, three very bright small dots appear in the image of correlation values. Figures 5 and 6 show the locations of the words in the image and the locations of the bright spots in the image of correlation values, respectively.

Figure 5. Encircled 'DESCRIPTION' in the image.

Figure 6. Instances of the word 'DESCRIPTION' in the image.
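The counting by correlation can be sketched with a toy pattern in place of the word. This is a NumPy illustration under stated assumptions: the template is the same size as the image with the pattern at its center, both are already binarized, and the peak threshold is simply the number of white pixels in the pattern. The function name and the pattern are made up for the example.

```python
import numpy as np

def correlate_centered(image, template):
    """Circular cross-correlation via the FFT.

    template must have the same shape as image, with the pattern centered at
    (rows // 2, cols // 2); the peaks then land on the pattern centers in image.
    """
    product = np.fft.fft2(image) * np.conj(np.fft.fft2(template))
    return np.fft.fftshift(np.fft.ifft2(product).real)

# A small asymmetric binary pattern standing in for the word.
pattern = np.array([[1, 1, 1, 0, 1],
                    [1, 0, 1, 0, 1],
                    [1, 1, 1, 1, 1]], dtype=float)

def stamp(canvas, center):
    """Place the pattern so that its middle pixel sits at center (row, col)."""
    r, c = center
    canvas[r - 1:r + 2, c - 2:c + 3] = pattern

size = 32
image = np.zeros((size, size))
centers = [(5, 6), (15, 20), (26, 9)]
for center in centers:
    stamp(image, center)

template = np.zeros((size, size))
stamp(template, (size // 2, size // 2))  # pattern at the center of the template

corr = correlate_centered(image, template)

# A perfect match overlaps every white pixel of the pattern.
peak = pattern.sum()
found = {(int(r), int(c)) for r, c in np.argwhere(corr > peak - 0.5)}
n_found = len(found)
```

If the pattern were not centered in the template, the bright spots would be displaced from the word locations by the same offset, which is the reason for centering it.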

I would like to thank Thirdy, Mandy, Jaze, Gilbert, Rommel, Alva and all others who have helped me understand this activity. I would give myself a grade of 10 because I was able to generate the necessary results and explain them.

