International Journal of Pattern Recognition and Artificial Intelligence, Vol. 22, No. 4, pp. 691–710, 2008.
In this paper we propose a technique for
detecting and correcting the skew of text areas in a document. The documents we
work with may contain several areas of text with different skew angles. First,
a text localization procedure is applied based on connected components
analysis. Specifically, the connected components of the document are extracted
and filtered according to their size and geometric characteristics. Next, the
candidate characters are grouped using a nearest neighbour approach to form
words, and these words are then used to construct text lines of arbitrary skew. Then,
the top-line and baseline for each text line are estimated using linear
regression. Neighbouring text lines with similar skew angles are grown
into text areas. For each text area a local skew angle is estimated, and each
area is then skew-corrected independently to a horizontal or vertical orientation.
The technique has been tested for accuracy and robustness, and the results
are compared with those of existing techniques.
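
As a rough illustration of the baseline-estimation step described above, the following minimal sketch fits a line through the bottom points of the characters in one text line by ordinary least squares and converts its slope to a skew angle. The function names `fit_line` and `skew_angle_degrees`, the choice of one bottom point per character as input, and the sign convention are assumptions made for this example; the abstract does not specify these details, so this should not be read as the authors' implementation.

```python
import math


def fit_line(points):
    """Ordinary least-squares fit of y = slope * x + intercept.

    `points` is a list of (x, y) pairs, here assumed to be the bottom
    points of the candidate characters in a single text line.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points to fit a line")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        # All x values identical: the line is vertical in this parametrisation.
        raise ValueError("points are vertically aligned; fit x on y instead")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def skew_angle_degrees(baseline_points):
    """Skew angle of the fitted baseline, in degrees.

    Assumption: the sign follows the coordinate system of the input points.
    In image coordinates, where y grows downward, a positive angle means
    the line descends from left to right.
    """
    slope, _ = fit_line(baseline_points)
    return math.degrees(math.atan(slope))
```

The same fit could be applied to the top points of the characters to obtain a top-line estimate. How the per-line angles are combined into a local skew angle for a text area is not stated in the abstract, so that step is left out here.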