Add font information to result iterator #9

Closed
charlesw opened this issue Mar 13, 2013 · 10 comments

Comments

@charlesw
Owner

Font information isn't yet available from the result iterator. Adding it involves two parts:

  1. Add TessResultIteratorWordFontAttributes to BaseApi.cs
  2. Add a GetFont method to ResultIterator. Note that this should return a font class (probably cached by font id).

Also make sure that you free any required resources (check the docs for TessResultIteratorWordFontAttributes).

Signature retrieved from capi.h from https://code.google.com/p/tesseract-ocr/downloads/detail?name=tesseract-3.02.02-win32-lib-include-dirs.zip

TESS_API const char* TESS_CALL TessResultIteratorWordFontAttributes(
    const TessResultIterator* handle, 
    BOOL* is_bold, BOOL* is_italic, BOOL* is_underlined, BOOL* is_monospace, 
    BOOL* is_serif, BOOL* is_smallcaps, int* pointsize, int* font_id);
@CoderNbrSix

Hi Charles
Thanks for your reply. I am working on adding it. It was pretty easy after
reviewing your code and is basically done, except for a problem with releasing
the returned "char *" resource. The release fails on the second call. I believe
it has to do with the Tesseract definition being a "const char *". I think that
makes the function static, but I need to find my Kernighan & Ritchie C
reference to brush up on the basics.

I commented out the release for now and have it running. I will test more this
weekend and see if I can get it to identify the font better; it currently calls
both OCR-A and Cambria "Verdana". I think this may involve teaching
Tesseract about the other fonts. So much to learn as well as re-learn.

Thanks
Walter


@charlesw
Owner Author

Hi Walter,
You won't need to release (aka destroy) the char* references, since Tesseract handles that internally (see: https://code.google.com/p/tesseract-ocr/source/browse/trunk/ccmain/ltrresultiterator.h around line 104). It should be sufficient to marshal the char* ptrs to strings and then forget about them. A signature something like this should work:

IntPtr TessResultIteratorWordFontAttributes(
    IntPtr handle, 
    ref bool is_bold, ref bool is_italic, ref bool is_underlined, ref bool is_monospace, 
    ref bool is_serif, ref bool is_smallcaps, ref int pointsize, ref int font_id);

The benefit of having it return an IntPtr here is that you only need to dereference it if required, i.e. when the font hasn't been created yet. Implementation-wise, I was thinking we could maintain a dictionary of Font, keyed by font_id, in the result iterator or even the page, and only create a new one if it doesn't exist. Pseudo code for this would probably look like this:

  1. Invoke the Interop method to get the font details from the iterator.
  2. If a font with the same id already exists, return it.
  3. Otherwise create a new font obj, add it to the dictionary and return it.

You'll of course need to dereference the string ptr using Marshal.PtrToStringAnsi, assuming the returned value is a null-terminated ANSI string.
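The three steps above could be sketched roughly as follows. This is only an illustration under the assumptions in this comment (font_id uniquely identifies a font, and the name is a null-terminated ANSI string); FontInfo and FontCache are hypothetical names, not existing classes in the wrapper.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

// Hypothetical font description; a real one would also carry the
// bold/italic/serif details returned by the interop call.
public sealed class FontInfo
{
    public FontInfo(int id, string name) { Id = id; Name = name; }
    public int Id { get; }
    public string Name { get; }
}

// Hypothetical cache keyed by Tesseract's font_id.
public sealed class FontCache
{
    private readonly Dictionary<int, FontInfo> fonts = new Dictionary<int, FontInfo>();

    public int Count { get { return fonts.Count; } }

    // namePtr is the IntPtr returned by the interop method. It is only
    // dereferenced when the font id has not been seen before.
    public FontInfo GetOrAdd(int fontId, IntPtr namePtr)
    {
        FontInfo font;
        if (fonts.TryGetValue(fontId, out font))
        {
            return font; // step 2: already cached, no string is created
        }

        // Step 3: marshal the name once (null if namePtr is IntPtr.Zero),
        // then store and return the new object.
        string name = Marshal.PtrToStringAnsi(namePtr);
        font = new FontInfo(fontId, name);
        fonts.Add(fontId, font);
        return font;
    }
}
```

The pointer is never freed here, in line with the note above that Tesseract owns the returned string.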

Assumptions

I've made a few assumptions here, mainly the following:

  1. The returned font string is a null-terminated ANSI string.
  2. BOOL can be marshalled to a .NET System.Boolean type. I'm really not sure about this; you may need to change it to an int/byte and manually convert to a boolean (0 = false).
  3. The font id uniquely identifies the font.
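If assumption 2 turns out not to hold, the fallback mentioned there might look something like the sketch below. It assumes BOOL is a C int, which should be checked against capi.h. The class name, method names and the DLL name constant are placeholders; only the entry point and the Cdecl calling convention come from this thread.

```csharp
using System;
using System.Runtime.InteropServices;

public static class FontInterop
{
    // Placeholder library name; substitute whatever the wrapper actually loads.
    private const string TesseractDllName = "libtesseract302.dll";

    // BOOL* parameters are received as plain ints rather than relying on
    // the default System.Boolean marshalling.
    [DllImport(TesseractDllName, CallingConvention = CallingConvention.Cdecl,
        EntryPoint = "TessResultIteratorWordFontAttributes")]
    public static extern IntPtr WordFontAttributesRaw(
        IntPtr handle,
        out int isBold, out int isItalic, out int isUnderlined, out int isMonospace,
        out int isSerif, out int isSmallcaps, out int pointSize, out int fontId);

    // C convention: zero is false, anything else is true.
    public static bool ToBool(int value)
    {
        return value != 0;
    }

    // Assumes a null-terminated ANSI string owned by Tesseract, so the
    // pointer is read but never freed.
    public static string PtrToAnsiString(IntPtr ptr)
    {
        return ptr == IntPtr.Zero ? null : Marshal.PtrToStringAnsi(ptr);
    }
}
```

A caller would invoke WordFontAttributesRaw and pass each int through ToBool before exposing it.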

Good luck!
Charles

@charlesw
Owner Author

A possible enhancement might be to use an enum for the font attributes (Bold, Italic, etc.); check out http://stackoverflow.com/questions/8447/enum-flags-attribute for an example of what I'm talking about.
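A rough sketch of what such a flags enum could look like is below. The names FontAttributes and FontAttributesBuilder are hypothetical; the six members simply mirror the BOOL out-parameters of TessResultIteratorWordFontAttributes.

```csharp
using System;

// Each member takes its own bit so that values can be combined with |.
[Flags]
public enum FontAttributes
{
    None = 0,
    Bold = 1 << 0,
    Italic = 1 << 1,
    Underlined = 1 << 2,
    Monospace = 1 << 3,
    Serif = 1 << 4,
    Smallcaps = 1 << 5
}

public static class FontAttributesBuilder
{
    // Folds the six booleans from the interop call into one value.
    public static FontAttributes From(
        bool isBold, bool isItalic, bool isUnderlined,
        bool isMonospace, bool isSerif, bool isSmallcaps)
    {
        var result = FontAttributes.None;
        if (isBold) result |= FontAttributes.Bold;
        if (isItalic) result |= FontAttributes.Italic;
        if (isUnderlined) result |= FontAttributes.Underlined;
        if (isMonospace) result |= FontAttributes.Monospace;
        if (isSerif) result |= FontAttributes.Serif;
        if (isSmallcaps) result |= FontAttributes.Smallcaps;
        return result;
    }
}
```

Callers could then test a single attribute with `attrs.HasFlag(FontAttributes.Bold)` instead of tracking six separate booleans.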

@CoderNbrSix

Hi Charles
Just an update with a question. I had better luck on the font
identification with serif fonts. (Now I understand what a serif font is.)
The Boolean was not an issue; all the languages involved (C++, C#, VB.NET)
are using the same implementation on my Windows 7 64-bit box, and I expect it to
be the same on all Windows machines. I have not thought about this level of
testing, and so far I am only working on .NET 4.5. I will have to make my
changes to the other .NET versions and figure out how to test them.

Once I was happy, I moved right on to the question of doing separate regions. I saw
that "Process" allows a region specification (Rectangle), so I tested
it. Only one Process call per tesseract instance is permitted. Believing
it would be too slow to create new tesseract instances for each region, I began
working on "SetRectangle", which I added to tesseract.Page.
It is looking OK so far. But I got tied up in the coding and testing and did not
work on the GitHub stuff. I will work on GitHub so I can start sharing my
changes and the VB.NET test program.

Regarding your suggestions for the font change, I was looking to only
expose the functionality of tesseract. I believe the way it is working now
should suffice. I am not clear on why you want to cache the font names by
font_id. Both are available from the call to "WordFontAttributes":
the name is the returned string from the function call, and the ID, along
with the details about the font type, are returned arguments of the call. I
will review your email some more. For now, here are the code segments:

IN RESULTITERATOR.CS: (after GetText)
    // Added by Walter 3/16/2013
    public String WordFontAttributes(out bool is_bold, out bool is_italic, out bool is_underlined, out bool is_monospace, out bool is_serif, out bool is_smallcaps, out int pointsize, out int font_id)
    {
        return Interop.TessApi.ResultIteratorWordFontAttributes(handle, out is_bold, out is_italic, out is_underlined, out is_monospace, out is_serif, out is_smallcaps, out pointsize, out font_id);
    }

IN PAGE.CS: (after GetMeanConfidence)
    // Walter 3/22/2012 - Set Separate Regions without re-calling Process (which can be called only once)
    public void SetRectangle(int left, int top, int width, int height)
    {
        Interop.TessApi.BaseApiSetRectangle(Engine.Handle, left, top, width, height);
        Recognize();
    }

IN BASEAPI.CS: (after ResultIteratorGetConfidence)
    public static string ResultIteratorWordFontAttributes(IntPtr handle, out bool is_bold, out bool is_italic, out bool is_underlined, out bool is_monospace, out bool is_serif, out bool is_smallcaps, out int pointsize, out int font_id)
    {
        IntPtr txtHandle = ResultIteratorWordFontAttributesInternal(handle, out is_bold, out is_italic, out is_underlined, out is_monospace, out is_serif, out is_smallcaps, out pointsize, out font_id);
        if (txtHandle != IntPtr.Zero)
        {
            var result = MarshalHelper.PtrToString(txtHandle, Encoding.UTF8);
            //TessApi.DeleteText(txtHandle); // Crashing here on second pass ?
            return result;
        }
        else
        {
            return null;
        }
    }

    [DllImport(Constants.TesseractDllName, CallingConvention = CallingConvention.Cdecl, EntryPoint = "TessResultIteratorWordFontAttributes")]
    public static extern IntPtr ResultIteratorWordFontAttributesInternal(IntPtr handle, out bool is_bold, out bool is_italic, out bool is_underlined, out bool is_monospace, out bool is_serif, out bool is_smallcaps, out int pointsize, out int font_id);

I will work out how to use GitHub before getting lost in the act of coding
again.
Regards and thanks
Walter


@charlesw
Owner Author

charlesw commented Apr 2, 2013

There are two main reasons I was thinking of using a cached font obj. In no particular order:

  1. Aesthetics: People coming from C# and other object-oriented languages are probably more used to working with objects than with procedural code. I've already done this in several places, such as creating the Color and Rectangle classes.
  2. Performance: The implementation you've described above will create a .NET string representation every time it's called, even if its C representation is cached by Tesseract. This wouldn't be an issue if we were only calling it a few times, but it can be called for every single word or perhaps every symbol.

As I mentioned previously, I'm assuming the fonts are uniquely identified by font id.

@forty2
Contributor

forty2 commented Apr 19, 2017

Hi all,

I recently had a use for this feature, so I took a stab at implementing it (forty2/tesseract@fadf20d)

Shortly thereafter, I came across pull request #275 from last June, which covers some of the same territory, plus more.

The primary difference between that code and mine is that I went ahead and implemented the suggested cached font objects rather than directly exposing the API with its 8-ish "out" params. Still, I didn't feel right about opening a second pull request for the same feature without at least some discussion. So, what can I do to help get this feature merged in some form or another?

Thanks!
Zach Bean

@charlesw
Owner Author

Hi Zach,
In regards to the pull request, feel free to push away. I can't make any promises, but the functionality sounds good 🙂

As for #275, well to be honest it just fell by the wayside :(
Charles

@charlesw
Owner Author

I left some thoughts as noted on your commits.

Also, after actually looking at the implementation, I don't think the cache would be giving you much in this case, since you're effectively marshalling all the font information each time anyway. While you could change it to only marshal the font name if it's not in the cache, I'm not sure it's worth it. It's also something that could be added retrospectively if required, without any API changes.

@charlesw
Owner Author

Sorry that's as notes not noted.

@charlesw
Owner Author

Implemented by PR #342, thanks go to @forty2

Closing
