AdvancedSubmitter: Passphrase filling tutorial
Last Updated - January 20, 2005 - Please read the generic OCR part and the gridline-removal section above it

The brand new version of AdvancedSubmitter offers passphrase filling through a sophisticated, user-adjustable OCR (Optical Character Recognition) procedure.
This is a short tutorial on how to use this new functionality so that you no longer have to fill in most passphrases manually.

Before we begin, it's important to understand that passphrase filling is NOT always possible. No current technique can read really fuzzy or heavily blurred images. However, it works for a good percentage of current sites, and through a training procedure you can teach it to recognize patterns that are not recognized on the first try.
Also, our intention is NOT to provide a spamming machine. On the contrary, we intend to provide the best possible submitting assistant, helping fellow webmasters concentrate on what matters more: building decent galleries, promoting their programs and making money.

*** IMPORTANT ***
AdvancedSubmitter downloads the required OCR images into a dedicated image box, so please go to settings->browser and select "Don't Download Images".

As an example I will use FreeHugeMovies. Below is a screenshot of this site loaded inside AdvancedSubmitter:

As you can see, the program has loaded the passphrase image into a dedicated spot in the lower-right corner of its window, which gives you better control over it. The passphrase field has been filled with the correct text recognized from the image.

How did this little miracle happen? Internally it is really complex; from the end user's perspective, however, it is rather simple. The main thing we have to do is create a site-specific rule that says "Fill the passphrase field X with the recognized text of the image Y". So we need to know X and Y. It's easy to find the image info: right-click on the image area (just to the right of the Passphrase text) and choose Properties. Below is the result we got:

So we know that the image's url is "http://www.freehugemovies.com/cgi-bin/autogallery/image.cgi". For now I will write down only a unique part of this url: "image.cgi". Since image urls can be created dynamically, AdvancedSubmitter uses unique parts of image urls to locate the actual url. So remember to use only a small part of the url that is still enough to spot it. Don't use something general like "freehugemovies", because most probably other images on the page include this text as well.
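To illustrate why the url fragment must be unique, here is a minimal Python sketch of substring matching. It is an assumption about how such a lookup could work, not AdvancedSubmitter's actual code; the helper name find_image_url and the logo url in the list are hypothetical.

```python
from typing import List, Optional


def find_image_url(image_urls: List[str], fragment: str) -> Optional[str]:
    """Return the single image url containing `fragment`, or None.

    Hypothetical helper: AdvancedSubmitter's real lookup is not documented.
    A fragment that matches several images is too general to identify the
    passphrase image, so that case is reported as an error.
    """
    matches = [url for url in image_urls if fragment in url]
    if len(matches) > 1:
        raise ValueError(f"fragment {fragment!r} matches {len(matches)} images")
    return matches[0] if matches else None


# Hypothetical list of image urls found on the submission page.
page_images = [
    "http://www.freehugemovies.com/images/logo.gif",
    "http://www.freehugemovies.com/cgi-bin/autogallery/image.cgi",
]

print(find_image_url(page_images, "image.cgi"))
# -> http://www.freehugemovies.com/cgi-bin/autogallery/image.cgi
```

With "freehugemovies" as the fragment, both urls match and the sketch raises an error, which is exactly the situation the advice above warns about.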

Now, let's click on the "Site Specific Rules" button. Below is how I constructed the new rule:

The field I want to fill is called "phrase"; I just selected it from the drop-down box. As the Value I use "_IMAGEimage.cgi". _IMAGE is a special new tag that says "locate an image whose url contains the following text ("image.cgi" in our case) and apply OCR to it". That's it!
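The rule "Fill field X with the recognized text of image Y" can be sketched in Python as below. This is only an illustration under stated assumptions: resolve_rule and fake_ocr are hypothetical names, the OCR engine is replaced by a placeholder callable, only the plain "_IMAGE" form from this example is handled, and treating values without _IMAGE as literal text is an assumption.

```python
from typing import Callable, Dict, List

IMAGE_TAG = "_IMAGE"


def resolve_rule(field: str, value: str, image_urls: List[str],
                 ocr: Callable[[str], str]) -> Dict[str, str]:
    """Turn one site-specific rule into a {field: text} fill instruction.

    Hypothetical sketch: `ocr` stands in for the real OCR engine and maps an
    image url to the recognized text. Values that do not start with _IMAGE
    are assumed to be literal text for the field.
    """
    if not value.startswith(IMAGE_TAG):
        return {field: value}
    fragment = value[len(IMAGE_TAG):]            # e.g. "image.cgi"
    url = next((u for u in image_urls if fragment in u), None)
    if url is None:
        return {}                                # no matching image on this page
    return {field: ocr(url)}


page_images = ["http://www.freehugemovies.com/cgi-bin/autogallery/image.cgi"]


def fake_ocr(url: str) -> str:
    return "AB12"    # placeholder: a real OCR engine would read the image


print(resolve_rule("phrase", "_IMAGEimage.cgi", page_images, fake_ocr))
# -> {'phrase': 'AB12'}
```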

Advanced OCR Image Handling

There are times when images need special handling. For example, our procedure works on black text over a white background. But what happens if the text is white over a black background? Or if the letters are in color? And what if there are edges that distract the recognition? Can I choose to avoid edges?

The answer to all of these is yes. The OCR module includes methods to help you overcome those problems.
Apart from the new _IMAGE tag, we have added three more tags:

_NEG - Creates a negative image. Useful if your image has white letters on a black background.
_BW - Creates a black and white image. Useful if there are many colors in your image that distract recognition.
_POS[x,y] - Crops x,y pixels from the upper-left and lower-right corners of the image.
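What the three tags do to the pixels can be sketched in plain Python as below. These are generic image operations, not AdvancedSubmitter's implementation: the image is assumed to be a list of rows of (r, g, b) tuples, the BT.601 luma weights and the 128 threshold for _BW are assumptions, and _POS[x,y] is assumed to trim x columns from the left and right edges and y rows from the top and bottom edges.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]      # (r, g, b), each 0-255
Image = List[List[Pixel]]         # rows of pixels, top to bottom

WHITE: Pixel = (255, 255, 255)
BLACK: Pixel = (0, 0, 0)


def negative(img: Image) -> Image:
    """_NEG: invert every channel so white-on-black becomes black-on-white."""
    return [[(255 - r, 255 - g, 255 - b) for (r, g, b) in row] for row in img]


def black_and_white(img: Image, threshold: int = 128) -> Image:
    """_BW: map each pixel to pure black or pure white by its luminance.

    Uses ITU-R BT.601 luma weights; the threshold is an assumption.
    """
    def luma(p: Pixel) -> float:
        r, g, b = p
        return 0.299 * r + 0.587 * g + 0.114 * b
    return [[BLACK if luma(p) < threshold else WHITE for p in row] for row in img]


def crop(img: Image, x: int, y: int) -> Image:
    """_POS[x,y] (assumed meaning): drop x columns left/right, y rows top/bottom."""
    height = len(img)
    width = len(img[0]) if img else 0
    return [row[x:width - x] for row in img[y:height - y]]


tiny = [[WHITE, BLACK, WHITE],
        [BLACK, (200, 30, 30), BLACK],
        [WHITE, BLACK, WHITE]]

print(crop(tiny, 1, 1))                              # -> [[(200, 30, 30)]]
print(black_and_white(crop(tiny, 1, 1)))             # -> [[(0, 0, 0)]]
print(negative(black_and_white(crop(tiny, 1, 1))))   # -> [[(255, 255, 255)]]
```

The dark red pixel falls below the luminance threshold, so _BW turns it black, and _NEG then turns it white.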

Here are some examples:

_NEG Example
 

As you can see in the images above, this site uses white text on a black background. To apply OCR successfully we had to add the _NEG tag, so the rule became "_NEG_IMAGEsecure_img".
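A rule value such as "_NEG_IMAGEsecure_img" combines modifier tags with the _IMAGE tag and the url fragment. The Python sketch below splits such a value into its parts. The grammar (zero or more of _NEG, _BW, _POS[x,y], then _IMAGE, then the fragment) is inferred from the examples in this tutorial, and whether AdvancedSubmitter accepts tags in any order is an assumption; parse_rule is a hypothetical name.

```python
import re
from typing import List, Tuple

# Assumed grammar: modifier tags first, then _IMAGE, then the url fragment.
TAG_RE = re.compile(r"_(NEG|BW|POS\[(\d+),(\d+)\])")


def parse_rule(value: str) -> Tuple[List[tuple], str]:
    """Split a rule like "_NEG_IMAGEsecure_img" into (tags, fragment)."""
    marker = value.find("_IMAGE")
    if marker < 0:
        raise ValueError(f"no _IMAGE tag in {value!r}")
    prefix, fragment = value[:marker], value[marker + len("_IMAGE"):]
    tags: List[tuple] = []
    pos = 0
    while pos < len(prefix):
        m = TAG_RE.match(prefix, pos)     # each tag must follow the previous one
        if m is None:
            raise ValueError(f"unknown tag at {prefix[pos:]!r}")
        if m.group(2) is not None:        # _POS[x,y] carries two numbers
            tags.append(("POS", int(m.group(2)), int(m.group(3))))
        else:
            tags.append((m.group(1),))
        pos = m.end()
    return tags, fragment


print(parse_rule("_NEG_IMAGEsecure_img"))
# -> ([('NEG',)], 'secure_img')
print(parse_rule("_POS[2,3]_BW_IMAGEcode.cgi"))
# -> ([('POS', 2, 3), ('BW',)], 'code.cgi')
```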

_POS and _BW Example

The above images show how to use the _POS and _BW tags. The OCR image is difficult this time, both because it is not black-and-white and because it includes horizontal and vertical lines that make it really hard to apply satisfactory algorithms. AdvancedSubmitter can easily overcome the problem if you add the _BW tag to the rule: the program then finds and removes horizontal and vertical lines automatically, allowing much more accurate OCR.
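One simple way to find and remove such lines is sketched below in Python. This heuristic is our own illustration, not necessarily what AdvancedSubmitter does: it works on a black-and-white bitmap (1 = black) and clears any row or column that is almost entirely black, on the assumption that letters rarely fill a whole row or column. The 0.9 threshold and the name remove_gridlines are hypothetical.

```python
from typing import List

Bitmap = List[List[int]]   # 1 = black (ink), 0 = white


def remove_gridlines(bitmap: Bitmap, min_fill: float = 0.9) -> Bitmap:
    """Clear rows and columns whose share of black pixels is at least min_fill.

    Heuristic sketch: a straight horizontal or vertical gridline shows up as
    a nearly full row or column; those pixels are set to white (0).
    """
    height = len(bitmap)
    width = len(bitmap[0]) if bitmap else 0
    line_rows = {y for y in range(height) if sum(bitmap[y]) >= min_fill * width}
    line_cols = {x for x in range(width)
                 if sum(bitmap[y][x] for y in range(height)) >= min_fill * height}
    return [[0 if (y in line_rows or x in line_cols) else bitmap[y][x]
             for x in range(width)]
            for y in range(height)]


grid = [
    [1, 1, 1, 1, 1, 1],   # horizontal gridline
    [0, 0, 0, 0, 0, 1],
    [0, 1, 1, 0, 0, 1],   # letter strokes in columns 1-2 ...
    [0, 1, 0, 0, 0, 1],
    [0, 1, 1, 0, 0, 1],   # ... and a vertical gridline in column 5
]

for row in remove_gridlines(grid):
    print(row)   # row 0 and column 5 are cleared; the letter strokes survive
```

Real gridlines are often thinner, slanted or broken, so a production filter would need more than this whole-row/whole-column test.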

GLOBAL OCR Rules

Remember that you can also add global OCR rules. They are applied to ALL sites, so they are a handy tool for mass OCR.
The two most common and useful rules are the following:

phrase = _BW_IMAGEimage.cgi
code = _BW_IMAGEcode.cgi
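The Python sketch below shows how the two global rules above could be combined with site-specific rules for one submission form. The tutorial does not say which rule wins when both exist for the same field; letting the site-specific rule take precedence is an assumption, and rules_for_site is a hypothetical name.

```python
from typing import Dict, Iterable

GLOBAL_RULES: Dict[str, str] = {
    "phrase": "_BW_IMAGEimage.cgi",
    "code": "_BW_IMAGEcode.cgi",
}


def rules_for_site(form_fields: Iterable[str], site_rules: Dict[str, str],
                   global_rules: Dict[str, str] = GLOBAL_RULES) -> Dict[str, str]:
    """Pick the OCR rule for each field present on the submission form.

    Assumption: a site-specific rule overrides the global rule for the same
    field. Fields without any rule are left out.
    """
    merged = {**global_rules, **site_rules}
    return {name: merged[name] for name in form_fields if name in merged}


print(rules_for_site(["url", "email", "phrase"], {}))
# -> {'phrase': '_BW_IMAGEimage.cgi'}
print(rules_for_site(["phrase"], {"phrase": "_NEG_IMAGEsecure_img"}))
# -> {'phrase': '_NEG_IMAGEsecure_img'}
```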

Training

Successful image recognition is based on extensive training. We have done the hard work of training the neural network for many cases. However, it's quite possible that more training will be needed as new sites appear that use special characters. The new AdvancedSubmitter includes a brand new program that helps with training the OCR module. I will not go into much detail on how to use this program, but I do include a few screenshots of it :)

A few tips for using the OCR module are as follows:
- The file that contains the learned OCR network is named ocrdata.ocr and is located in the AS directory.
- Under the main AS directory you will also find a new directory called ocrimages. It contains the last saved image from each site, named after the site.

So, open the OCR program.
- First load the OCR network: press the last button on the toolbar, or open the OCR menu, choose Load OCR and select the ocrdata.ocr file.
- Then open the image you want to train the network on (it should be a file in the ocrimages directory).
- Click the Book button (its tooltip reads OCR Learn) or go to OCR->Learn OCR glyphs.
- Go to the Learn tab and press Start. Each character will be presented one by one, together with the recognized character in the "Estimate" field. If an Estimate is wrong, enter the correct character in "Glyph is" and press the "Save" button.
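The Learn tab workflow (estimate each glyph, correct the wrong ones, save) can be mimicked with the toy Python sketch below. Note that it swaps the neural network for a much simpler nearest-neighbour lookup by Hamming distance, purely to show the estimate/correct/save loop; GlyphMemory, learn and the 3x3 glyph bitmaps are hypothetical and have nothing to do with the ocrdata.ocr format.

```python
from typing import List, Optional, Tuple

Glyph = Tuple[int, ...]   # a glyph bitmap flattened to 0/1 values


class GlyphMemory:
    """Toy stand-in for the trained OCR network (nearest-neighbour lookup)."""

    def __init__(self) -> None:
        self.samples: List[Tuple[Glyph, str]] = []

    def estimate(self, glyph: Glyph) -> Optional[str]:
        """Return the label of the closest saved glyph (Hamming distance)."""
        if not self.samples:
            return None
        best = min(self.samples,
                   key=lambda s: sum(a != b for a, b in zip(s[0], glyph)))
        return best[1]

    def save(self, glyph: Glyph, glyph_is: str) -> None:
        """Record the correct character, like pressing Save in the Learn tab."""
        self.samples.append((glyph, glyph_is))


def learn(memory: GlyphMemory, glyphs: List[Glyph], answers: str) -> int:
    """Present glyphs one by one; save a correction whenever the estimate is wrong.

    Returns the number of corrections made.
    """
    corrections = 0
    for glyph, correct in zip(glyphs, answers):
        if memory.estimate(glyph) != correct:
            memory.save(glyph, correct)
            corrections += 1
    return corrections


one = (0, 1, 0,
       0, 1, 0,
       0, 1, 0)
seven = (1, 1, 1,
         0, 0, 1,
         0, 0, 1)

mem = GlyphMemory()
print(learn(mem, [one, seven], "17"))   # -> 2 (both glyphs needed a correction)
print(learn(mem, [one, seven], "17"))   # -> 0 (both are now estimated correctly)
```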