AdvancedSubmitter: Passphrase filling tutorial
Last Updated - January 20 2005 - Read the
generic OCR part and removing gridlines section above it
The brand new version of AdvancedSubmitter
offers passphrase filling through a sophisticated, user-adjustable OCR
(Optical Character Recognition) procedure.
This small tutorial shows how to use this new functionality so that you no longer
have to fill in most passphrases manually.
Before we begin, it's important to understand
that passphrase filling is NOT always possible. No current technique can
read really fuzzy or really blurred images. However, it can work for a
good percentage of current sites, and through a training procedure the program can
learn to recognize patterns that it does not recognize on the first try.
Also, our intention is NOT to provide a spamming machine. On the contrary, we
intend to provide the best possible submitting assistant to help fellow
webmasters concentrate on what is more important: building decent galleries,
promoting their programs and making money.
*** IMPORTANT ***
AdvancedSubmitter will download the needed OCR images into a dedicated image box,
so please go to settings->browser and select "Don't Download Images".
As an example I will use FreeHugeMovies. Below is a screenshot of this site loaded inside AdvancedSubmitter

As you can see, the program has loaded the passphrase image in a dedicated spot in the bottom-right corner of the program's window. This gives you better control over it. The passphrase field is filled with the text recognized from the image.
How did this little miracle happen? Internally it is really complex, but from the end user's perspective it is rather simple. The main thing we have to do is create a site-specific rule that says "Fill the passphrase field X with the recognized text of the image Y". So we need to know X and Y. It's easy to find the image info by right-clicking on the image area (just to the right of the Passphrase text) and choosing Properties. Below is the result we got:

So we know that the image's URL is "http://www.freehugemovies.com/cgi-bin/autogallery/image.cgi". For now I will write down only a unique part of this URL: "image.cgi". Since image URLs can be created dynamically, AdvancedSubmitter uses unique parts of image URLs to locate the actual URL. So remember to use only a small part of the URL that identifies it uniquely. Don't use something general like "freehugemovies", because most probably other images on the page include this text as well.
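To make the "unique part" idea concrete, here is a minimal Python sketch of substring matching over the image URLs of a page. It is only an illustration, not AdvancedSubmitter's actual code: the function name `find_image_url` and the logo URL are hypothetical, and raising an error on an ambiguous fragment is my own choice to make the point visible.

```python
def find_image_url(image_urls, fragment):
    """Return the one URL that contains `fragment`.

    Raises ValueError when the fragment matches no image or more than one,
    which is exactly the situation a too-general fragment creates.
    """
    matches = [url for url in image_urls if fragment in url]
    if len(matches) != 1:
        raise ValueError(
            f"{fragment!r} matched {len(matches)} images; "
            "use a more specific part of the URL"
        )
    return matches[0]


# Image URLs found on the page. The first one is a made-up example;
# the second is the passphrase image from the tutorial.
page_images = [
    "http://www.freehugemovies.com/images/logo.gif",
    "http://www.freehugemovies.com/cgi-bin/autogallery/image.cgi",
]

# "image.cgi" appears in only one URL, so it is a good fragment.
passphrase_url = find_image_url(page_images, "image.cgi")
```

With the same list, `find_image_url(page_images, "freehugemovies")` raises an error because both URLs contain that text.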
Now, let's click on the "Site Specific Rules" button. Below is how I constructed the new rule:

So, the field I want to fill is called "phrase"; I just selected it from the drop-down box. As Value I use "_IMAGEimage.cgi". The _IMAGE tag is a special new tag that says: locate an image whose URL contains the following text ("image.cgi" in our case) and use OCR on it. That's it!
Advanced OCR Image Handling
There are times when images need special handling. For example, our procedure works on black text over a white background. But what happens if the text is white over a black background? Or if the letters are in color? And what happens if there are edges that interfere with recognition? Can I choose to skip the edges?
All of these cases can be handled. The OCR module includes
methods to help you overcome those problems.
Apart from the new _IMAGE tag we have added three more tags as follows:
| _NEG | This tag creates a negative image. Useful if your image has white letters on a black background |
| _BW | This tag creates a black and white image. Useful if the image has many colors that distract recognition |
| _POS[x,y] | This tag crops x,y pixels from the upper-left and bottom-right corners of the image |
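For readers who want to see what these three operations mean at pixel level, below is a rough Python sketch on a tiny grayscale image stored as rows of values from 0 (black) to 255 (white). It is not the program's implementation. The function names, the threshold of 128 and the exact _POS semantics (x columns removed from the left and right, y rows removed from the top and bottom) are assumptions made for illustration.

```python
def negative(img):
    """_NEG idea: invert every pixel, so white-on-black becomes black-on-white."""
    return [[255 - p for p in row] for row in img]


def black_and_white(img, threshold=128):
    """_BW idea: force every pixel to pure black or pure white.

    The threshold value is an assumption, not a documented setting.
    """
    return [[0 if p < threshold else 255 for p in row] for row in img]


def crop(img, x, y):
    """_POS[x,y] idea (assumed semantics): drop x columns on the left and
    right, and y rows at the top and bottom."""
    height = len(img)
    width = len(img[0])
    return [row[x:width - x] for row in img[y:height - y]]


# A 4x4 example image.
sample = [
    [0, 100, 200, 255],
    [10, 20, 30, 40],
    [250, 240, 130, 120],
    [0, 0, 0, 0],
]

inverted = negative(sample)         # first row becomes [255, 155, 55, 0]
two_tone = black_and_white(sample)  # third row becomes [255, 255, 255, 0]
inner = crop(sample, 1, 1)          # [[20, 30], [240, 130]]
```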
Here are some examples:
_NEG Example


As you can see in the above images, this site has a white-on-black image. So, in order to apply OCR successfully, we had to add the _NEG tag, and the rule became "_NEG_IMAGEsecure_img".
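A rule value such as "_NEG_IMAGEsecure_img" can be read as "modifier tags, then _IMAGE, then the URL fragment". The Python sketch below splits a value along those lines. It is a guess at the structure based on the examples in this tutorial, not the program's real parser: the function name `parse_rule_value` is hypothetical, and so is the assumption that _POS[x,y] is written before _IMAGE like the other tags.

```python
import re

# Matches a crop tag such as _POS[2,3] (placement before _IMAGE is assumed).
_POS_RE = re.compile(r"_POS\[(\d+),(\d+)\]")


def parse_rule_value(value):
    """Split a rule value into (list of operations, URL fragment)."""
    prefix, sep, fragment = value.partition("_IMAGE")
    if not sep or not fragment:
        raise ValueError(f"no _IMAGE tag with a URL fragment in {value!r}")

    ops = []
    pos = 0
    while pos < len(prefix):
        if prefix.startswith("_NEG", pos):
            ops.append(("NEG",))
            pos += len("_NEG")
        elif prefix.startswith("_BW", pos):
            ops.append(("BW",))
            pos += len("_BW")
        else:
            match = _POS_RE.match(prefix, pos)
            if match is None:
                raise ValueError(f"unknown tag in {value!r} at position {pos}")
            ops.append(("POS", int(match.group(1)), int(match.group(2))))
            pos = match.end()
    return ops, fragment


print(parse_rule_value("_IMAGEimage.cgi"))       # ([], 'image.cgi')
print(parse_rule_value("_NEG_IMAGEsecure_img"))  # ([('NEG',)], 'secure_img')
```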
_POS and _BW Example


The above images show how to use the _POS and _BW tags. The OCR image is difficult this time, both because it is not black-and-white and because it includes horizontal and vertical lines that make it really hard to apply satisfactory algorithms. AdvancedSubmitter can easily overcome the problem by using the _BW tag in the rule. This way the program finds and removes horizontal and vertical lines automatically, allowing a much more accurate OCR.
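How AdvancedSubmitter detects the lines internally is not described here. One simple way to picture it is the Python sketch below, which works on an image that is already pure black (0) and white (255): any row or column that is almost entirely black is treated as a gridline and painted white. The function name `remove_gridlines` and the 90% fill ratio are assumptions chosen for the example.

```python
def remove_gridlines(img, min_fill=0.9):
    """Whiten rows and columns that are at least `min_fill` black.

    `img` is a list of rows holding 0 (black) or 255 (white).
    A new image is returned; the input is left unchanged.
    """
    height = len(img)
    width = len(img[0])

    line_rows = {
        r for r in range(height)
        if sum(p == 0 for p in img[r]) >= min_fill * width
    }
    line_cols = {
        c for c in range(width)
        if sum(img[r][c] == 0 for r in range(height)) >= min_fill * height
    }

    return [
        [255 if (r in line_rows or c in line_cols) else img[r][c]
         for c in range(width)]
        for r in range(height)
    ]


# Row 2 and column 1 are full gridlines; the pixel at row 0, column 3
# stands in for part of a letter.
grid = [
    [255, 0, 255, 0, 255],
    [255, 0, 255, 255, 255],
    [0, 0, 0, 0, 0],
    [255, 0, 255, 255, 255],
    [255, 0, 255, 255, 255],
]
cleaned = remove_gridlines(grid)
```

Note that this naive approach also erases any letter pixels that happen to lie exactly on a removed line, so a real implementation would need to be more careful.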
GLOBAL OCR Rules
Also, remember that you can add global OCR
rules as well. They are applied to ALL sites, so they are a handy tool for mass
OCR.
The two most common and useful rules are the following:
phrase = _BW_IMAGEimage.cgi
code = _BW_IMAGEcode.cgi
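As a small illustration of how such global rules could be matched against a form, here is a Python sketch that reads "field = value" lines and keeps only the rules whose field name exists on the current form. The text format is simply how the rules are written in this tutorial. The function names and the sample form fields are hypothetical, and nothing here reflects how the program stores or applies its rules.

```python
def parse_global_rules(text):
    """Turn lines like 'phrase = _BW_IMAGEimage.cgi' into a dict."""
    rules = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        field, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a 'field = value' line: {line!r}")
        rules[field.strip()] = value.strip()
    return rules


def rules_for_form(form_fields, global_rules):
    """Keep only the global rules whose field name appears on this form."""
    return {name: global_rules[name] for name in form_fields if name in global_rules}


global_rules = parse_global_rules(
    """
    phrase = _BW_IMAGEimage.cgi
    code = _BW_IMAGEcode.cgi
    """
)

# A made-up submission form that has a "code" field but no "phrase" field.
applicable = rules_for_form(["url", "email", "code"], global_rules)
print(applicable)  # {'code': '_BW_IMAGEcode.cgi'}
```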
Training ...
Successful image recognition is based on extensive training. We have done the hard work of training the neural network for many cases. However, it's more than likely that more training will be needed as unknown sites appear that use special characters. The new AdvancedSubmitter includes a brand new program that helps with training the OCR module. I will not go into much detail on how to use this program, but I do include a few screenshots from it :)
A few tips for using the OCR
module are as follows:
- The file that holds the learned OCR network is named ocrdata.ocr and is in the AS
directory.
- If you take a look under the main AS directory you will see a new directory called
ocrimages. There you can find the last saved image from each site, named after
the site name.
So, open the OCR program.
- You have to load the OCR network. Just press the last button on the toolbar, or
click on the OCR menu, choose Load OCR and select the ocrdata.ocr file.
- Then you need to train the network on a new image. Just open the image you
want (it should be a file under the ocrimages directory).
- Click the Book button (its tooltip is OCR Learn) or go to OCR->Learn OCR
glyphs.
- Go to the Learn tab and press Start. Each character will be presented one by one,
together with the recognized character in the "Estimate" field. If the
Estimate is wrong, just enter the correct character in "Glyph is" and press the "Save" button.


