Auto-correction through regular expression


Can anyone help me create an auto-correction script that corrects a value using a regular expression?

For example:


In my case, "D-AO" was read instead of "D-A0". I want to correct it with a regular expression because the first character is not always "D"; it can be any letter from A-Z. The third character is not always a letter either; sometimes it is a digit from 0-9.

Can anyone help me with this?

Thank you in advance!






  • Bill Mercer

    Regular expressions are great tools, but they can be difficult to create and cryptic to understand. They are most useful in cases where the input you are checking can vary in unpredictable ways, such as strings that can vary in length, or where parts might or might not be present.

    If I understand you correctly, it sounds like you have a string where the first character is always a letter A-Z, the second character is a hyphen, the third character is either a letter A-Z or a digit, and the last character is always a digit.
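
    As a rough sketch, that format could be checked in VBScript with the built-in RegExp object. The function name IsValidCode is only an illustrative assumption, and the pattern assumes uppercase letters:

    ```vbscript
    ' Sketch: test whether a value has the shape
    ' letter, hyphen, letter-or-digit, digit (for example "D-A0").
    ' IsValidCode is a hypothetical helper name, not part of any product API.
    Function IsValidCode(value)
        Dim re
        Set re = New RegExp
        re.Pattern = "^[A-Z]-[A-Z0-9][0-9]$"
        re.IgnoreCase = False
        IsValidCode = re.Test(value)
    End Function

    ' IsValidCode("D-A0") returns True
    ' IsValidCode("D-AO") returns False, because the last character is not a digit
    ```

    A check like this only tells you that something is wrong; it does not say which character to change.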

    For a simple fixed-length string like you describe, a regular expression might be overkill.

    You could use a regular expression to try to correct commonly mistaken characters, but only in those positions where you know the incorrect ones cannot occur. Since both a zero and a letter O are valid in the third position, no pattern can tell which one was intended, so a regular expression can't help correct errors in that position.

    In your example, the last character is seen as a letter O, and that's not an allowed value. So what you are really looking to do is find any case where a letter O appears in the fourth position, and change it to a zero.

    That's a lot simpler than creating a regex to match the entire string pattern.

    All you really need to do is check the 4th character in the string, and if it's a letter O, change it to a zero. That can be done with a simple string replace function. In VBScript it would look something like this:

    MyString = Left(MyString, 3) & Replace(MyString, "O", "0", 4, 1)

    This looks for a letter O starting at the 4th position of MyString and replaces it with a zero. Because VBScript's Replace returns only the part of the string that begins at the start position, Left(MyString, 3) puts the first three characters back unchanged. There's no need to test the other parts of the string, because this error can't be detected there.

    To be thorough, you might also want to handle other common recognition errors, like converting the letter S to the number 5, or converting a letter I to the number 1.
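
    A minimal sketch of that idea, assuming the value is always four characters long and that uppercase O, S, and I are the only misreads you want to handle (FixLastChar is a made-up name for illustration):

    ```vbscript
    ' Sketch: fix common letter-for-digit misreads in the 4th position only.
    ' The first three characters are never touched.
    Function FixLastChar(value)
        Dim lastChar
        FixLastChar = value
        ' Leave anything that is not exactly four characters long unchanged.
        If Len(value) <> 4 Then Exit Function
        lastChar = Mid(value, 4, 1)
        Select Case lastChar
            Case "O"
                lastChar = "0"
            Case "S"
                lastChar = "5"
            Case "I"
                lastChar = "1"
        End Select
        FixLastChar = Left(value, 3) & lastChar
    End Function

    ' FixLastChar("D-AO") returns "D-A0"
    ' FixLastChar("D-OS") returns "D-O5" (the O in the third position is kept)
    ```

    The comparison is case-sensitive, so a lowercase "o" would pass through unchanged; add more Case branches if your recognizer produces those.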

    If you want a more general solution for cases where each position has specific allowed character types, instead of checking the string after recognition, sometimes it's easier to just divide the string up into chunks and recognize the chunks separately. So instead of recognizing a single string, recognize the individual parts, and concatenate them afterwards. That lets you limit what types of characters can be recognized for each position, which eliminates the possibility of the misrecognition happening in the first place. 

    That said, a regex solution can be used here; I will follow up with some more info.
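
    In the meantime, here is one way a regex-based correction could look. This is only a sketch using the VBScript RegExp object: CorrectWithRegex is a hypothetical name, and the pattern assumes uppercase input and handles only the O-for-zero misread in the last position:

    ```vbscript
    ' Sketch: match "letter, hyphen, letter-or-digit" followed by a misread O,
    ' then rebuild the value with a zero in the last position.
    Function CorrectWithRegex(value)
        Dim re, matches
        Set re = New RegExp
        ' Group 1 captures the three leading characters that must stay as they are.
        re.Pattern = "^([A-Z]-[A-Z0-9])O$"
        CorrectWithRegex = value
        Set matches = re.Execute(value)
        If matches.Count > 0 Then
            CorrectWithRegex = matches(0).SubMatches(0) & "0"
        End If
    End Function

    ' CorrectWithRegex("D-AO") returns "D-A0"
    ' CorrectWithRegex("D-A0") returns "D-A0" (already valid, no match)
    ' CorrectWithRegex("D-OO") returns "D-O0" (only the last O is changed)
    ```

    Anchoring the pattern with ^ and $ means values that do not fit the expected shape are returned untouched rather than being partially rewritten.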




  •

    This really helped. Thank you so much for the tips and information.

    By the way, while I was waiting for a response to my post, I was able to come up with a solution by entering the desired regular expression under Properties > Data > Data Type (as Code) > Edit > Content Settings (as Special) > New > Regular Expression (uncheck Dictionaries).

    It worked, but as noted in the comment above, it can be cryptic, especially for a very long and complicated expression.



