New in V7R5 – Part 3. %CHARCOUNT, %LEFT, %RIGHT, %UPPER, %LOWER

Hello everyone and welcome to part 3 of V7R5 News. In this article I am going to show five new built-in functions that I think you may find useful. Of course there will be programs to support the use of these built-in functions and, as always, you can copy and paste the code to your environment and test it.

The built-in functions I’ve chosen are %CHARCOUNT, %LEFT, %RIGHT, %UPPER and %LOWER. I talked about them at the last Common Iberia Congress in Zaragoza and the feedback was that people wanted to know more, so here it is. I have grouped them because they have something in common: they all operate on character strings, and they let you choose whether to work with natural characters or with the standard character size. Do you want to know what the difference is? Keep reading.

Double byte characters – What is the problem?

When you use string data you can have characters of different sizes. I will take UTF-8 as an example because its characters can be 1, 2, 3 or 4 bytes long. The character ‘a’ takes 1 byte and the character ‘á’ takes 2 bytes, and this can lead to errors when you use built-in functions such as %SUBST, %SCAN or the newer %LEFT and %RIGHT.

In my native language, Spanish, vowels can be a, e, i, o, u or they can be á, é, í, ó, ú. The first group are 1-byte characters but the second group are 2-byte characters. So what is the length of the Spanish name José Luís? How can it be managed in a program? Does the %LEN built-in function work? What if I want the first part of the string, José, in one variable and the other part, Luís, in another? Is it OK to search with %SCAN for the blank character and then use %SUBST?
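The byte arithmetic behind these questions can be checked outside RPG. The following is only a minimal sketch in Python, not RPG code: it assumes the accented letters are stored as single precomposed code points, and it uses Python's character count and UTF-8 encoding as stand-ins for "natural" and "byte" counting.

```python
# Illustrative only: Python stands in for the byte arithmetic, it is not RPG.
# 'José Luís' written with escapes so the accents are precomposed code points.
name = "Jos\u00e9 Lu\u00eds"

# Number of characters (what a natural count sees).
char_count = len(name)

# Number of bytes once the string is encoded as UTF-8.
byte_count = len(name.encode("utf-8"))

# Width in bytes of every character in UTF-8.
widths = [(ch, len(ch.encode("utf-8"))) for ch in name]

print(char_count)  # 9
print(byte_count)  # 11
print(widths)
```

The two accented vowels account for the two extra bytes: 9 characters, 11 bytes.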

Let’s try the above built-in functions in a program using /charcount natural and using /charcount stdcharsize and see the difference.

First I will explain how each built-in function works, and then I will show a demo program and the results.

/charcount stdcharsize vs /charcount natural

/charcount is a new compile directive that takes one parameter, stdcharsize or natural, and controls how string operations count and position characters in the code that follows it.

With /charcount stdcharsize you are telling the program to handle string data by bytes if it is alphanumeric, or by double bytes if it is UCS-2 or graphic.

With /charcount natural you are telling the program to handle string data by characters, whatever the number of bytes each character occupies. This applies to the data types listed in the charcounttypes keyword.

SUBST_NAT:

				
**free
ctl-opt charcounttypes(*utf8);

dcl-s var1       varchar(50) ccsid(*utf8);
dcl-s pos        packed(2);
dcl-s firstName  varchar(20);
dcl-s middleName varchar(20);

var1 = 'José Luís';

/charcount natural
snd-msg 'CHARCOUNT(*NATURAL)' %target(*self:2);
pos = %scan(%ucs2(' '):%ucs2(var1));
snd-msg 'Pos for '' '' char is: ' + %char(pos) %target(*self:2);
snd-msg 'Value of var1: ' + var1 %target(*self:2);
firstName = %subst(var1:1:pos-1);
snd-msg 'First name is: ' + %char(firstName) %target(*self:2);
middleName = %subst(var1:pos+1);
snd-msg 'Middle name is: ' + %char(middleName) %target(*self:2);

/charcount stdcharsize
snd-msg 'CHARCOUNT(*STDCHARSIZE)' %target(*self:2);
pos = %scan(%ucs2(' '):%ucs2(var1));
snd-msg 'Pos for '' '' char is: ' + %char(pos) %target(*self:2);
snd-msg 'Value of var1: ' + var1 %target(*self:2);
firstName = %subst(var1:1:pos-1);
snd-msg 'First name is: ' + %char(firstName) %target(*self:2);
middleName = %subst(var1:pos+1);
snd-msg 'Middle name is: ' + %char(middleName) %target(*self:2);

*inlr = *on; 
				
			

Explanation:

Line 1: Code will be fully free.

Line 2: The /charcount compile directive requires the charcounttypes control-specification keyword, which lists the data types that are processed by characters, rather than by bytes or double bytes, when CHARCOUNT NATURAL is in effect.

Line 4: Variable to store a string value.

Line 5: Variable to store the position of the character searched with %scan.

Line 6: Variable to store the first name of var1.

Line 7: Variable to store the middle name of var1.

Line 9: var1 contains a string with some double byte characters. In this case the Spanish name José Luís.

Line 11: Using the compiler directive /charcount natural we are specifying that the count of the characters must be done for natural characters. 

Line 12: Sends a message to the job log and to the last line of the screen showing that the charcount is NATURAL.

Line 13: Obtains the position of the blank character to be used in the %subst built-in function. As the ccsid of var1 is utf8, the %ucs2 built-in function is used.

Line 14: Sends a message to the job log and to the last line of the screen with a message showing the value of the pos variable.

Line 15: Sends a message to the job log and to the last line of the screen showing the value of var1.

Line 16: Obtains the first name using the position of the blank character.

Line 17: Sends a message to the job log and to the last line of the screen with a message showing the value of firstName variable.

Line 18: Obtains the middle name using the position of the blank character.

Line 19: Sends a message to the job log and to the last line of the screen with a message showing the value of middleName variable.

Line 21: Using the compiler directive /charcount stdcharsize we are specifying that characters must be counted by their length in bytes.

Line 22: Sends a message to the job log and to the last line of the screen showing that the charcount is STDCHARSIZE.

Lines 23-29: It is exactly the same code as above but for CHARCOUNT(*STDCHARSIZE)

Looking at the job log after running the program we can see that when *NATURAL is used everything works fine, but when *STDCHARSIZE is used (which is the default option) a conversion error appears when the program tries to extract the first name. The program uses %ucs2 with the %scan built-in function, so the position of the blank character, 5, is returned correctly in both cases. With charcount(*stdcharsize), however, %subst counts bytes, and the blank is not the fifth byte: ‘José’ is 5 bytes long because the ‘é’ character takes 2 bytes, so the blank is the sixth byte. %subst(var1:1:pos-1) therefore returns the first 4 bytes, which end in the middle of ‘é’, and we get a conversion error.
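To see why cutting at a byte position fails here, the same arithmetic can be sketched in Python. This is an analogy under stated assumptions, not RPG behavior itself: slicing the UTF-8 bytes stands in for byte-based counting, and Python's UnicodeDecodeError stands in for the conversion error.

```python
# Illustrative only: models the byte positions, it is not RPG.
text = "Jos\u00e9 Lu\u00eds"          # 'José Luís', precomposed accents
data = text.encode("utf-8")

# 1-based position of the blank counted in characters and in bytes.
pos = text.index(" ") + 1             # 5
byte_pos = data.index(b" ") + 1       # 6

# Taking pos - 1 positions counted in bytes keeps only half of 'é'.
first_bytes = data[:pos - 1]

try:
    first_bytes.decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError:
    # The last byte is the lead byte of 'é' without its continuation byte.
    decoded_ok = False

print(pos, byte_pos)   # 5 6
print(first_bytes)     # b'Jos\xc3'
print(decoded_ok)      # False
```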

Let’s play with the program and with the string and try to get the first 3 characters of the word ‘Árbol’ (tree in English). Let’s see what happens with *NATURAL and *STDCHARSIZE.

SUBST_NAT2:

				
**free
ctl-opt charcounttypes(*utf8);

dcl-s var1       varchar(50) ccsid(*utf8);
dcl-s firstThree varchar(3);

var1 = 'Árbol';

/charcount natural
snd-msg 'CHARCOUNT(*NATURAL)' %target(*self:2);
snd-msg 'Value of var1: ' + var1 %target(*self:2);
firstThree = %subst(var1:1:3);
snd-msg 'First three characters are: ' + %char(firstThree) %target(*self:2);

/charcount stdcharsize
snd-msg 'CHARCOUNT(*STDCHARSIZE)' %target(*self:2);
snd-msg 'Value of var1: ' + var1 %target(*self:2);
firstThree = %subst(var1:1:3);
snd-msg 'First three characters are: ' + %char(firstThree) %target(*self:2);

*inlr = *on; 
				
			

Explanation:

Line 1: Code will be fully free.

Line 2: The /charcount compile directive requires the charcounttypes control-specification keyword, which lists the data types that are processed by characters, rather than by bytes or double bytes, when CHARCOUNT NATURAL is in effect.

Line 4: Variable to store a string value.

Line 5: Variable to store the first three characters of var1.

Line 7: var1 contains ‘Árbol’ (tree in English).

Line 9: Using the compiler directive /charcount natural we are specifying that the count of the characters must be done for natural characters. 

Line 10: Sends a message to the job log and to the last line of the screen showing that the charcount is NATURAL.

Line 11: Sends a message to the job log and to the last line of the screen with a message showing the value of the var1 variable.

Line 12: Obtains the first three characters of var1.

Line 13: Sends a message to the job log and to the last line of the screen with a message showing the value of firstThree variable.

Line 15: Using the compiler directive /charcount stdcharsize we are specifying that characters must be counted by their length in bytes.

Line 16: Sends a message to the job log and to the last line of the screen showing that the charcount is STDCHARSIZE.

Lines 17-19: It is exactly the same code as above but for CHARCOUNT(*STDCHARSIZE).

Looking at the job log after running the program we can see that when *NATURAL is used everything works fine. When *STDCHARSIZE is used there is no conversion error, but the result is not what was expected: only the first 2 characters, ‘Ár’, are returned, because the 3 positions are counted in bytes and the character ‘Á’ is 2 bytes long.
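The two outcomes can be modelled with a small helper. This is a sketch in Python rather than RPG, and `subst` is a hypothetical function written for this illustration only: it counts in characters when `natural` is true and in UTF-8 bytes otherwise.

```python
def subst(text, start, length, natural):
    """Hypothetical model of a substring: 1-based start, counted in
    characters when natural is True, otherwise in UTF-8 bytes."""
    if natural:
        return text[start - 1:start - 1 + length]
    raw = text.encode("utf-8")[start - 1:start - 1 + length]
    # Raises UnicodeDecodeError if the cut falls inside a character.
    return raw.decode("utf-8")


word = "\u00c1rbol"  # 'Árbol', precomposed accent

natural_result = subst(word, 1, 3, natural=True)
bytes_result = subst(word, 1, 3, natural=False)

print(natural_result)  # Árb
print(bytes_result)    # Ár
```

Here the byte cut happens to land on a character boundary, so nothing fails; the result is simply one character short.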

%CHARCOUNT

This built-in function returns the number of natural characters as opposed to %LEN that returns the number of bytes.

%charcount(string)

 

CHARCOUNT:

				
**free

dcl-s var1       varchar(50) ccsid(*utf8);
dcl-s len_var1   packed(2);
dcl-s count_var1 packed(2);

var1 = 'José Luís';
len_var1 = %len(var1);
count_var1 = %charcount(var1);

snd-msg 'Value of var1: ' + var1 %target(*self:2);
snd-msg 'Length of var1: ' + %char(len_var1) %target(*self:2);
snd-msg 'Char count of var1: ' + %char(count_var1) %target(*self:2);

*inlr = *on; 
				
			

Explanation:

Line 1: Code will be fully free.

Line 3: Variable to store a value with a ccsid of 1208.

Line 4: Variable to store the length of the content of var1.

Line 5: Variable to store the value of the character count of var1.

Line 7: var1 contains a string with some double byte characters. In this case the Spanish name José Luís.

Line 8: Count in bytes of var1

Line 9: Count in natural characters of var1.

Line 11: Sends a message to the job log and to the last line of the screen with a message showing the value of var1.

Line 12: Sends a message to the job log and to the last line of the screen with a message showing the value of the length in bytes of var1.

Line 13: Sends a message to the job log and to the last line of the screen with a message showing the value of the count of the characters of var1.

Looking at the job log after running the program, the length of var1 is 11 (bytes) and the character count is 9.

%LEFT & %RIGHT

These built-in functions return N characters from the left (the beginning) or from the right (the end) of a string. As the syntax below shows, there is no start position: %LEFT always starts at the first character and %RIGHT always ends at the last one. %RIGHT is especially useful to get the last N characters of a string without having to calculate the ‘from’ position for a substring.

These built-in functions have an additional parameter in which you can specify whether you want to work with *NATURAL or with *STDCHARSIZE, which is the default. %SUBST, %SCAN and other built-in functions have been updated to accept this new parameter.

%left(string : length { : *NATURAL | *STDCHARSIZE } )

%right(string : length { : *NATURAL | *STDCHARSIZE } )

LEFT:

				
**free

dcl-s var1       varchar(30) ccsid(*utf8);
dcl-s var2       like(var1);

var1 = 'Rubén Martínez';

var2 = %left(var1:5);
snd-msg '%left without *NATURAL...' %target(*self:2);
snd-msg ('First 5 positions of var1: ''' + var2 + '''') %target(*self:2);

var2 = %left(var1:5:*natural);
snd-msg '%left with *NATURAL...' %target(*self:2);
snd-msg ('First 5 positions of var1: ''' + var2 + '''') %target(*self:2);

*inlr = *on;
				
			

Explanation:

Line 1: Code will be fully free.

Line 3: Variable to store a value with a ccsid of 1208.

Line 4: Variable to store the first 5 characters of var1.

Line 6: var1 contains a string with some double byte characters. In this case the Spanish name Rubén Martínez.

Line 8: var2 contains the first 5 positions of var1. %left takes no start position, so it always begins at the first character. *NATURAL is not specified, so *STDCHARSIZE is used as it is the default, and the 5 positions are counted in bytes.

Line 9: Sends a message to the job log and to the last line of the screen with a message showing the use of %left.

Line 10: Sends a message to the job log and to the last line of the screen with a message showing the value of var2, that is, the first 5 positions of var1.

Line 12: var2 contains the first 5 characters of var1. %left takes no start position, so it always begins at the first character. *NATURAL is specified, so the 5 positions are counted in natural characters.

Line 13: Sends a message to the job log and to the last line of the screen with a message showing the use of %left.

Line 14: Sends a message to the job log and to the last line of the screen with a message showing the value of var2, that is, the first 5 positions of var1.

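Looking at the job log after running the program, %left without *NATURAL returns ‘Rubén’, which is 5 bytes but only 4 characters because ‘é’ takes 2 bytes, while %left with *NATURAL returns ‘Rubén ’, the first 5 characters including the blank.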
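As a cross-check of the byte arithmetic outside RPG, here is a minimal Python sketch. It is an analogy, not RPG: slicing the string stands in for *NATURAL and slicing its UTF-8 bytes stands in for *STDCHARSIZE.

```python
# Illustrative only: models "first 5 positions" by bytes and by characters.
name = "Rub\u00e9n Mart\u00ednez"  # 'Rubén Martínez', precomposed accents

# First 5 positions counted in UTF-8 bytes: R, u, b and the 2 bytes of 'é'.
left_bytes = name.encode("utf-8")[:5].decode("utf-8")

# First 5 positions counted in characters: includes the 'n'.
left_chars = name[:5]

print(repr(left_bytes))  # 'Rubé'
print(repr(left_chars))  # 'Rubén'
```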

RIGHT:

				
**free

dcl-s name   varchar(30) ccsid(*utf8);
dcl-s last   like(name);
dcl-s length packed(2);

name = 'Rubén Martínez';

// I want the last 8 characters
length = 8;

last = %subst(name:%charcount(name)-(length-1):length:*natural);
snd-msg 'Last eight characters with %subst:' %target(*self:2);
snd-msg 'Expression used: %subst(name:%charcount(name)-(length-1):length:*natural)'
                                                        %target(*self:2);
snd-msg 'name: ' + last %target(*self:2);

last = %right(name:length:*natural);
snd-msg 'Last eight characters with %right:' %target(*self:2);
snd-msg 'Expression used: %right(name:length:*natural)' %target(*self:2);
snd-msg 'name: ' + last %target(*self:2);

*inlr = *on;
 
				
			

Explanation:

Line 1: Code will be fully free.

Line 3: Variable to store a value with a ccsid of 1208.

Line 4: Variable to store the last 8 characters of name.

Line 5: Variable to store how many characters I want to extract.

Line 7: name contains a string with some double byte characters. In this case the Spanish name Rubén Martínez.

Line 10: Assign the number of characters I want to extract.

Line 12: Using %subst to extract the last 8 characters requires calculating the character from which to begin extracting. In this case I obtain the total number of characters with %charcount, which always returns the natural value, and I subtract the number of characters I want to extract minus one. This will be the start position.

Lines 13-15: Send a message to the job log and to the last line of the screen showing the operation and the expression that will be run.

Line 16: Sends a message to the job log and to the last line of the screen showing the result.

Line 18: Using the %right built-in function I do not have to calculate anything. I just specify how many characters I want to extract and that’s all.

Lines 19-21: Work the same as the snd-msg statements from line 13 to line 16.

Looking at the job log after running the program, both expressions return ‘Martínez’.
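The start-position calculation used with %subst can be checked with a short Python sketch. It is an illustration under the assumption that counting is done in characters, as with *NATURAL; a negative slice stands in for %right.

```python
# Illustrative only: models the start-position arithmetic in characters.
name = "Rub\u00e9n Mart\u00ednez"  # 'Rubén Martínez', precomposed accents
length = 8

# Same formula as in the program: total characters minus (length - 1),
# giving a 1-based start position.
start = len(name) - (length - 1)

# Substring from the calculated start (converted to a 0-based index).
via_subst = name[start - 1:start - 1 + length]

# Taking the last `length` characters directly, with no calculation.
via_right = name[-length:]

print(start)      # 7
print(via_subst)  # Martínez
print(via_right)  # Martínez
```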

%UPPER & %LOWER

%UPPER built-in function returns the string passed as parameter in uppercase. It can be the entire string or just a part of it.

%LOWER built-in function returns the string passed as parameter in lowercase. It can be the entire string or just a part of it.

%lower(string {: start { : length { : *NATURAL | *STDCHARSIZE } } })

%upper(string {: start { : length { : *NATURAL | *STDCHARSIZE } } })

This time the example has no double-byte characters.

UPPERLOWER:

				
**free

dcl-s var1       varucs2(30);
dcl-s var2       like(var1);
dcl-s var3       like(var1);
dcl-s var4       like(var1);

var1 = 'bruce dickinson';
var2 = %upper(var1:1:1);
snd-msg '%upper(' + var1 + ':1:1) => ' + var2 %target(*self:2);

var3 = %upper(var2:%scan(%ucs2(' '):var2) + 1:1);
snd-msg '%upper(' + var2 + ':%scan(%ucs2('' ''):var2) + 1:1) => ' + var3 %target(*self:2);

var4 = %lower(var3);
snd-msg '%lower(' + var3 + ') => ' + var4 %target(*self:2);

*inlr = *on; 
				
			

Explanation:

Line 1: Code will be fully free.

Lines 3-6: Variables to store a ucs2 value.

Line 8: Assign a string to var1.

Line 9: Convert the first character of the string to uppercase. Initial position and length are specified in the second and third parameter.

Line 10: Sends a message to the job log and to the last line of the screen with a message showing the value of var2.

Line 12: In this case the conversion must be applied to the first letter of the last name, so the %scan built-in function is used to find the position of the blank space in var2. Adding one to that position puts us right on the first letter of the last name. The third parameter specifies that only one character is converted.

Line 13: Sends a message to the job log and to the last line of the screen showing the value of var3.

Line 15: Now, to lowercase everything, you can use the %lower built-in function with only the string parameter.

Line 16: Sends a message to the job log and to the last line of the screen showing the value of var4.

Looking at the job log after running the program, the three results are ‘Bruce dickinson’, ‘Bruce Dickinson’ and ‘bruce dickinson’.
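The three steps of the program can be traced with a small Python sketch. It is not RPG: `upper_part` is a hypothetical helper written for this illustration, modelling a partial uppercase conversion with a 1-based start position and a length.

```python
def upper_part(text, start, length):
    """Hypothetical helper: return text with `length` characters,
    beginning at 1-based position `start`, converted to uppercase."""
    i = start - 1
    return text[:i] + text[i:i + length].upper() + text[i + length:]


var1 = "bruce dickinson"

# Step 1: uppercase the first character only.
var2 = upper_part(var1, 1, 1)

# Step 2: 1-based position of the blank, plus one, is the first letter
# of the last name (index() is 0-based, hence + 2).
var3 = upper_part(var2, var2.index(" ") + 2, 1)

# Step 3: lowercase the whole string.
var4 = var3.lower()

print(var2)  # Bruce dickinson
print(var3)  # Bruce Dickinson
print(var4)  # bruce dickinson
```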

Final thoughts

Apart from %CHARCOUNT, which always counts natural characters, these built-in functions accept *NATURAL or *STDCHARSIZE as a parameter, so there is no need to use a compile directive that affects all the code that follows. All of them can be used with any kind of string variable: char, varchar, ucs2, varucs2 and so on. Try the programs with other strings without double-byte characters and see what happens.

Stay tuned and comment if you want. I really appreciate your comments.
