Write a function - Practice #2


Let’s practice writing another useful function from scratch. In Java, the java.util package provides a nifty class called StringTokenizer, but there is no such class in C++. Many programmers have come up with their own way of tokenizing a string. Let’s see if we can write a function to do this task.

Your version of the string tokenizer receives a string which contains data separated by user-defined delimiters (usually a space), then retrieves individual data items, also known as tokens. Here is one possible prototype of the function:

vector<string> tokenize(string s);

Here is a list of specifications of this function:

The function returns a vector<string> that stores all tokens of s.
Delimiters include all unprintable characters.
Make sure none of the tokens contains any unprintable characters.

Let’s write down the preconditions and postconditions in a formal way:

precondition: s contains tokens separated by unprintable characters

postcondition: returns a vector that contains all tokens

Step 1

First we need to come up with a skeleton of the function. Mine looks like:

/*
precondition: s contains tokens separated by unprintable characters
postcondition: returns a vector that contains all tokens of s
*/
vector<string> tokenize(string s){
- eliminate spaces and tabs before and after s, if any
- do the following inside the while loop
	take all characters until an unprintable character is reached
	store it in the vector
	skip all unprintable characters until a printable character is reached
- return the vector
}

Step 2
We need a vector<string> to store the tokens; we probably need one or more indexing variables inside our loops. Now let’s incorporate variables into our skeleton:

/*
precondition: s contains tokens separated by unprintable characters
postcondition: returns a vector that contains all tokens of s
*/
vector<string> tokenize(string s){
	int i, j;
	vector<string> vs;

- eliminate spaces and tabs before and after s, if any
- do the following inside the while loop
	take all characters until an unprintable character is reached
	store it in the vector
	skip all unprintable characters until a printable character is reached
- return the vector 
}

Step 3
Interrelation: If i or j is used as an index into s or vs, it must never go out of bounds.

State: As the program runs, vs will contain more and more tokens.

Step 4
To test whether a character is printable, use isgraph(), defined in <cctype>. It takes a character and returns true (a nonzero value) if the character has a visible representation, so it returns false for a space, a tab, or a newline. The library’s isprint() is similar, but it also returns true if the argument is a space.

This is a live example of the code-size reduction we discussed earlier in this chapter. If you do not use isgraph(), you will need to take care of all unprintable characters by yourself. It is important that you code this function by yourself for practice. After you are done, take a look at my version.
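To make the difference between the two classification functions concrete, here is a small sketch. The helper names is_graph, is_print, and check_classification are made up for this illustration, and the results assume the default "C" locale. The helpers convert the argument to unsigned char first, because passing a negative char value to the <cctype> functions is undefined.

```cpp
#include <cassert>
#include <cctype>

// Hypothetical helpers (not part of the chapter's code): convert to
// unsigned char before calling the <cctype> classification functions.
bool is_graph(char c) { return std::isgraph(static_cast<unsigned char>(c)) != 0; }
bool is_print(char c) { return std::isprint(static_cast<unsigned char>(c)) != 0; }

// Illustrative checks, assuming the default "C" locale.
void check_classification() {
    assert(is_graph('a') && is_print('a'));      // letter: both true
    assert(is_graph('#') && is_print('#'));      // punctuation: both true
    assert(!is_graph(' ') && is_print(' '));     // space: the one difference
    assert(!is_graph('\t') && !is_print('\t'));  // tab: both false
    assert(!is_graph('\n') && !is_print('\n'));  // newline: both false
}
```

Because isgraph() is false for the space, a tokenizer built on it treats spaces as delimiters without any special case.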

#include <cctype>
#include <string>
#include <vector>
using namespace std;

/*
precondition: s contains tokens separated by unprintable characters
postcondition: returns a vector that contains all tokens of s
*/
vector<string> tokenize(string s){
	int slen=s.length();

	int j=-1;
	/* eliminate spaces and tabs before and after s, if any;
	   the bound is tested first so s[j] is never read out of range */
	while(++j<slen && !isgraph((unsigned char)s[j]))
		;
	s=s.substr(j);
	j=s.length();
	while(--j>=0 && !isgraph((unsigned char)s[j]))
		;
	s=s.substr(0,j+1);

	vector<string> vs;
	/* nothing is left after trimming, so there are no tokens */
	if(s.empty()) return vs;

	/* append a space to the end of s for the next while loop */
	s+=' ';
	slen=s.length();	/* s has changed, so refresh its length */

	int i;
	i=j=0;
	while(j<slen){
		/* take all characters until an unprintable character is reached */
		while(++j<slen && isgraph((unsigned char)s[j]))
			;
		/* store it in the vector */
		vs.push_back(s.substr(i,j-i));
		/* skip all unprintable characters until a printable character is reached */
		while(++j<slen && !isgraph((unsigned char)s[j]))
			;
		i=j;
	}
	/* return the vector */
	return vs;
}

I use <string>’s substr() in this function. You can refer to Chapter 8.7 to learn how to use it. The most annoying thing about writing this function is determining the index of a character in s and the arguments of substr(), which takes a little finesse.
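Since the arguments of substr() are the easy part to get wrong, here is a minimal sketch of how they behave. The helper slice and the sample string are made up for this illustration. The key point is that the second argument is a length, not an end index, which is why the tokenizer passes j-i.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: return the characters of s in the half-open
// index range [i, j), which is how the tokenizer cuts out one token.
std::string slice(const std::string& s, int i, int j) {
    // substr(pos, count): the second argument is a length, not an end index.
    return s.substr(i, j - i);
}

void check_slice() {
    std::string s = "red green";
    assert(slice(s, 0, 3) == "red");      // indexes 0, 1, 2
    assert(slice(s, 4, 9) == "green");    // indexes 4 through 8
    assert(s.substr(4) == "green");       // one argument: from index 4 to the end
    assert(s.substr(4, 100) == "green");  // an oversized count is clamped to the end
}
```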

Let me explain how this function works. I assign -1 to j because the while loop indexes s with ++j, which increments j before it is used. Once the loop ends, the following line

s=s.substr(j);

copies the substring that starts at index j, because s[j] is the first printable character.

The next subtask is to get rid of trailing unprintable characters. You shouldn’t have any problem seeing how it works. The next while loop is the heart of the function; it retrieves all tokens and stores them in a vector<string> object.
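The trimming step can also be looked at in isolation. The sketch below pulls it into a helper named trim_unprintable, a name made up for this illustration. It tests the bound before reading s[j], so it is also safe for an empty string or a string with no printable characters at all.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper that mirrors the trimming step described above.
std::string trim_unprintable(std::string s) {
    int slen = static_cast<int>(s.length());

    // Leading side: j starts at -1 because ++j runs before each test.
    int j = -1;
    while (++j < slen && !std::isgraph(static_cast<unsigned char>(s[j])))
        ;
    s = s.substr(j);  // j == slen yields an empty string

    // Trailing side: start one past the end because --j runs before each test.
    j = static_cast<int>(s.length());
    while (--j >= 0 && !std::isgraph(static_cast<unsigned char>(s[j])))
        ;
    return s.substr(0, j + 1);  // j == -1 yields an empty string
}

void check_trim() {
    assert(trim_unprintable(" \t ab cd \n") == "ab cd");
    assert(trim_unprintable("x") == "x");
    assert(trim_unprintable("   ").empty());
    assert(trim_unprintable("").empty());
}
```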

I use the same logic to get rid of unprintable characters between tokens. The reason I append a space to s before entering the while loop is that, when skipping unprintable characters until a printable character is reached, I increment j first. Without a space after the last token, the scan could run past the end of s.

After several test runs, we see that the function works perfectly. You may feel that the function is a bit too long and there is redundant code. Let’s look at our design again. Why do we need to get rid of unprintable characters first? Can’t we simply enter a while loop which skips unprintable characters and stores a sequence of printable characters in each iteration?

Yes, that’s exactly what I meant when I said, “design of a function is critical to writing a good function.” Now that a better design hits us, we can choose whether to use the old design or the new one.

Bear in mind, though, that the bigger the program, the more time it takes to redo it. That's why you see so much old, deprecated code in a software company's code base: improving it requires a total overhaul.

Since this function is rather small, let’s practice more by building our function on the new design. Here is our new skeleton, including variables:

/*
precondition: s contains tokens separated by unprintable characters
postcondition: returns a vector that contains all tokens of s
*/
vector<string> tokenize(string s){
	int i, j;
	vector<string> vs;

- use a while loop to retrieve all tokens
- do the following inside the while loop until all characters of s are scanned
	skip all unprintable characters until a printable character is reached
	take all characters until an unprintable character is reached
	store it in the vector
- return the vector
}

Let’s go ahead and code it. Here is my code:

#include <cctype>
#include <string>
#include <vector>
using namespace std;

/*
precondition: s contains tokens separated by unprintable characters
postcondition: returns a vector that contains all tokens of s
*/
vector<string> tokenize(string s){
  int i,j;
  vector<string> vs;
  int slen=s.length();

  j=0;
  /* use a while loop to retrieve all tokens */
  while(j<slen){
    /* skip all unprintable characters until a printable character is reached */
    while(j<slen && !isgraph((unsigned char)s[j]))
      j++;
    /* only delimiters were left, so there are no more tokens */
    if(j>=slen) break;
    i=j;	/* the token starts at i */
    /* take all characters until an unprintable character is reached */
    while(j<slen && isgraph((unsigned char)s[j]))
      j++;
    /* store it in the vector; the token occupies indexes i through j-1 */
    vs.push_back(s.substr(i,j-i));
  }
  /* return the vector */
  return vs;
}

After testing it several times, we are certain that it works perfectly. As we can see, this version looks shorter and cleaner than the previous one. It is also more efficient.
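Here is one way the test runs could look, as a sketch rather than a definitive test suite. The block restates the skip/take design compactly so that it compiles on its own (it takes a const reference, which differs slightly from the prototype above), and it compares the result with a stream-based reference. The names split_with_stream and check_tokenize are made up for this illustration. Note the assumption: operator>> splits on whitespace only, so the two functions agree only for inputs whose delimiters are spaces, tabs, and newlines.

```cpp
#include <cassert>
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

// Compact restatement of the skip/take design, repeated here only so
// that this block is self-contained.
std::vector<std::string> tokenize(const std::string& s) {
    std::vector<std::string> vs;
    std::size_t j = 0;
    while (j < s.length()) {
        // skip delimiters
        while (j < s.length() && !std::isgraph(static_cast<unsigned char>(s[j]))) ++j;
        if (j >= s.length()) break;
        std::size_t i = j;  // start of the token
        // take printable characters
        while (j < s.length() && std::isgraph(static_cast<unsigned char>(s[j]))) ++j;
        vs.push_back(s.substr(i, j - i));
    }
    return vs;
}

// Hypothetical reference: operator>> extracts whitespace-separated words.
std::vector<std::string> split_with_stream(const std::string& s) {
    std::istringstream in(s);
    std::vector<std::string> out;
    std::string word;
    while (in >> word) out.push_back(word);
    return out;
}

void check_tokenize() {
    // Edge cases: empty input, delimiters only, a single character,
    // repeated delimiters, and leading or trailing delimiters.
    const char* inputs[] = {"", "   ", "a", "ab cd", "  ab\t\tcd \n", "x y z "};
    for (const char* text : inputs)
        assert(tokenize(text) == split_with_stream(text));
}
```

Single-character tokens and trailing delimiters are worth including in any test run, because off-by-one mistakes in the index arithmetic tend to show up there first.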

If we had been dealing with a much larger program, we would have had to spend a lot more time redesigning and recoding. Therefore, you should think more carefully while designing the skeleton of your program.

Try to stick to the cleanest and most efficient design.
Next, let’s look at how documentation plays a critical role in maintaining your programs!


Chapter 12.10
