Question:
Background and questions
I came across the following program.
char *token1 = strtok(str, " ");
char *token2 = strtok(NULL, " ");
char *token3 = strtok(NULL, " ");
if (token1 == NULL) {
    // handling when there are no tokens
}
else if (token2 == NULL) {
    // handling when there is exactly one token
    ...
}
else if (token3 == NULL) {
    // handling when there are exactly two tokens
    ...
}
else {
    // handling when there are three or more tokens
    ...
}
This program assumes that, after strtok() has returned NULL once, strtok(NULL, ...) can still be called again.
I had never considered this usage; if anything, I had vaguely assumed it was undefined. Is it guaranteed by the standard?
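For comparison, here is a minimal sketch of how the same token count could be obtained without ever calling strtok() again after it has returned NULL. The helper name split_tokens and its parameters are hypothetical and not part of the program I saw; it only illustrates the more conservative usage I had in mind.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: stores up to max tokens of str in tokens[] and
 * returns how many were stored. It stops at the first NULL, so strtok()
 * is never called again after it has returned NULL. */
size_t split_tokens(char *str, const char *delims, char *tokens[], size_t max)
{
    size_t n = 0;
    char *arg = str;                 /* first call passes the string itself */

    while (n < max) {
        char *t = strtok(arg, delims);
        if (t == NULL)
            break;                   /* no more tokens: stop calling strtok */
        tokens[n++] = t;
        arg = NULL;                  /* later calls continue the same scan */
    }
    return n;
}
```

With max set to 3, a return value of 0, 1, 2, or 3 would correspond to the four branches of the original if/else chain.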
Checking the standard
To verify this, I looked at the standard.
The following is quoted from ISO/IEC 9899:2011, 7.24.5.8 "The strtok function" (emphasis added by the quoter).
In this quote, s1 and s2 denote the first and second arguments of strtok, respectively.
3 The first call in the sequence searches the string pointed to by s1 for the first character that is not contained in the current separator string pointed to by s2. If no such character is found, then there are no tokens in the string pointed to by s1 and the strtok function returns a null pointer. If such a character is found, it is the start of the first token.
4 The strtok function then searches from there for a character that is contained in the current separator string. If no such character is found, the current token extends to the end of the string pointed to by s1, and subsequent searches for a token will return a null pointer. If such a character is found, it is overwritten by a null character, which terminates the current token. The strtok function saves a pointer to the following character, from which the next search for a token will start.
5 Each subsequent call, with a null pointer as the value of the first argument, starts searching from the saved pointer and behaves as described above.
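To make the quoted paragraphs concrete, here is a minimal sketch of one possible reading of them. The names sketch_strtok and save_ptr are hypothetical. Two points are assumptions of this sketch, not something the quoted text states: it leaves the saved pointer at the string terminator when no token is found, and it returns NULL defensively when there is no saved pointer at all.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical name for the "saved pointer" of paragraph 4. */
static char *save_ptr;

char *sketch_strtok(char *s1, const char *s2)
{
    if (s1 == NULL)
        s1 = save_ptr;               /* paragraph 5: resume from saved pointer */
    if (s1 == NULL)
        return NULL;                 /* assumption: defensive, not from the quote */

    s1 += strspn(s1, s2);            /* paragraph 3: skip separator characters */
    if (*s1 == '\0') {
        save_ptr = s1;               /* assumption: keep pointing at the terminator */
        return NULL;                 /* paragraph 3: no token found */
    }

    char *end = s1 + strcspn(s1, s2);  /* paragraph 4: find end of the token */
    if (*end != '\0') {
        *end = '\0';                 /* overwrite the separator */
        save_ptr = end + 1;          /* next search starts after it */
    } else {
        save_ptr = end;              /* token runs to the end of the string */
    }
    return s1;
}
```

Under this sketch every later call lands on the terminator and returns NULL again, but that follows from the choices made in the sketch, which is exactly the part I am unsure the standard guarantees.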
I considered how this description applies to the cases below.
The wording of paragraph 5 seems ambiguous, but I interpreted it to mean that a strtok(NULL, ...) call behaves as described in paragraphs 3 and 4, with s1 replaced by the pointer saved in paragraph 4.
Case 1: The string ends with a token character, e.g. s1 = "abc def"
The call after the one that returns "def" matches "subsequent searches for a token will return a null pointer" in paragraph 4 and returns NULL. Since "searches" is plural, NULL is returned no matter how many times strtok(NULL, ...) is called.
Case 2: The string ends with a delimiter, e.g. s1 = "abc def "
The call after the one that returns "def" matches "the strtok function returns a null pointer" in paragraph 3 and returns NULL. A saved pointer exists (here it points to the end of the string) and nothing updates it, so however many times the call is repeated, it matches paragraph 3 again and returns NULL.
Case 3: The string contains no tokens, e.g. s1 = " "
As in Case 2, the first call returns NULL under paragraph 3. Here, however, a subsequent strtok(NULL, ...) seems likely to be undefined, because no pointer has been saved.
In summary, a strtok(NULL, ...) call made after NULL has been returned appears to return NULL in some cases and to be undefined in others.
That split feels unsatisfying, though.
I can think of two ways to resolve it.
- My interpretation above is wrong, and the behavior is in fact specified uniformly: either NULL is always returned, or it is always undefined.
- It should basically be regarded as undefined, and the plural "searches" in paragraph 4 has no particular significance.
However, I could not confirm which is correct.
Experiment
For reference, I tested how this behaves in my environment.
Environment:
- OS: Gentoo Linux
- Compiler: gcc 4.8.4, clang 3.5.0
- Library: glibc 2.20
Program:
#include <stdio.h>
#include <string.h>
// Print the contents of a token, or "(null)" if it is NULL.
#define PR_TOKEN(token) do { printf(#token " = %s\n", (token) ? (token) : "(null)"); } while(0)
int main(void)
{
    // Case 1: the string ends with a token character
    char str1[] = "abc def";
    char *token1_1 = strtok(str1, " ");
    char *token1_2 = strtok(NULL, " ");
    char *token1_3 = strtok(NULL, " ");
    char *token1_4 = strtok(NULL, " ");
    PR_TOKEN(token1_1);
    PR_TOKEN(token1_2);
    PR_TOKEN(token1_3);
    PR_TOKEN(token1_4);
    // Case 2: the string ends with a delimiter
    char str2[] = "abc def ";
    char *token2_1 = strtok(str2, " ");
    char *token2_2 = strtok(NULL, " ");
    char *token2_3 = strtok(NULL, " ");
    char *token2_4 = strtok(NULL, " ");
    PR_TOKEN(token2_1);
    PR_TOKEN(token2_2);
    PR_TOKEN(token2_3);
    PR_TOKEN(token2_4);
    // Case 3: the string contains no tokens
    char str3[] = " ";
    char *token3_1 = strtok(str3, " ");
    char *token3_2 = strtok(NULL, " ");
    char *token3_3 = strtok(NULL, " ");
    char *token3_4 = strtok(NULL, " ");
    PR_TOKEN(token3_1);
    PR_TOKEN(token3_2);
    PR_TOKEN(token3_3);
    PR_TOKEN(token3_4);
    return 0;
}
Result:
token1_1 = abc
token1_2 = def
token1_3 = (null)
token1_4 = (null)
token2_1 = abc
token2_2 = def
token2_3 = (null)
token2_4 = (null)
token3_1 = (null)
token3_2 = (null)
token3_3 = (null)
token3_4 = (null)
In every case, once strtok() has returned NULL, subsequent calls also return NULL.
For Case 3, whose validity remains in doubt, I suspected it might only be working because the state saved in Case 1 and Case 2 was still in place, so I also tried running Case 3 alone, or together with only Case 2. The result was the same even when Case 3 was reached by the very first strtok() call.
Answer:
7 The strtok function returns a pointer to the first character of a token, or a null pointer if there is no token.
So whenever there is no corresponding token, you can expect NULL to be returned.
For reference, the C Reference Manual, 5th edition, explains:
If all characters belong to set, strtok returns a null pointer and sets the internal state pointer to a null pointer.
So in Case 3 as well, the internal pointer is set to null, which differs from the questioner's assumption.
It also says:
If str and the internal state pointer are both null pointers, strtok returns a null pointer and leaves the internal state pointer as it is. (This covers the case where strtok is called again after all tokens have been returned.)
(Here str is the first argument and set is the second.)
Given this, "calling strtok(NULL, ...) after strtok() has returned NULL once" is something code commonly relies on, and in that situation it returns NULL.
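The behavior described in the C Reference Manual can be sketched roughly as follows. This is only an illustration under that description, not the code of any actual library; the names my_strtok and saved are hypothetical, and resetting the pointer to null when the last token reaches the end of the string is an assumption of this sketch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical internal state pointer; null when no tokens remain. */
static char *saved;

char *my_strtok(char *str, const char *set)
{
    if (str == NULL)
        str = saved;
    if (str == NULL)
        return NULL;                 /* both null: return null, state unchanged */

    str += strspn(str, set);         /* skip characters that belong to set */
    if (*str == '\0') {
        saved = NULL;                /* all characters belong to set */
        return NULL;
    }

    char *end = str + strcspn(str, set);  /* first separator after the token */
    if (*end == '\0') {
        saved = NULL;                /* assumption: nothing left to scan */
    } else {
        *end = '\0';                 /* terminate the current token */
        saved = end + 1;
    }
    return str;
}
```

With this sketch, Case 3 (" ") returns NULL on the first call and leaves saved as null, so any number of further my_strtok(NULL, " ") calls also return NULL.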