I have a text file containing some questions. Read with file_get_contents(), it looks like this:
```
*** Question 101 This is the question. (A) Choice a. (B) Choice b. (C) Choice c. (D) Choice d. (E) Choice e. Explanation: the explanation. Question 102 This is the question. (A) Choice a. (B) Choice b. (C) Choice c. (D) Choice d. (E) Choice e. Explanation: the explanation. *** Question 201 This is the question. (A) Choice a. (B) Choice b. (C) Choice c. (D) Choice d. (E) Choice e. Explanation: the explanation. Question 202 This is some instructions for latter questions. [non-question]
```
Here is the same text with line breaks added for readability:
```
Question 101
This is the question.
(A) Choice a.
(B) Choice b.
(C) Choice c.
(D) Choice d.
(E) Choice e.
Explanation: the explanation.
Question 102
This is the question.
(A) Choice a.
(B) Choice b.
(C) Choice c.
(D) Choice d.
(E) Choice e.
Explanation: the explanation.
Question 201
This is the question.
(A) Choice a.
(B) Choice b.
(C) Choice c.
(D) Choice d.
(E) Choice e.
Explanation: the explanation.
Question 202
This is some instructions for latter questions.
[non-question]
```
Notes: *** and [non-question] are optional flags that may or may not be present. When [non-question] is present, the entry has no choices and no explanation.
What I want is to be able to do this:
```php
preg_match_all($pattern, $source, $matches, PREG_SET_ORDER);
foreach ($matches as $match)
{
    // do something with $match['seen_on_exam'] or $match['number'] etc...
}
```
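A minimal sketch of what PREG_SET_ORDER hands back when the pattern uses named groups. The two-entry sample and the cut-down pattern (only three of the groups) are made up for illustration. Each $match is one entry, keyed both by group name and by index. An optional group that did not take part comes back as '' when a later group did match, but PHP leaves it out entirely when it is the last group, so reading every field with a ?? '' default keeps all of them present:

```php
<?php
// Hypothetical sample: entry 101 has no [non-question] flag, entry 102 has one.
$source = '*** Question 101 Question 102 [non-question]';

// Cut-down pattern (illustrative only) with three named groups.
$pattern = '/(?P<seen_on_exam>\*{3})?\s*Question\s+(?P<as_numbered>\d+)'
         . '(?:\s+(?P<non_question>\[non-question\]))?/';

preg_match_all($pattern, $source, $matches, PREG_SET_ORDER);

$rows = [];
foreach ($matches as $match) {
    // An unmatched trailing group (non_question for entry 101) is missing
    // from $match altogether, so default every field to ''.
    $rows[] = [
        'seen_on_exam' => $match['seen_on_exam'] ?? '',
        'as_numbered'  => $match['as_numbered'] ?? '',
        'non_question' => $match['non_question'] ?? '',
    ];
}

foreach ($rows as $row) {
    printf("%s|%s|%s\n", $row['seen_on_exam'], $row['as_numbered'], $row['non_question']);
}
```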
Of course, this means using named groups such as (?P<seen_on_exam>\*{3}), which I can manage in simpler cases. The problem is that this format is unusual. Here are the pieces I came up with:
```
(?P<seen_on_exam>\*{3})?
Question
(?P<as_numbered>\d+)
(?P<question_text>\w+)
(\(A\) (?P<choice_a>\w+))?
(\(B\) (?P<choice_b>\w+))?
(\(C\) (?P<choice_c>\w+))?
(\(D\) (?P<choice_d>\w+))?
(\(E\) (?P<choice_e>\w+))?
(Explanation: (?P<explanation>\w+))?
(?P<non_question>\[non\])?
```
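A quick check of why these pieces come up short, run on hypothetical one-line samples: \w+ captures a single word, so question_text and each choice stop at the first space, and \[non\] cannot match the literal [non-question] flag. A lazy .+? followed by something that marks where the field ends is one way to capture the whole text:

```php
<?php
$line = 'Question 101 This is the question.';

// \w matches one "word" character (letter, digit, underscore), so \w+
// stops at the first space and question_text captures only "This".
preg_match('/Question\s+(?P<as_numbered>\d+)\s+(?P<question_text>\w+)/', $line, $m);
$wordOnly = $m['question_text'];

// A lazy .+? crosses spaces (and, with the s modifier, newlines); here the
// \s*\z after it forces it to extend to the end of the input.
preg_match('/Question\s+(?P<as_numbered>\d+)\s+(?P<question_text>.+?)\s*\z/s', $line, $m);
$wholeText = $m['question_text'];

// \[non\] needs "]" right after "non", so it never matches the real flag.
$flagText = 'This is some instructions for latter questions. [non-question]';
$shortFlagHits = preg_match('/\[non\]/', $flagText);          // 0
$fullFlagHits  = preg_match('/\[non-question\]/', $flagText); // 1

printf("%s | %s | %d | %d\n", $wordOnly, $wholeText, $shortFlagHits, $fullFlagHits);
```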
The hard part is accounting for possible whitespace between the optional and required parts (the only required parts are the text “Question”, the number, and the actual question text). However, every part needs to come through in the match array, with missing ones left blank. I just can’t get the final regex right. Would somebody mind taking a look at this and helping me assemble it?
My final version, which doesn’t work, is this:
```
/(?P<seen_on_exam>\*{3}\s)?Question\s(?P<as_numbered>\d+)\s(?P<question_text>\w+)\s?(\(A\) (?P<choice_a>\w+))?\s?(\(B\) (?P<choice_b>\w+))?\s?(\(C\) (?P<choice_c>\w+))?\s?(\(D\) (?P<choice_d>\w+))?\s?(\(E\) (?P<choice_e>\w+))?\s?(Explanation: (?P<explanation>\w+))?\s?(?P<non_question>\[non\])?/
```
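One possible assembly, sketched under assumptions the real file may not meet: every entry starts with an optional *** followed by "Question" and a number; choices are labeled (A) to (E) in that order; the flag is spelled exactly [non-question]; and no field text itself contains *** or "Question" followed by a digit. Each field is a lazy .+?, each optional part is wrapped in a non-capturing (?:...)? so the named groups keep their positions, \s* absorbs the whitespace between parts, and a final lookahead ends each entry at the next ***, the next "Question N", or the end of the input. The $fields list and the '' defaults are illustrative, not part of any API:

```php
<?php
// Sample input copied from the question (the file_get_contents() form).
$source = '*** Question 101 This is the question. (A) Choice a. (B) Choice b. (C) Choice c. '
    . '(D) Choice d. (E) Choice e. Explanation: the explanation. '
    . 'Question 102 This is the question. (A) Choice a. (B) Choice b. (C) Choice c. '
    . '(D) Choice d. (E) Choice e. Explanation: the explanation. '
    . '*** Question 201 This is the question. (A) Choice a. (B) Choice b. (C) Choice c. '
    . '(D) Choice d. (E) Choice e. Explanation: the explanation. '
    . 'Question 202 This is some instructions for latter questions. [non-question]';

// x: whitespace and line breaks inside the pattern are ignored (layout only).
// s: "." also matches newlines, so a field may span several lines.
$pattern = '/
    (?P<seen_on_exam>\*{3})?\s*
    Question\s+(?P<as_numbered>\d+)\s+
    (?P<question_text>.+?)
    (?:\s*\(A\)\s*(?P<choice_a>.+?))?
    (?:\s*\(B\)\s*(?P<choice_b>.+?))?
    (?:\s*\(C\)\s*(?P<choice_c>.+?))?
    (?:\s*\(D\)\s*(?P<choice_d>.+?))?
    (?:\s*\(E\)\s*(?P<choice_e>.+?))?
    (?:\s*Explanation:\s*(?P<explanation>.+?))?
    (?:\s*(?P<non_question>\[non-question\]))?
    \s*(?=\*{3}|Question\s+\d|\z)
/sx';

// Hypothetical list of the fields every entry should expose.
$fields = ['seen_on_exam', 'as_numbered', 'question_text', 'choice_a', 'choice_b',
           'choice_c', 'choice_d', 'choice_e', 'explanation', 'non_question'];

$questions = [];
if (preg_match_all($pattern, $source, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $match) {
        $row = [];
        foreach ($fields as $field) {
            // Unmatched trailing groups are omitted from $match, so default to ''.
            $row[$field] = $match[$field] ?? '';
        }
        $questions[] = $row;
    }
}

foreach ($questions as $q) {
    printf("%s %s %s\n", $q['as_numbered'], $q['seen_on_exam'] !== '' ? 'seen' : '-', $q['question_text']);
}
```

Lazy fields followed by optional groups can backtrack heavily on large inputs, so splitting the file into one chunk per entry before matching may be worth considering if performance becomes a problem.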