PHP PREG_SPLIT on numbers 1-100 - php

I am working on some code to break down the full text of a test that will be copied and pasted in the following format:
1. This is question number one.
A. Answer 1
B. Answer 2
C. Answer 3
D. Answer 4
2. This is question number two.
3. This is another question, number three.
45. Ken has uses his money, $353. How much does he have after spending $214.
I am using the following preg_split:
$questions = preg_split("/[0-9]+\./", $_POST['test']);
My problem has come in with questions like #45 where there are numbers in the question itself and they are followed by a period.
I just want to match the numbers 1-100 followed by a period. Eg.
1.
2.
3.
4.
5.
etc

I think it is better to use the multiline flag with ^:
$questions = preg_split('/^ *[0-9]+\. +/m', $_POST['test']);

A number between 1 and 100, followed by a period, can be matched by
/\b(?:100|[1-9][0-9]?)\./
but if the actual rule is to match a number at the start of a line, use
/^\d+\./m
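As a rough sketch, the two ideas above (the 1-100 range and the line-start anchor) can be combined in a single preg_split call. The sample text and the variable names here are made up for illustration, and PREG_SPLIT_NO_EMPTY is only used to drop the empty piece before the first question.

```php
<?php
// Sample input (an assumption for this sketch), using a nowdoc so "$" stays literal.
$text = <<<'TXT'
1. This is question number one.
A. Answer 1
B. Answer 2
2. This is question number two.
45. Ken has $353. How much does he have after spending $214.
TXT;

// ^ with the m flag anchors at the start of each line, so "$353." and "$214."
// inside question 45 are not treated as question numbers.
$pattern = '/^(?:100|[1-9][0-9]?)\.\s+/m';

$questions = preg_split($pattern, $text, -1, PREG_SPLIT_NO_EMPTY);
$questions = array_map('trim', $questions);

print_r($questions);
```

Each element then holds one question together with its answer lines, which can be split further on line breaks if needed.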

You can use preg_match_all() instead:
preg_match_all('~(?:^|\R)[0-9]+\. \K.+~', $_POST['test'], $matches);
$questions = $matches[0];

Use ^ to specify that it's the beginning of the line, with the m modifier for multiline mode (PHP has no g modifier; preg_split already splits at every match):
/^[0-9]+\.\s/m

Related

Text transformation: Adding text according to previous occurrence

I have a text with multiple questions in the following format:
Q1
Question text 1?
1. Answer A
2. Answer B (+1p)
3. Answer C
4. Answer D
Q2
Question Text 2?
1. Answer A (+1p)
2. Answer B
3. Answer C (+1p)
4. Answer D
Q3
Question Text 3
1. Answer A
2. Answer B
3. Answer C (+1p)
Correct answers are marked with (+1p).
I'd like to reformat it so that the correct answers are stated in a new line like below:
Q1
Question text 1?
1. Answer A
2. Answer B
3. Answer C
4. Answer D
Answer: B
Q2
Question Text 2?
1. Answer A
2. Answer B
3. Answer C
4. Answer D
Answer: A, C
Q3
Question Text 3
1. Answer A
2. Answer B
3. Answer C
Answer: C
Is this even possible to accomplish in Notepad++?
The magic of regular expression to the rescue:
We need a two step approach,
Append Answer:
Find What: ((\R\d\.\h+Answer\h+[A-Z]+\h?(\(\+1p\))?)+)
Replace With: \1\r\nAnswer:
Check Regular Expression
Click Replace or Replace All
Now we collect the answers:
Find What: Answer ([A-Z])\h\(\+1p\)(.*?Answer: [A-Z ]*)
Replace With: Answer \1\2\1
Check Regular Expression
Check . matches newline
Click Replace or Replace All. Keep clicking: a block with several marked answers needs as many Replace All passes as it has marked answers. Watch the message in the dialog's status bar; it tells you when you are done.
In the first step, the find matches a complete block of answers and captures it in \1. The replacement just adds an Answer: line after the block.
The second step tries (for each block) to capture the lines from the first (+1p) up to the Answer: line. The find is written so that (+1p) itself is not captured. The answer letter is captured in \1, the following answers up to and including the Answer: line are captured in \2, and the replacement appends the letter in \1 to the Answer: line. (Just do a few finds to see what is matched, then do a few Replaces to see how it works with blocks that have several marked answers. You can Undo to replay a replace.)
Sometimes a question thrills you here on SO (aka this can be done somehow...)
As of now, you may have come to the conclusion that this is no easy task for an editor like Notepad++ alone (if not impossible at all), so I thought about a solution in a programming language (in my case PHP with the help of regular expressions) and would like to present it here:
Explanation:
The code basically performs the following steps:
Look for question blocks (runs of lines beginning with a digit and a dot, surrounded by an empty line on each side) and save their positions in the original string.
In these lines, try to find marked answers (the pattern (+1p)).
Create a new string with the correct answers.
Calculate the position where the answer string (Answer: ...) needs to be inserted with the following equation:
(offset of the block in the original string) + strlen(block) + (total length of the answer strings inserted so far)
Code:
<?php
$string = 'your original string here';
$regex_questions = '~(?ms)(?:^$\R)(?P<answers>(?:^\d\. Answer [A-E].*?\R)+?)(?:^$\R)(?-ms)~';
# does what is described in point 1.)
preg_match_all($regex_questions, $string, $questions, PREG_OFFSET_CAPTURE);
$regex_answers = '~(?m)^(?:\d\. Answer (?<choice>[A-E]).*?\(\+1p\))$~';
# point 2.)
$offset = 0;
# loops over the questions
foreach ($questions["answers"] as $question) {
    preg_match_all($regex_answers, $question[0], $answers);
    $answer = "Answer: " . implode(',', $answers["choice"]) . "\n";
    # point 3.)
    $position = $offset + $question[1] + strlen($question[0]);
    # point 4.)
    $string = substr_replace($string, $answer, $position, 0);
    $offset += strlen($answer);
}
echo $string;
# After every answer block there's a string with the appropriate answers
?>
Demo:
Find an online demo on ideone.
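For comparison, here is a minimal line-by-line sketch of the same transformation that avoids offset bookkeeping. It is not the code from the answer above: the function name appendAnswerLines is hypothetical, and it assumes every answer line looks like "1. Answer A" with an optional trailing (+1p), as in the sample text.

```php
<?php
// Hypothetical helper: strips (+1p) markers and adds an "Answer: ..." line after each block.
function appendAnswerLines(string $text): string
{
    $out = [];
    $correct = [];     // letters marked with (+1p) in the current block
    $inBlock = false;  // true while inside a run of "N. Answer X" lines

    foreach (preg_split('/\R/', $text) as $line) {
        if (preg_match('/^\d+\.\s+Answer\s+([A-Z])\s*(\(\+1p\))?\s*$/', $line, $m)) {
            $inBlock = true;
            if (!empty($m[2])) {
                $correct[] = $m[1];
            }
            // Keep the answer line itself, without the marker.
            $out[] = rtrim(str_replace('(+1p)', '', $line));
            continue;
        }
        if ($inBlock) {
            // The block just ended: emit the summary line before the next question.
            $out[] = 'Answer: ' . implode(', ', $correct);
            $correct = [];
            $inBlock = false;
        }
        $out[] = $line;
    }
    if ($inBlock) {
        // The text ended while still inside a block.
        $out[] = 'Answer: ' . implode(', ', $correct);
    }
    return implode("\n", $out);
}

$input = "Q1\nQuestion text 1?\n1. Answer A\n2. Answer B (+1p)\n"
       . "Q2\nQuestion Text 2?\n1. Answer A (+1p)\n2. Answer B\n3. Answer C (+1p)";
echo appendAnswerLines($input), "\n";
```

Unlike the offset-based version, this sketch does not require empty lines around the answer blocks, but it does normalise all line breaks to "\n".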

Regex for exact number / string match in between

I want to make a regex where I can find the exact number in between a string.
eg. finding the number 2 in 3, 5, 25, 22,2, 15
What I have is /*,2,*/.
But with this regex it matches 22, 25 or just anything with a 2 in it. I want it to match only where the number 2 itself is between the commas, or standing alone without commas.
*Update
Both the number (needle) I look for and the string (haystack) where I seek it can vary.
E.g. if the number I seek is always 2,
I want to find it in 2,3,44,23,22,1 or 3,4,22,5,2 or 2, and I should be able to find one match for each group of numbers.
You should probably use boundaries (\b) so a leading/trailing comma isn't required.
/\b2\b/
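Since the needle can vary, the boundary pattern can be built at run time. The following is only a sketch: countExactNumber is a hypothetical helper name, and the haystacks are the ones from the question.

```php
<?php
// Hypothetical helper: counts standalone occurrences of $needle in $haystack.
function countExactNumber(string $haystack, string $needle): int
{
    // preg_quote() escapes regex metacharacters in the needle; '/' is the delimiter used here.
    $pattern = '/\b' . preg_quote($needle, '/') . '\b/';
    return (int) preg_match_all($pattern, $haystack);
}

echo countExactNumber('3, 5, 25, 22,2, 15', '2'), "\n"; // 22 and 25 are not counted
echo countExactNumber('2,3,44,23,22,1', '2'), "\n";
echo countExactNumber('3,4,22,5', '2'), "\n";
```

Because \b matches between a digit and a comma, a space, or the string boundary, no leading or trailing comma is needed.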
You should do this instead:
,(\d), #for any single digit
,(2), #for 2 in particular
Demo: http://regex101.com/r/vP6jI1

Match just once with regex

I'm using this regex to match some words without numbers, and it works well:
(?:searchForThis|\G).+?(\b[^\d\s]+?\b)
The problem is that the regex searches the entire document and not only the line that contains searchForThis.
So if searchForThis appears twice, it matches the words after both occurrences.
I want it to stop after that first line so it does not search the lines after it.
Any help please?
I'm using Regex with php
Example of the problem here: http://www.rubular.com/r/vPhk8VbqZR
In the example you will see :
Match 1
1. word
Match 2
1. worldtwo
Match 3
1. wordfive
Match 4
1. word
Match 5
1. worldtwo
Match 6
1. wordfive
But I need only :
Match 1
1. word
Match 2
1. worldtwo
Match 3
1. wordfive
You will see that it matches everything twice.
===========Edit for more details as asked ===========================
In my php I have :
define('CODE_REGEX', '/(?:searchForThis|\G(?<!^)).*?(\b[a-zA-Z]+\b)/iu');
Output :
if (preg_match_all(CODE_REGEX, $content, $result))
return trim($result[1][0].' '.$result[1][1].' '.$result[1][2].' '.$result[1][3].' '.$result[1][4].' '.$result[1][5]);
Thank you
You can use this pattern instead:
/(?:\A[\s\S]*?searchForThis|\G).*?(\b[a-z]+\b)/iu
or
/(?:\A(?s).*?searchForThis|\G)(?-s).*?(\b[a-z]+\b)/iu
To deal with multiple lines between the first "searchForThis" and the others or the end of the string, you can use this (with your example string you will obtain "After" and "this"):
/(?:\A.*?searchForThis|\G)(?>[^a-z]++|\b[a-z]++\S)*?(?!searchForThis)(\b[a-z]+\b)/ius
Note: in all three patterns you can replace \A with ^ since multiline mode is not used. Be careful with Rubular, which is designed for Ruby regexes: m in Ruby = s in PHP (that is, the dotall/singleline mode), while m in PHP is the multiline mode (^ can match at the start of each line).
You can do it in two stages:
// get the first line with 'searchForThis'
preg_match('/searchForThis(?<line>.*)\n/m', $text, $results);
$line = $results['line'];
// get every word from this line
preg_match_all('/\b[a-z]+\b/i', $line, $results);
$words = $results[0];
Another way, based on Casimir's great answer (just for readability):
preg_match_all('/(?s:^.*?searchForThis|\G).*?(?<words>\b[a-z]+\b)/iu', $str, $results);
$words = $results['words'];

How to match those numbers?

I have an array of numbers, for example:
10001234
10002345
Now I have a number, which should be matched against all of those numbers inside the array. The number could either be 10001234 (which would be easy to match), but it could also be 100001234 (4 zeros instead of 3) or 101234 (one zero instead of 3) for example. Any combination could be possible. The only fixed part is the 1234 at the end.
I can't just take the last 4 characters, because the tail can also be 3 or 5 or 6 ... digits long, as in 1000123456.
What's a good way to match that? Maybe it's easy and I can't see the wood for the trees :D.
Thanks!
If the first digit is always one, you can use this:
$Num = 1000436346;
echo (int) ltrim((string) $Num, "1");
output:
436346
$number % 10000
Will return the remainder of dividing a number by 10000. Meaning, the last four digits.
The question doesn't make the criteria for the match very clear. However, I'll give it a go.
First, my assumptions:
The number always starts with a 1 followed by an unknown number of 0s.
After that, we have a sequence of digits which could be anything (but presumably not starting with zero?), which you want to extract from the string?
Given the above, we can formulate an expression fairly easily:
$input='10002345';
if(preg_match('/10+(\d+)/',$input,$matches)) {
$output = $matches[1];
}
$output now contains the second part of the number -- ie 2345.
If you need to match more than just a leading 1, you can replace that in the expression with \d to match any digit. And add a plus sign after it to allow more than one digit here (although we're still relying on there being at least one zero between the first part of the number and the second).
$input='10002345';
// \d+? is lazy so the prefix does not swallow zeros that belong to the tail (e.g. 10002305)
if(preg_match('/\d+?0+(\d+)/',$input,$matches)) {
$output = $matches[1];
}
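To compare an incoming number against the whole array, one option is to reduce every number to its tail and compare the tails. This is only a sketch under the assumptions listed above (a leading 1, at least one zero, and a tail that does not start with zero); tailAfterZeros and findMatches are hypothetical names, and the arrow function needs PHP 7.4 or later.

```php
<?php
// Hypothetical helper: returns the digits after the "1" + zeros prefix, or null if the shape differs.
function tailAfterZeros(string $number): ?string
{
    return preg_match('/^10+([1-9]\d*)$/', $number, $m) ? $m[1] : null;
}

// Hypothetical helper: returns every known number whose tail equals the candidate's tail.
function findMatches(string $candidate, array $known): array
{
    $tail = tailAfterZeros($candidate);
    if ($tail === null) {
        return [];
    }
    return array_values(array_filter(
        $known,
        fn($n) => tailAfterZeros((string) $n) === $tail
    ));
}

$known = [10001234, 10002345];
print_r(findMatches('100001234', $known)); // four zeros instead of three
print_r(findMatches('101234', $known));    // one zero instead of three
```

If the tail may itself start with a zero, the prefix and tail can no longer be told apart by pattern alone, and the rule would need to be stated more precisely.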

PHP preg_match range

I want to use preg_match to match numbers 1 - 21. How can I do this using preg_match? If the number is greater than 21 I don't want to match anything.
example preg_match('([0-9][0-1]{0,2})', 'Johnathan 21');
Copied from comment above:
I suggest matching simply ([0-9]{1,2}) (maybe wrapped in \b, based on input format) and filtering the numeric value later, in PHP code.
See also Raymond Chen's thoughts on the subject.
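A minimal sketch of that suggestion: let the regex find the digits and do the range check in PHP. The function name extractNumberInRange and its default bounds are assumptions made for this example.

```php
<?php
// Hypothetical helper: returns the first standalone number in $input if it lies in [$min, $max].
function extractNumberInRange(string $input, int $min = 1, int $max = 21): ?int
{
    if (!preg_match('/\b(\d+)\b/', $input, $m)) {
        return null; // no number at all
    }
    $n = (int) $m[1];
    return ($n >= $min && $n <= $max) ? $n : null;
}

var_dump(extractNumberInRange('Johnathan 21'));
var_dump(extractNumberInRange('Johnathan 22'));
```

Changing the range later only means changing two integers, rather than rewriting an alternation of digit classes.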
Literally:
preg_match('~ (1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|21)$~', 'Johnathan 21');
But maybe this is more nifty:
preg_match('~ ([1-9]|1[0-9]|2[01])$~', 'Johnathan 21');
You could establish an incremental filter like this example for TCP ports (although it includes port number 0), like the previous answer:
preg_match('/^([0-9]{1,4}|[1-5][0-9]{4}|6[0-4][0-9]{3}|65[0-4][0-9]{2}|655[0-2][0-9]|6553[0-5])$/', '62000');
