PHP Regex using preg_split - php

I need to extract two parts from a string and place them in an array.
$test = "add_image_1";
I need to make sure that this string starts with "add_image" and ends with "_1" but only store the number part at the very end. I would like to use preg_split as a learning experience as I will need to use it in the future.
I don't know how to use this function to find an exact word (I tried using "\b" and failed) so I've used "\w+" instead:
$result = preg_split("/(?=\w+)_(?=\d)/", $test);
print_r($result);
This works fine, except that it also accepts invalid formats such as "add_image_1_2323". I need it to accept only the format above. The trailing number can be more than one digit long.
Result should be:
Array (
[0] => add_image
[1] => 1
)
How can I make this more secure?

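The following regex looks for add_image immediately before the underscore and matches the _ only when nothing but digits follows it up to the end of the string.
Regex: (?<=add_image)_(?=\d+$)
Explanation:
(?<=add_image) looks behind for add_image.
(?=\d+$) looks ahead for a number that runs to the end of the string.
Note that the lookbehind does not by itself pin add_image to the start of the string; use (?<=^add_image) if that matters.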
Regex101 Demo
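As a minimal sketch of how this could be used for validation: the helper name splitAddImage is hypothetical, and the ^ inside the lookbehind is my own assumption, added to pin add_image to the start of the string. preg_split returns the whole string as a single element when the pattern does not match, so checking for exactly two parts rejects inputs such as add_image_1_2323.

```php
<?php
// Hypothetical helper (not from the original answer): returns
// ['add_image', '<digits>'] or null when the input has any other form.
function splitAddImage(string $input): ?array
{
    // ^ inside the lookbehind pins "add_image" to the start of the string;
    // \d+$ requires that only digits follow the underscore.
    $parts = preg_split('/(?<=^add_image)_(?=\d+$)/', $input);

    // No match means preg_split hands back the input as one element.
    return ($parts !== false && count($parts) === 2) ? $parts : null;
}

var_dump(splitAddImage('add_image_1'));
var_dump(splitAddImage('add_image_1_2323'));
```

Returning null for anything that does not split into exactly two parts keeps the validation and the extraction in one call.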

Related

regex to convert string 018v-s001v => 18v-s1v but 020v_001 => 20v_001

I'm struggling with a Regex to convert the following strings
018v-s001v => 18v-s1v
018v-s001r => 18v-s1r
018r-s002v => 18r-s2v
020v_001 => 20v_001
020r_002 => 20r_002
0001 => 0001
I managed to convert the first three cases, but I'm struggling with the latter three: how can I preserve the zeros after _ and all the zeros in the last case?
My attempt: (0*)([1-9]{0,4}[vr]?)((-s)?+([0]{0,2}))?+([1-9][vr])?
https://regex101.com/r/2go5KO/1
For your given examples, you could use
000\d+(*SKIP)(*FAIL)|(?<=\b|[a-z])0+
See a demo on regex101.com.
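A short sketch of how that pattern might be applied with preg_replace, using an empty replacement. The variable names are illustrative only; the inputs are the six examples from the question.

```php
<?php
// "000\d+(*SKIP)(*FAIL)" skips runs such as 0001 untouched; the second
// alternative removes zeros at a word boundary or right after a letter.
$skipPattern = '/000\d+(*SKIP)(*FAIL)|(?<=\b|[a-z])0+/';

$samples = ['018v-s001v', '018v-s001r', '018r-s002v', '020v_001', '020r_002', '0001'];

$converted = [];
foreach ($samples as $sample) {
    $converted[$sample] = preg_replace($skipPattern, '', $sample);
}

print_r($converted);
```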
To get the expected result for your example data you might use preg_replace.
You could match one or more zeros with 0+, then capture in a group one or more digits followed by v or r, using a character class for the letter: ([0-9]+[vr])
Regex
0+([0-9]+[vr])
Replace
Captured group 1 $1
Demo Php
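For illustration, the capture-group approach could be wrapped like this. The function name stripPaddingZeros is hypothetical, and the sketch has only been checked against the example strings from the question.

```php
<?php
// Hypothetical wrapper: drops the zeros in front of any digit run
// that is directly followed by v or r, keeping the captured rest.
function stripPaddingZeros(string $value): string
{
    return preg_replace('/0+([0-9]+[vr])/', '$1', $value);
}

echo stripPaddingZeros('018v-s001v'), "\n";
echo stripPaddingZeros('020v_001'), "\n";
echo stripPaddingZeros('0001'), "\n";
```

Digit runs that are not followed by v or r, such as _001 or 0001, never match, which is why they keep their zeros.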
How about this one:
$result = preg_replace('/(?:(\d{4})|(0)?(\d{2}\w))(?:([-_])(?:(\d{3})|(\w)(0+)(\d+?\w)))?/m',
'$1$3$4$5$6$8', $subject);
This produces all the results you require from your test strings. It wasn't clear where a zero will always appear and where it is only optional, but I'm sure the pattern can be adapted. I also noticed the separator was sometimes a hyphen - and sometimes an underscore _, and it wasn't clear whether that was just your typing or was significant, so I've assumed it could be either.

php preg_match_all between ... and

I'm trying to use preg_match_all to match anything between ... and ..., where the text may wrap across lines. I've done a number of searches on Google and tried different combinations, and nothing is working. I have tried this
preg_match_all('/...(.*).../m/', $rawdata, $m);
Below is an example of what the format will look like:
...this is a test...
...this is a test this is a test this is a test this is a test this is a test this is a test this is a test this is a test this is a test...
The s modifier allows for . to include new line characters so try:
preg_match_all('/\.{3}(.*?)\.{3}/s', $rawdata, $m);
The m modifier you were using makes ^ and $ act on a per-line basis rather than per string; since your pattern has no ^ or $, it has no effect here.
You can read more about the modifiers here.
Note the . needs to be escaped as well because it is a special character meaning any character. The ? after the .* makes it non-greedy so it will match the first ... that is found. The {3} says three of the previous character.
Regex101 demo: https://regex101.com/r/eO6iD1/1
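A small sketch of the s modifier in action. The sample text is made up for illustration, with the second entry broken across two lines.

```php
<?php
// Sample input (illustrative): the second "..." block spans two lines.
$wrapped = "...this is a test...\n...this is a test\nthis is a test...";

// s lets . match line breaks; .*? stops at the first closing "...".
preg_match_all('/\.{3}(.*?)\.{3}/s', $wrapped, $found);

print_r($found[1]);
```

Without the s modifier the second block would not be matched here, because . would stop at the line break.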
Please escape the literal dots, since the dot is a reserved character in regular expressions. Note also that the pattern must end with the closing delimiter and its modifiers, without a second trailing slash:
preg_match_all('/\.\.\.(.*)\.\.\./m', $rawdata, $m)
If what you meant is that there are line breaks within the content to match, you have to allow for them explicitly. Inside a character class the dot is literal, so [.\n\r] would match only dots and line breaks; [\s\S] matches any character, line breaks included:
preg_match_all('/\.\.\.([\s\S]*?)\.\.\./', $rawdata, $m)
Check here for reference on what characters the dot includes:
http://www.regular-expressions.info/dot.html
You're almost there; you just need to update your RE:
/\.{3}(.*)\.{3}/m
RE breakdown
/: pattern delimiter (marks the start and end of the pattern)
\.: match a literal .
{3}: match exactly 3 of the preceding item (here, exactly 3 dots)
(.*): capture anything that comes after the opening ...
m: multiline mode, in which ^ and $ match at the start and end of each line; it does not make . match line breaks (that is the s modifier)
Putting it all together, you get:
$str = "...this is a test...";
preg_match_all('/\.{3}(.*)\.{3}/m', $str, $m);
print_r($m);
outputs
Array
(
[0] => Array
(
[0] => ...this is a test...
)
[1] => Array
(
[0] => this is a test
)
)
DEMO

regex : match two different parts of same string

I've got the following string:
{!ex=track_created_f}track_created_f:[NOW/DAY-3MONTHS/DAY TO NOW/DAY]
I would like to match/extract track_created_f and NOW/DAY-3MONTHS/DAY TO NOW/DAY. The {!ex=track_created_f} might or might not be present at all times, so the regex should not rely on this part.
However, it is the second track_created_f (and not the track_created_f which is a part of !ex=track_created_f) which I need to match.
What I've got so far is the following (see this link for live preview):
[^.*(\w+)\:\[(.*)?\]$]
However, this just gives me :
Array
(
[0] => {!ex=track_created_f}track_created_f:[NOW/DAY-3MONTHS/DAY TO NOW/DAY]
[1] => f
[2] => NOW/DAY-3MONTHS/DAY TO NOW/DAY
)
What I'm struggling to get a real grip on is how to use a regex to match only the parts of the string I'm interested in, and to return only those parts. As it is now, (0) the entire string is returned along with (1) the not-so-good match of track_created_f and (2) the match of NOW/DAY-3MONTHS/DAY TO NOW/DAY.
I've been trying to figure this one out by reading the docs, but I'm uncertain as to whether I'm getting things right - particularly the optional '?' clauses I've put in. Is that the right way to match subsets of strings at all?
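[^.*(\w+)\:\[(.*)?\]$] is a wrong regex. In most regex flavours the outer [ ] would put the whole expression inside a character class. PHP accepts a matching bracket pair as the pattern delimiters, so the pattern does run, but the greedy .* then swallows everything except the last word character, which is why group 1 contains only f.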
The following regex is enough
/(\w+):\[([^\]]+)/
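As a quick sketch, this pattern can be checked with preg_match against the string both with and without the {!ex=...} prefix. The variable names are illustrative only.

```php
<?php
$fieldPattern = '/(\w+):\[([^\]]+)/';

$withPrefix    = '{!ex=track_created_f}track_created_f:[NOW/DAY-3MONTHS/DAY TO NOW/DAY]';
$withoutPrefix = 'track_created_f:[NOW/DAY-3MONTHS/DAY TO NOW/DAY]';

// (\w+) must be followed by ":[" so the name inside {!ex=...} is skipped.
preg_match($fieldPattern, $withPrefix, $a);
preg_match($fieldPattern, $withoutPrefix, $b);

// Index 0 is always the full match; the wanted parts are groups 1 and 2.
print_r(array_slice($a, 1));
print_r(array_slice($b, 1));
```

Index 0 of the result always holds the full match, so array_slice (or simply reading indexes 1 and 2) is how you get only the parts you asked for.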
^(?:{\!ex=\w+}|)(.*):\[(.*)?\]$
That will make the {!ex=track_created_f} part optional.
See: http://www.phpliveregex.com/p/1gc

PHP Regex to identify keys in array representation

I have this string authors[0][system:id] and I need a regex that returns:
array('authors', '0', 'system:id')
Any ideas?
Thanks.
Just use PHP's preg_split(), which returns an array of elements similarly to explode() but with RegEx.
Split the string on [ or ] and then remove the last element (which is an empty string) of the resulting array, $tokens.
EDIT: Also remove the third element with array_splice($array, $offset, $length), since this item is also an empty string.
The regex /[\[\]]/ just means match any [ or ] character
$string = "authors[0][system:id]";
$tokens = preg_split("/[\]\[]/", $string);
array_pop($tokens);
array_splice($tokens, 2, 1);
//rest of your code using $tokens
Here is the format of $tokens after this has run:
Array ( [0] => authors [1] => 0 [2] => system:id )
Taking the most simplistic approach, we would just match the three individual parts. So first of all we'd look for the token that is not enclosed in brackets:
[a-z]+
Then we'd look for the brackets and the value in between:
\[[^\]]+\]
And then we'd repeat the second step.
You'd also need to add capture groups () to extract the actual values that you want.
So when you put it all together you get something like:
([a-z]+)\[([^\]]+)\]\[([^\]]+)\]
That expression could then be used with preg_match() and the values you want would be extracted into the referenced array passed to the third argument (like this). But you'll notice the above expression is quite a difficult-to-read collection of punctuation, and also that the resulting array has an extra element on it that we don't want - preg_match() places the whole matched string into the first index of the output array. We're close, but it's not ideal.
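A minimal sketch of that preg_match() approach, with the full-match element dropped afterwards. The variable names are illustrative.

```php
<?php
$subject = 'authors[0][system:id]';

// One group for the bare name, then one group per bracketed key.
preg_match('/([a-z]+)\[([^\]]+)\]\[([^\]]+)\]/', $subject, $matches);

// $matches[0] holds the whole matched string; keep only the captures.
$keys = array_slice($matches, 1);

print_r($keys);
```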
However, as @AlienHoboken correctly points out and almost correctly implements, a simpler solution would be to split the string up based on the position of the brackets. First let's take a look at the expression we'd need (or at least, the one that I would use):
(?:\[|\])+
This looks for at least one occurrence of either [ or ] and uses that block as the delimiter for the split. This seems like exactly what we need, except that when we run it we find a small issue:
array('authors', '0', 'system:id', '')
Where did that extra empty string come from? Well, the last character of the input string matches your delimiter expression, so it's treated as a split position, with the result that an empty string gets appended to the results.
This is quite a common issue when splitting based on a regular expression, and luckily PCRE knows this and provides a simple way to avoid it: the PREG_SPLIT_NO_EMPTY flag.
So when we do this:
$str = 'authors[0][system:id]';
$expr = '/(?:\[|\])+/';
$result = preg_split($expr, $str, -1, PREG_SPLIT_NO_EMPTY);
print_r($result);
...you will see the result you want.
See it working

Trying to split a string in 3 variables, but a little more tricky - PHP

Having pretty much covered the basics in PHP, I decided to challenge myself and make a simple calculator. After some attempts I figured it out, but I'm not entirely content with it. I want to make it more user friendly and have the calculator input in just one box, very much like google search.
So one would simply type: 5+2
and receive 7.
How would I split the string "5+2" into three variables so that the math functions can convert the numbers into integers and recognize the operator, as well as accounting for the possibility of someone using spaces between the values as well?
Would you explode the string? But what would you explode it with if there are no spaces?
I've also stumbled upon the preg_split function, but I can't seem to wrap my head around it or tell whether it's suitable for this problem. What method would be the best option for this?
$calc = "5* 2+ 53";
$calc = preg_replace('/(\s*)/','',$calc);
print_r(preg_split('/([\x28-\x2B\x2D\x2F])/',$calc,-1,PREG_SPLIT_DELIM_CAPTURE));
That's my bid, resulting in
Array
(
[0] => 5
[1] => *
[2] => 2
[3] => +
[4] => 53
)
You may need to use some clever regex to pull the parts out, something like this (with $calc holding the input string):
$found = preg_match('/^\s*(-?[0-9]+)\s*([-+*\/])\s*(-?[0-9]+)\s*$/', $calc, $myOutput);
I haven't tested that thoroughly; it's just a semi-pseudo example. The point is that your - (minus) operator can also appear at the start of an integer to make it a negative number, so you can run into problems with things like -1--21, which is valid but makes your regex rules more complicated.
You will have to split the string using regular expressions.
For example a simple regex for 5+2 would be:
\d\+\d
Check out this link. You can create and validate your regular expressions there. For a calculator it will not be that difficult.
You've got the right idea with preg_split. It would work something like this:
$values = preg_split("/[\s]+/", "76 + 23");
The resulting array will contain only the values that are NOT whitespace. They should look like this:
$values[0]: "76"
$values[1]: "+"
$values[2]: "23"
The "/[\s]+/" is a regular expression pattern that matches one or more whitespace characters. However, if there is no whitespace at all, preg_split will just return the original "5+2" as a single string in the first element of the array, i.e.:
$values[0] = "5+2"
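One way around the no-whitespace problem is to capture the three parts with preg_match instead of splitting. The sketch below is only an illustration under assumptions: the function name calculate is hypothetical, it handles a single integer-operator-integer expression, and it returns null for anything else (including division by zero).

```php
<?php
// Hypothetical helper (not from the answers above): evaluates one
// "<int> <op> <int>" expression, with or without spaces.
function calculate(string $input)
{
    // \s* tolerates optional whitespace; -? allows negative operands.
    if (!preg_match('/^\s*(-?\d+)\s*([-+*\/])\s*(-?\d+)\s*$/', $input, $m)) {
        return null; // input is not in the expected form
    }

    $left  = (int) $m[1];
    $right = (int) $m[3];

    switch ($m[2]) {
        case '+': return $left + $right;
        case '-': return $left - $right;
        case '*': return $left * $right;
        case '/': return $right === 0 ? null : $left / $right;
    }

    return null;
}

var_dump(calculate('5+2'));
var_dump(calculate('76 + 23'));
var_dump(calculate('-1--21'));
```

Because the operator sits between two explicitly signed numbers, an input like -1--21 is read as -1 minus -21 rather than being split on every minus sign.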
