regex : match two different parts of same string - php

I've got the following string:
{!ex=track_created_f}track_created_f:[NOW/DAY-3MONTHS/DAY TO NOW/DAY]
I would like to match/extract track_created_f and NOW/DAY-3MONTHS/DAY TO NOW/DAY. The {!ex=track_created_f} part may or may not be present, so the regex should not rely on it.
However, it is the second track_created_f (and not the track_created_f which is a part of !ex=track_created_f) which I need to match.
What I've got so far is the following (see this link for live preview):
[^.*(\w+)\:\[(.*)?\]$]
However, this just gives me:
Array
(
[0] => {!ex=track_created_f}track_created_f:[NOW/DAY-3MONTHS/DAY TO NOW/DAY]
[1] => f
[2] => NOW/DAY-3MONTHS/DAY TO NOW/DAY
)
What I'm having trouble getting a real grip on is how to use regex to match only the part(s) of the string I'm interested in, and return only those parts. As it is now, I get back (0) the entire string, (1) the not-so-good match of track_created_f, and (2) the match of NOW/DAY-3MONTHS/DAY TO NOW/DAY.
I've been trying to figure this one out by reading the docs, but I'm uncertain as to whether I'm getting things right - particularly the optional '?' clauses I've put in. Is that the right way to match subsets of strings at all?

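[^.*(\w+)\:\[(.*)?\]$] is the wrong regex. At first glance the whole pattern looks as if it sits inside a character class; if the outer [ and ] are serving as PHP's pattern delimiters, the real problem is the greedy .* at the start, which swallows everything except the last word character and leaves group 1 with just f.
The following regex is enough: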
/(\w+):\[([^\]]+)/
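A minimal sketch of how that pattern could be used with preg_match, run against the string from the question both with and without the {!ex=...} prefix. The variable names ($pattern, $inputs, $results) are only illustrative. Element 0 of the match array is always the full match, so the wanted parts are read from elements 1 and 2.

```php
<?php
// Pattern from the answer above: a word, then ":[", then everything up to "]".
$pattern = '/(\w+):\[([^\]]+)/';

$inputs = [
    '{!ex=track_created_f}track_created_f:[NOW/DAY-3MONTHS/DAY TO NOW/DAY]',
    'track_created_f:[NOW/DAY-3MONTHS/DAY TO NOW/DAY]',
];

$results = [];
foreach ($inputs as $input) {
    if (preg_match($pattern, $input, $m)) {
        // $m[0] is the whole match; $m[1] and $m[2] are the two groups.
        $results[] = [$m[1], $m[2]];
    }
}

print_r($results);
```

The word inside {!ex=...} is skipped because it is followed by } rather than :[, so the pattern does not need to know whether the prefix is there.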

^(?:{\!ex=\w+}|)(.*):\[(.*)?\]$
That will make the {!ex=track_created_f} part optional.
See: http://www.phpliveregex.com/p/1gc

Related

PHP Regex using preg_split

I need to extract from a string 2 parts and place them inside an array.
$test = "add_image_1";
I need to make sure that this string starts with "add_image" and ends with "_1" but only store the number part at the very end. I would like to use preg_split as a learning experience as I will need to use it in the future.
I don't know how to use this function to find an exact word (I tried using "\b" and failed) so I've used "\w+" instead:
$result = preg_split("/(?=\w+)_(?=\d)/", $test);
print_r($result);
This works fine except it also accepts a bunch of other invalid formats such as:
"add_image_1_2323". I need to make sure it only accepts this format. The last digit can be larger than 1 digit long.
Result should be:
Array (
[0] => add_image
[1] => 1
)
How can I make this more secure?
The following regex checks that add_image comes immediately before the underscore and matches only the _ that precedes the trailing digits.
Regex: (?<=add_image)_(?=\d+$)
Explanation:
(?<=add_image) looks behind for add_image
(?=\d+$) looks ahead for digits that run to the end of the string; the match itself is just the _.
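A short sketch of that split in PHP. One assumption is added here that is not in the answer: a ^ inside the lookbehind, so that add_image has to sit at the very start of the string rather than merely before the underscore. The helper name splitAddImage is hypothetical.

```php
<?php
// Hypothetical helper: returns ['add_image', '<digits>'] or null if the
// string is not in the expected format.
function splitAddImage(string $test)
{
    // (?<=^add_image) : "add_image" at the start, right before the "_"
    // (?=\d+$)        : only digits from here to the end of the string
    $parts = preg_split('/(?<=^add_image)_(?=\d+$)/', $test);

    // No valid split point means preg_split returns the input unchanged.
    return count($parts) === 2 ? $parts : null;
}

print_r(splitAddImage('add_image_1'));
var_dump(splitAddImage('add_image_1_2323'));
```

If validation is the main goal, preg_match with capture groups would arguably be a more direct tool than preg_split; the version above sticks with preg_split because the question asks for it.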

PHP preg_split result not correct

I am trying to learn regex in PHP and messing around with the preg_split function.
It doesn't appear to be correct though, or my understanding is completely wrong.
The test code I am using is:
$string = "test ing ";
var_dump(preg_split('/t/', $string));
I would expect to get an array like the following:
[0] => "es" [1] => " ing "
but the following is being returned:
[0] => "" [1] => "es" [2] => " ing "
Why is there an empty string at the start?
I understand that I can use the PREG_SPLIT_NO_EMPTY flag to filter this, but it shouldn't be there to begin with. Should it?
Why shouldn't it? This is exactly how it works. The semantics of a split operation are that you have a string of this format:
value-delimiter-value-delimiter-value-...-delimiter-value
(Note that it is starting and ending with a value, not a delimiter.)
So if your string starts with a delimiter, it is absolutely valid to assume that there is an empty value before that delimiter (since the delimiter is supposed to split something into two). You wouldn't generally want to reject the empty string between two consecutive ts either, would you?
And this is exactly what PREG_SPLIT_NO_EMPTY is for. You use it whenever you do want to get rid of those empty strings.
As a simple example of why you would want the default behavior, just think of CSV files. You want to split a line at (for example) ;. You usually also want to allow for empty values. Now if the value in your first column was empty (meaning the line will start with ;) and you chopped that first empty string away completely, then suddenly all indices in the resulting array would correspond to different columns. This is why you want to keep those empty strings as well. In many cases you know how many delimiters there are, and hence how many values, and you want to be able to identify which value belongs at which position, even if some of them are empty.
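The difference can be sketched in a few lines of PHP, using the string from the question and a made-up CSV-style line (the sample values Alice and 42 are purely illustrative):

```php
<?php
// Default behaviour: the leading "t" produces an empty first element.
$default = preg_split('/t/', 'test ing ');

// With the flag, empty pieces are dropped.
$noEmpty = preg_split('/t/', 'test ing ', -1, PREG_SPLIT_NO_EMPTY);

// Illustrative CSV-style line whose first and third columns are empty.
$line = ';Alice;;42';

// Default: four elements, so the array index still equals the column number.
$columns = preg_split('/;/', $line);

// With the flag: two elements, so the column positions are lost.
$shifted = preg_split('/;/', $line, -1, PREG_SPLIT_NO_EMPTY);

var_dump($default, $noEmpty, $columns, $shifted);
```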
It's working 100% correctly. The first character is a 't', so it's splitting on that 't' first. Before the first 't' there is nothing, so the resulting array starts with an empty string.
It's happening because of the t at the beginning of your string. If you don't use the PREG_SPLIT_NO_EMPTY option, preg_split will treat an empty string as a valid split.
Think of it this way: Everywhere preg_split sees a t, it chops the string into two chunks: the chunk before the t, and the chunk after it. Even if one of the chunks doesn't have anything in it, it still counts. That piece is just an empty string.
For some applications, this would be perfectly useful -- for example, say you wanted to replace each t with something, but the replacement was too complicated to just use preg_replace. The language wants you to be able to choose, so it keeps the empty split unless you explicitly tell it not to with PREG_SPLIT_NO_EMPTY.

Regular expression help needed

Although I can find a lot of tutorials on regular expressions, it remains above my grasp. The regular expression that I want to create is simple (judged by what I see in some of the examples), but I simply can not figure it out.
I want to do a simple replacement as follows:
I have image metadata saved in a MySQL table, with fields: id, name, title and alt.
In my content, I want to write [[IMAGE:1:right]]content here[[image:2:left]].
I want to get the matches of the ID (the digit) and the float (left or right) and replace the entire string with the image floated left or right, retrieved by the ID from the database table.
Here is my attempt:
preg_match("/^\[\[image:(\d+):(left|right)\]\]+/i", "[[IMAGE:1:right]]content here[[image:2:left]]", $matches);
This gives me the return of:
Array ( [0] => [[IMAGE:1:right]] [1] => 1 [2] => right )
So, it finds one, but I want it to find ALL of them, as I may have more than one image in a post. As far as I can tell, the + there should match all entries, and the i should match case insensitive. It appears as if the case insensitive way works, but I get only one return.
Could someone please let me know what I am doing wrong?
That's not quite how it works. That + only applies to the token immediately before it - the ]. You want to make the match global in Perl vernacular, which for PHP (which I think you're using?) means calling the function preg_match_all(). You'll also have to remove the ^, as only one of the images occurs at the beginning of the string.
Also, [ and ] are special characters in regex - so please escape them when you want a literal bracket by writing \[\[ and \]\].
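A minimal sketch of the preg_match_all version, with the ^ removed and the stray + dropped. PREG_SET_ORDER is used so that each match arrives as its own small array; the $images structure is just an illustrative way of collecting the results.

```php
<?php
$content = '[[IMAGE:1:right]]content here[[image:2:left]]';

// No ^ anchor, so tags anywhere in the content are found; "i" keeps the
// match case-insensitive.
$count = preg_match_all(
    '/\[\[image:(\d+):(left|right)\]\]/i',
    $content,
    $matches,
    PREG_SET_ORDER
);

$images = [];
foreach ($matches as $m) {
    // $m[0] is the full tag, $m[1] the ID, $m[2] the float direction.
    $images[] = ['id' => (int) $m[1], 'float' => strtolower($m[2])];
}

print_r($images);
```

For the actual replacement step, preg_replace_callback accepts the same pattern and lets the callback build the image markup for each tag from the ID it receives.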

RegEx with character set inside positive lookbehind, Is it possible?

I need to match "name" only after "listing", but of course those words could be any url directory or page.
mydomain.com/listing/name
so the only thing I can "REGuest" (request) is to be some parent directory there.
In other words, I want to match the "position" i.e. whatever comes 2nd after the domain.
I'm trying something like
(?<=mydomain\.com/[^/\?&]+/)[^/\?&]+(?:/)?
But the character set won't work inside the positive lookbehind; at least, it seems to be set up to match only ONE character. As soon as I try to match more than one (e.g. by adding +, ? or *), it just stops working.
I'm obviously missing the positive lookbehind syntax and it seems not intended for what I'm trying.
How can I match that 2nd level filename?
Thanks.
Regular-expressions.info states that
The bad news is that most regex flavors do not allow you to use just
any regex inside a lookbehind, because they cannot apply a regular
expression backwards. Therefore, the regular expression engine needs
to be able to figure out how many steps to step back before checking
the lookbehind...
(Read further, they even mention Perl, Python and Java.)
I think the quantifier might be the problem. I found this on stackoverflow and briefly flew over it.
Wouldn't it be possible to just match the whole path, and use a group for the second level filename:
mydomain\.com\/[^\/\?&]+\/([^\/\?&]+)(?:\/)?
(note: I had to escape the / for my tests...)
The result of this would be something like:
Array
(
[0] => mydomain.com/listing/name
[1] => name
)
Now, because I don't know the context of your problem, I just assumed you would be able to postprocess the results and get the group 1 (index 1) from the result. If not, I unfortunately don't know...
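In PHP, the capture-group approach could look roughly like the sketch below. It uses # as the pattern delimiter so the slashes need no escaping. The second pattern is an alternative that is not mentioned in the answers: PCRE's \K resets the start of the reported match, which gives a similar effect to a variable-length lookbehind.

```php
<?php
$url = 'mydomain.com/listing/name';

// Approach from the answer: match the whole path, read group 1.
$second = null;
if (preg_match('#mydomain\.com/[^/?&]+/([^/?&]+)/?#', $url, $m)) {
    $second = $m[1];
}

// Alternative sketch: \K discards everything matched so far, so the
// full match ($k[0]) is only the second-level segment.
$viaK = null;
if (preg_match('#mydomain\.com/[^/?&]+/\K[^/?&]+#', $url, $k)) {
    $viaK = $k[0];
}

var_dump($second, $viaK);
```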

Trying to split a string in 3 variables, but a little more tricky - PHP

Having pretty much covered the basics in PHP, I decided to challenge myself and make a simple calculator. After some attempts I figured it out, but I'm not entirely content with it. I want to make it more user friendly and have the calculator input in just one box, very much like google search.
So one would simply type: 5+2
and receive 7.
How would I split the string "5+2" into three variables so that the math functions can convert the numbers into integers and recognize the operator, as well as accounting for the possibility of someone using spaces between the values as well?
Would you explode the string? But what would you explode it with if there are no spaces?
I've also stumbled upon the preg_split function, but I can't seem to wrap my head around it or tell whether it's suitable for solving this problem. What method would be the best option for this?
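$calc = "5* 2+ 53";
// Strip all whitespace so the split only sees digits and operators.
$calc = preg_replace('/\s+/', '', $calc);
// Split on ( ) * + - / and keep the delimiters in the result.
print_r(preg_split('/([\x28-\x2B\x2D\x2F])/', $calc, -1, PREG_SPLIT_DELIM_CAPTURE));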
That's my bid, resulting in
Array
(
[0] => 5
[1] => *
[2] => 2
[3] => +
[4] => 53
)
You may need to use some clever regex to split it something like:
preg_match_all('~-?[0-9]+|[-+*/]~', $input, $myOutput); // $input is the typed expression; tokens end up in $myOutput[0]
I haven't tested that - just a semi-pseudo-ish example, sorry :-> Just trying to highlight that you will need to remember that your - (minus) operator can appear at the start of an integer to make it a negative number, so you could end up with problems with things like -1--21, which is valid but makes your regex rules more complicated.
You will have to split the string using regular expressions.
For example a simple regex for 5+2 would be:
\d\+\d
Check out this link. You can create and validate your regular expressions there. For a calculator it will not be that difficult.
You've got the right idea with preg_split. It would work something like this:
$values = preg_split("/[\s]+/", "76 + 23");
The resulting array will contain only the non-whitespace pieces:
$values[0]: "76"
$values[1]: "+"
$values[2]: "23"
The "/[\s]+/" is a regular expression pattern that matches one or more whitespace characters. However, if there is no whitespace at all, preg_split will just return the original "5+2" as a single string in the first element of the array, i.e.:
$values[0] = "5+2"
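One way around the "no spaces" case is to stop splitting and instead match the three parts directly, allowing optional whitespace around each. The sketch below is only one possible approach. It assumes a single binary operation on integers, and the function name calculate is hypothetical.

```php
<?php
// Hypothetical helper: evaluates "<int> <op> <int>" with optional spaces.
// Returns null when the input does not fit that shape or divides by zero.
function calculate(string $input)
{
    // Group 1: left operand, group 2: operator, group 3: right operand.
    $pattern = '/^\s*(-?\d+)\s*([-+*\/])\s*(-?\d+)\s*$/';
    if (!preg_match($pattern, $input, $m)) {
        return null;
    }

    $left = (int) $m[1];
    $right = (int) $m[3];

    switch ($m[2]) {
        case '+': return $left + $right;
        case '-': return $left - $right;
        case '*': return $left * $right;
        case '/': return $right === 0 ? null : $left / $right;
    }
    return null;
}

var_dump(calculate('5+2'), calculate('76 + 23'), calculate('-1--21'));
```

Because the pattern is anchored at both ends and expects operand, operator, operand in that order, a minus sign is read as the operator in 5-2 and as a sign in -1--21 without any extra rules.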
