Capturing key value pairs from a url string with a regex pattern

I'm trying to use regex to parse a string like the below:
/subject=hello±#text=something that may contain\#hello.com or a normal sla/sh±#date=blah/somethingelseI don't want to capture after the first/
into:
subject = hello
text = something that may contain\#hello.com or a normal sla/sh
date = blah
Ideally I'd like to split the string after the first '/' on '±#', and only on that combination in that order.
I've looked around and at the moment have the below:
([^/±#,= ]+)=([^±#,= ]+)
But this doesn't split only on '±#'; it splits on either # or ±.
It also doesn't cope with the escaped #. (Instead I get: text = something that may contain\ ).
Is there a better way to do this?
Thanks

Try this:
(?:\/|(?<=±#))(.*?=.*?)(?:±#|$|\/(?!.*±#))
See live demo
An important part is the negative look ahead after the trailing slash /(?!.*±#) - this means "match a slash, but only if ±# doesn't appear in the input after it".
Given this input:
/subject=hello±#text=something that may contain\#hello.com or a normal sla/sh±#date=blah/somethingelseI don't want to capture after the first/
It produces matches whose group 1 values are:
subject=hello
text=something that may contain\#hello.com or a normal sla/sh
date=blah
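
As a rough sketch of how the pattern above could be used from PHP: the helper name parse_pairs is made up for illustration, "~" is chosen as the delimiter so slashes need no escaping, and the u modifier assumes both the script and the input are UTF-8.

```php
<?php
// Hypothetical helper: extracts the "key=value" chunks separated by "±#"
// and returns them as an associative array.
function parse_pairs(string $input): array
{
    $pattern = '~(?:/|(?<=±#))(.*?=.*?)(?:±#|$|/(?!.*±#))~u';
    preg_match_all($pattern, $input, $matches);

    $pairs = [];
    foreach ($matches[1] as $chunk) {
        // Split on the first "=" only, so a value may itself contain "=".
        [$key, $value] = explode('=', $chunk, 2);
        $pairs[$key] = $value;
    }
    return $pairs;
}

$input = '/subject=hello±#text=something that may contain\#hello.com or a normal sla/sh±#date=blah/somethingelseI don\'t want to capture after the first/';
print_r(parse_pairs($input));
```

This should print an array with the keys subject, text and date, with the text value keeping both the escaped \# and the inner slash.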

Related

Regex can't limit search range

I have the following problem.
I have a pattern like this:
/(?<=template=")(.*?)(.*\/)/gm
And a text like this:
template="test/widgets/glasgow.phtml"}}
My regex should find the path in front of my file name. I need to cut it out so that the result looks like this:
template="glasgow.phtml"}}
That works fine, but the problem is that I sometimes have a text that looks like this:
block="core/template" template="test/widgets/getcallus.phtml"}}</p>
It cuts out everything up to the </.
This is what gets cut out:
test/widgets/getcallus.phtml"}}</
Instead of:
test/widgets/
I have tried to limit the end with $ but it doesn't do anything.
I am testing it on regexr.com:
https://regexr.com/50hi2

You may use the following pattern:
template="\K[^"\/]*\/[^"\/]*\/
See the regex demo. In PHP, you may get rid of backslashes if you specify another regex delimiter:
$regex = '~template="\K[^"/]*/[^"/]*/~';
Details
template=" - literal text
\K - match reset operator
[^"\/]* - 0 or more chars other than / and "
\/ - a / char
[^"\/]* - 0 or more chars other than / and "
\/ - a / char
It is equivalent to template="\K(?:[^"\/]*\/){2}, where (?:...){2} repeats the non-capturing group sequence of patterns twice.
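
A minimal sketch of applying this with preg_replace, assuming the path always has exactly two directory levels as in the question (the variable names are only illustrative):

```php
<?php
// \K discards 'template="' from the reported match, so only the
// "dir/dir/" part is replaced with an empty string.
$regex = '~template="\K(?:[^"/]*/){2}~';

$text = 'block="core/template" template="test/widgets/getcallus.phtml"}}</p>';
$result = preg_replace($regex, '', $text);

echo $result, PHP_EOL;
// block="core/template" template="getcallus.phtml"}}</p>
```

The block="core/template" attribute is left alone because the pattern only starts matching at the literal template=" text.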

Be careful with (.*?)(.*\/)
This pattern can backtrack heavily, which makes it a potential ReDoS risk: the engine has many ways to divide the n characters before the last / between the two groups.
To keep a regex close to yours, you can use
/(?<=template=")([^"]*?\/)*([^"]*)"/
([^"]*?\/)* reads as many blocks of "characters other than ", followed by /" as possible.
https://regex101.com/r/SMSv5R/2
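
One possible way to use this pattern for the replacement in PHP is sketched below. The replacement string '$2"' is an assumption on my part (the answer only gives the pattern): it keeps the file name from group 2 and puts back the closing quote that the pattern consumes.

```php
<?php
// Group 1 repeats over each "directory/" block, group 2 holds the file name.
$regex = '~(?<=template=")([^"]*?/)*([^"]*)"~';

$text = 'block="core/template" template="test/widgets/getcallus.phtml"}}</p>';

// Keep only the file name (group 2) and restore the closing quote.
$stripped = preg_replace($regex, '$2"', $text);

echo $stripped, PHP_EOL;
// block="core/template" template="getcallus.phtml"}}</p>
```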

How can I get all occurrences of this pattern with the regex of PHP?

How can I get, into an array, all occurrences of this pattern 4321[5-9][7-9]{6} but excluding, for example, the occurrences where there is a digit immediately before the value, or immediately after it?
For instance, 43217999999 should be valid but 143217999999 (note the number 1 at the beginning) should not be valid.
Similarly, 432179999991 shouldn't be valid because of the 1 at the end.
The added difficulty, at least for me, is that I have to parse this in whatever position I can find it inside a string.
The string looks like this, literally:
43217999997 / 543217999999 // 43217999998 _ 43217999999a43216999999-43216999999 arandomword 432159999997
As you can see, it has no standard way of separating the values (I marked in bold the values that are invalid, so I shouldn't match those).
My idea right now is something like this:
(\D+|^)(4321[5-9][7-9]{6})(\D+|$)
(\D+|^) meaning that I expect in that position the start of the string or at least one non-digit and (\D+|$) meaning that I expect there the end of the string or at least one non-digit.
That obviously doesn't do what I picture in my head.
I also tried to do it in two steps, first:
preg_match_all("/\D+4321[5-9][7-9]{6}\D+|4321[5-9][7-9]{6}\D+|4321[5-9][7-9]{6}$/", $input, $outputArray);
and then:
for ($cont = 0; $cont < count($outputArray); $cont++) {
    preg_match("/4321[5-9][7-9]{6}/", $outputArray[0][$cont], $outputArray2[]);
}
so I can print
echo "<pre>" . print_r($outputArray2, true) . "</pre>";
but that doesn't let me exclude the ones that have a number before the start of the value (5432157999999 for example), and then, I am not making any progress with my idea.
Thanks in advance for any help.

If you literally want to check that there is no digit before or after the match, you can use a negative lookahead and a negative lookbehind.
(?![0-9]) at the end means: "is not followed by 0-9"
(?<![0-9]) at the start means: "is not preceded by 0-9"
See this example https://regex101.com/r/6xbmJk/1
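
Put together with the pattern from the question, the two assertions could be used like this. This is a sketch; the variable names are arbitrary and the input is the sample string from the question.

```php
<?php
$input = '43217999997 / 543217999999 // 43217999998 _ 43217999999a43216999999-43216999999 arandomword 432159999997';

// (?<![0-9]) rejects a candidate that is preceded by a digit,
// (?![0-9]) rejects a candidate that is followed by a digit.
$pattern = '/(?<![0-9])4321[5-9][7-9]{6}(?![0-9])/';

preg_match_all($pattern, $input, $matches);
$found = $matches[0];

print_r($found);
// 43217999997, 43217999998, 43217999999, 43216999999, 43216999999
```

543217999999 is skipped because of the leading 5, and 432159999997 because of the trailing 7.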

Php preg_replace numbers characters

$my_string = '88888805';
echo preg_replace("/(^.|.$)(*SKIP)(*F)|(.)/","*",$my_string);
This shows the first and last digits, like this: 8******5
But how can I show the number like this: 888888** (with the last 2 digits hidden)?
Thank you!
From this: 8******5
To: 888888**

I'm not sure whether your current regex pattern is meant to do something specific, so here is a general one that should answer your question without using it.
$my_string = '88888805';
echo preg_replace("/([0-9]+)[0-9]{2}$/","$1**",$my_string);
Explanation:
([0-9]+) matches the leading digits; it could be replaced with \d+. It is in parentheses so that it is captured, because we reuse it in the result as $1.
[0-9]{2} matches the last 2 digits; again, it can be replaced with \d{2}. It is outside the parentheses because we don't want to include those digits in the result. The $ after it anchors the match to the end of the string; for a digits-only string like this one, the result is the same without it.
Results:
Input: 88888805
Output: 888888**
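
Wrapped in a small function, the same idea might look like the sketch below. The name mask_last_two and the added ^ anchor are my own assumptions; the ^ restricts the helper to strings that consist only of digits.

```php
<?php
// Hypothetical helper: hides the last two digits of a digit string.
function mask_last_two(string $digits): string
{
    // Group 1 keeps the leading digits; the final two become "**".
    return preg_replace('/^([0-9]+)[0-9]{2}$/', '$1**', $digits);
}

echo mask_last_two('88888805'), PHP_EOL; // 888888**
```

A string with fewer than three digits does not match and is returned unchanged.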

echo preg_replace("/(.{2}$)(*SKIP)(*F)|(.)/","*",$my_string);
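If it's for a uni assignment, you'd probably want to do this. It basically says: don't match the last two characters, otherwise match. Note that, as written, this keeps the last two characters visible and masks all the others (******05 for the input above).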
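
To see what the (*SKIP)(*F) pattern does in practice, here is a short comparison. The second pattern is an assumed alternative that is not part of the answer: it uses a lookahead to match only the last two characters.

```php
<?php
$my_string = '88888805';

// Skips the last two characters and masks every other character.
$keep_last_two = preg_replace('/(.{2}$)(*SKIP)(*F)|(.)/', '*', $my_string);

// Assumed alternative: match a character only if at most one more
// character follows it, i.e. only the last two characters.
$hide_last_two = preg_replace('/.(?=.?$)/', '*', $my_string);

echo $keep_last_two, PHP_EOL; // ******05
echo $hide_last_two, PHP_EOL; // 888888**
```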

I have list of webpage URLs, I just need to strip everything except specific value and ID from it using regex

Suppose I have a list of URLs that follow the structure below. I need to strip each one so that all that's left is the abcustomerid=12345. How can I do this using regex with Notepad++?
Here's an example of the variety in each line. I just need to remove everything from each line, but leave the abcustomerid=12345 or whatever value follows abcustomerid.
/the/stucture/blah.php?timeout=300&abcustomerid=53122&customer=zxyi
/some/other/struct/pagehere.php?today=Thursday&abcustomerid=241&count=54
/blah/blah/tendid.php?abcustomerid=12525
Each line could have anything different around the abcustomerid, but I just need to remove everything and keep the abcustomerid and the value.

This regex should do it.
(?:&|\?)abcustomerid=(\d+)
Usage:
<?php
$string= '/the/stucture/blah.php?timeout=300&abcustomerid=53122&customer=zxyi
/some/other/struct/pagehere.php?today=Thursday&abcustomerid=241&count=54
/blah/blah/tendid.php?abcustomerid=12525';
preg_match_all('~(?:&|\?)abcustomerid=(\d+)~', $string, $output);
print_r($output[1]);
The ?: tells the regex not to capture that group; we don't need that data. The () capture the data we are interested in. The \d+ is one or more digits (the + is the "one or more" part). If the value can be anything, change that to .+?, which matches anything, but then you need an anchor for where it should stop. I'd use (?:&|$), which makes it match up to the next & or the end of the string. If the input is multiline, you'll also need the m modifier. http://php.net/manual/en/reference.pcre.pattern.modifiers.php
Output:
Array
(
    [0] => 53122
    [1] => 241
    [2] => 12525
)
Demo:
http://sandbox.onlinephpfunctions.com/code/37a4ddea8c50f98a41ac7d45fec98f5f1f58761f
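
The "any value" variant described above could be sketched as follows. The non-numeric ids in the sample input are invented for illustration, and the lines are joined with "\n" so that the m modifier has line ends to anchor on.

```php
<?php
// Hypothetical input with non-numeric ids, one URL per line.
$string = implode("\n", [
    '/the/stucture/blah.php?timeout=300&abcustomerid=ab-53122&customer=zxyi',
    '/some/other/struct/pagehere.php?today=Thursday&abcustomerid=x241&count=54',
    '/blah/blah/tendid.php?abcustomerid=12525',
]);

// .+? is lazy, so it stops at the first "&" or at the end of the line;
// the m modifier lets $ match at each line end, not only at the string end.
preg_match_all('~(?:&|\?)abcustomerid=(.+?)(?:&|$)~m', $string, $output);

print_r($output[1]);
// ab-53122, x241, 12525
```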

Here is a regex that captures abcustomerid together with its value.
[?&](abcustomerid=\d+)
However, how are you going to 'remove everything' using Notepad++?
You can use this service to do this (there is a demo at the end of the answer).
Copy your regex and all your data into the Test string form. After it successfully matches everything, look at the Match information window at the middle right of the page. Click the Export matches... button and choose plain text.
You will get something like this:
abcustomerid=53122
abcustomerid=241
abcustomerid=12525
Here is the working Demo.
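
If PHP is available, a similar result can be had without an export step by replacing each whole line with the captured pair. This is only a sketch: wrapping the pattern in ^.* and .*$ is my own assumption, not something the answer proposes.

```php
<?php
$lines = implode("\n", [
    '/the/stucture/blah.php?timeout=300&abcustomerid=53122&customer=zxyi',
    '/some/other/struct/pagehere.php?today=Thursday&abcustomerid=241&count=54',
    '/blah/blah/tendid.php?abcustomerid=12525',
]);

// For each line, replace the whole line with just the captured pair.
// The m modifier makes ^ and $ anchor at line boundaries.
$kept = preg_replace('~^.*[?&](abcustomerid=\d+).*$~m', '$1', $lines);

echo $kept, PHP_EOL;
// abcustomerid=53122
// abcustomerid=241
// abcustomerid=12525
```

The same find pattern and $1 replacement are likely usable in an editor's regex search-and-replace as well, though that is not tested here.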

PHP regular expression

I have a huge string from which I need to extract information. Some parts of it vary and some don't. The difficulty I am facing is that I can't find a symbol or anything else to anchor the match I want on. Here is the string:
$str = "01;01;283;Póvoa do Vâle do Trigo;15315100 01;01;249;Alcafaz;;;;;;;;;;;3750;011;AGADÃO 01;01;2504;Caselho;;;;;;;;;;;3750;012;AGADÃO _ "15" '' ghdhghg AND IT CONTINUES
So if we look at the first part of the string (01;01;283;Póvoa do Vâle do Trigo;15315100), what I want to keep is:
01;01;283
and remove the rest,
in every case. Looking at the first example:
the first 01 is a number that is never longer than 2 digits (not 040 or 150505 or 4075)
the same goes for the next 01, never longer than 2 digits (not 405 or 1565 or 425)
then 283 is the number that can be longer; it varies (it can be 300 or 17581 or 40755794)
Essentially, in the end I want only the beginning of each part, like:
01;01;283
01;01;249
01;01;2504
05,80,104258
94,76,56789124
Sorry for any misspellings, I am Portuguese.
I forgot to say that these separated parts will then go into an array, so the regular expression should not produce a match like this:
15315100 01;01;249
so I can't use .+, for example.
I am using preg_replace.

Try this:
/(\d+;\d+;\d+)/
Should work.

Try the following. The regex is in the preg_match_all line.
$str = "***01;01;283***;Póvoa do Vâle do Trigo;15315100 ***01;01;249***;Alcafaz;;;;;;;;;;;3750;011;AGADÃO ";
preg_match_all("/\*\*\*[01][0-9];[01][0-9];[0-9]*\*\*\*.*?/", $str, $matches);
print_r($matches);

((?:\d\d;){2}\d+)
DEMO
And maybe it would be easier to just get everything between ***XXX***
\*([\d;]+)\*
DEMO
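
For the string without the *** markers, the first pattern could be applied as in this sketch. The sample is the part of the question's string before the trailing noise, and the variable names are arbitrary.

```php
<?php
$str = '01;01;283;Póvoa do Vâle do Trigo;15315100 01;01;249;Alcafaz;;;;;;;;;;;3750;011;AGADÃO 01;01;2504;Caselho;;;;;;;;;;;3750;012;AGADÃO';

// Two groups of exactly two digits followed by ";", then one or more digits.
preg_match_all('/(?:\d\d;){2}\d+/', $str, $matches);
$codes = $matches[0];

print_r($codes);
// 01;01;283, 01;01;249, 01;01;2504
```

Requiring exactly two digits before each ";" is what keeps fragments such as 3750;011; from matching.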
