How can I use a match in the same regex in PHP? - php

I have this string (a serialized PHP variable):
s:12:"hello "world";
and I want to extract hello "world using only a regex. I tried this, but it doesn't work:
(s:(?P<num>[0-9]+):".{\k{num}}";)
How can I use the captured "num" value as the repetition count inside the same regex?
This regex is part of a larger regex, so I can't anchor on the end of the string.
Thanks in advance!

You can use your named capturing groups as backreferences like this:
Back references to the named subpatterns can be achieved by (?P=name)
or, since PHP 5.2.2, also by \k<name> or \k'name'. Additionally PHP
5.2.4 added support for \k{name} and \g{name}.
According to php.net
But I think a backreference can only be used to match the captured text again, not as a number in a quantifier. (At least I didn't get it to work.)

You can use preg_match function, which will populate an array of matches:
If matches is provided, then it is filled with the results of search. $matches[0] will contain the text that matched the full pattern, $matches[1] will have the text that matched the first captured parenthesized subpattern, and so on.
More information about preg_match: PHP: preg_match

$text = 's:12:"hello "world";s:12:"good bue world";';
$pattern = "(.*:[0-9]+:\"(.*)\";.*)U";
preg_match_all($pattern,$text,$r);
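Since a backreference cannot serve as a repetition count, one possible workaround is to do the job in two steps: match the s:<length>:" prefix with a regex, then cut out exactly that many bytes with substr. The sketch below is only an illustration; the helper name readSerializedString is made up for this example, and it assumes the declared length is a byte count.

```php
<?php
// Hypothetical helper (not from the original posts): reads one serialized
// string token that starts at $offset and returns its payload, or null.
function readSerializedString(string $text, int $offset = 0): ?string
{
    // \G anchors the match at $offset; "num" captures the declared length.
    if (!preg_match('/\Gs:(?P<num>[0-9]+):"/', $text, $m, 0, $offset)) {
        return null;
    }
    $start  = $offset + strlen($m[0]);
    $length = (int) $m['num'];
    $value  = substr($text, $start, $length);

    // The payload must be complete and followed by the closing ";
    if (strlen($value) !== $length || substr($text, $start + $length, 2) !== '";') {
        return null;
    }
    return $value;
}

echo readSerializedString('s:12:"hello "world";'), "\n"; // prints: hello "world
```

Because the length comes from the capture rather than from the regex itself, embedded quotes in the payload do not confuse the extraction.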

Related

Find all hashtags in string using preg_match_all

I'm having problems figuring out the right regex pattern for the search preg_match_all("THIS PART", $my_string). I need to find all hashtags in my string with the word after the hashtag included as well.
So, these strings should be found by the mentioned function:
Input
#hi im like typing text right here hihih #asdasdasdasd #
Result
#hi
#asdasdasdasd
Input
#asd#asd xd so fun lol #lol
Result
#asd#asd2 would be two separate matches, and #lol would be matched as well.
I hope the question makes sense, and thanks in advance!
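This should work:
/#(?<hash>[^\s#]+)/g
It searches for # and then creates a named group called hash, which stops matching when it reaches another # or any whitespace character (\s). Note that the g flag is regex-tester notation; PHP has no g modifier, so drop it and call preg_match_all, which already finds every match.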
You can use preg_match_all:
preg_match_all('/(?<!\w)#\w+/', $your_string, $allMatches);
It returns every word that starts with a # tag. Hope it helps.
print_r($allMatches);
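The two suggested patterns behave differently on adjacent tags, which matters for the #asd#asd input. The sketch below runs both on the question's sample strings; the array keys 'split' and 'boundary' are labels invented for this example.

```php
<?php
$inputs = [
    '#hi im like typing text right here hihih #asdasdasdasd #',
    '#asd#asd xd so fun lol #lol',
];

$results = [];
foreach ($inputs as $input) {
    // Stops at whitespace or the next #, so back-to-back tags are split.
    preg_match_all('/#[^\s#]+/', $input, $a);
    // Requires that no word character precedes the #, so the second
    // tag in "#asd#asd" is skipped.
    preg_match_all('/(?<!\w)#\w+/', $input, $b);
    $results[] = ['split' => $a[0], 'boundary' => $b[0]];
}

print_r($results);
```

For the second input, the first pattern yields #asd, #asd, #lol, while the lookbehind pattern yields only #asd and #lol. Neither pattern matches the lone trailing # in the first input.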

How to get a number from a html source page?

I'm trying to retrieve the followed by count on my instagram page. I can't seem to get the Regex right and would very much appreciate some help.
Here's what I'm looking for:
y":{"count":
That's the beginning of the string, and I want the 4 numbers after that.
$string = preg_replace("{y"\"count":([0-9]+)\}","",$code);
Someone suggested this ^ but I can't get the formatting right...
You haven't posted your strings, so the right regex is a guess; instead, I'll explain why your code fails.
preg_replace('"followed_by":{"count":\d')
This is very far from the correct preg_replace usage. You need to give it the replacement string and the string to search on. See http://php.net/manual/en/function.preg-replace.php
Your second usage:
$string = preg_replace(/^y":{"count[0-9]/","",$code);
Is closer, but preg_replace is global, so this searches your whole file (or it would, if not for the anchor) and replaces the found value with nothing. What you really want (I think) is to use preg_match.
$string = preg_match('/y":\{"count(\d{4})/', $code, $match);
$counted = $match[1];
This presumes your regex was kind of correct already.
Per your update:
Demo: https://regex101.com/r/aR2iU2/1
$code = 'y":{"count:1234';
$string = preg_match('/y":\{"count:(\d{4})/', $code, $match);
$counted = $match[1];
echo $counted;
PHP Demo: https://eval.in/489436
I removed the ^, which requires the match to start at the beginning of your string, escaped the {, and made the \d match exactly 4 digits. The () is a capture group that stores whatever it matches, in this case the 4 digits.
Also if this isn't just for learning you should be prepared for this to stop working at some point as the service provider may change the format. The API is a safer route to go.
This regexp should capture value you're looking for in the first group:
\{"count":([0-9]+)\}
Use it with the preg_match_all function to capture what you want into an array (you're using preg_replace, which isn't for retrieving data but for, well, replacing it).
Your regexp isn't working because you didn't escape the curly brackets. You also didn't add a quantifier (the plus sign in my example), so it would only capture the first digit anyway.
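Putting both answers together, a minimal sketch might look like the following. The contents of $code are a made-up fragment for illustration only; the real page source may be shaped differently and can change at any time.

```php
<?php
// Hypothetical page fragment (an assumption, not the real Instagram markup).
$code = '{"followed_by":{"count":1234},"follows":{"count":56}}';

// The braces are escaped and + captures every digit, not just the first one.
if (preg_match('/"followed_by":\{"count":(\d+)\}/', $code, $match)) {
    $followedBy = (int) $match[1];
    echo $followedBy, "\n"; // prints 1234
}
```

Using \d+ instead of \d{4} keeps the pattern working when the count has more or fewer than four digits.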

preg_replace with Regex - find number-sequence in URL

I'm a regex newbie, so sorry for this "simple" question:
I've got a URL like the following:
http://stellenanzeige.monster.de/COST-ENGINEER-AUTOMOTIVE-m-w-Job-Mainz-Rheinland-Pfalz-Deutschland-146370543.aspx
What I want to achieve is to get the number sequence (the job ID) right before the ".aspx" with preg_replace.
I've already figured out that the regex for finding it could be
(?!.*-).*(?=\.)
Now preg_replace needs the opposite of that regular expression. How can I achieve that? Also worth mentioning:
The URL can have multiple numbers in it. I only need the sequence right before ".aspx". Also, there could be some parameters after the ".aspx", like "&mobile=true".
Thank you for your answers!
You can use:
$re = '/[^-.]+(?=\.aspx)/i';
preg_match($re, $input, $matches);
//=> 146370543
This matches a run of characters that are neither hyphens nor dots and that is followed by .aspx, which is checked with the lookahead (?=\.aspx).
RegEx Demo
You can just use preg_match (you don't need preg_replace, as you don't want to change the original string) and capture the number before the .aspx. Assuming .aspx is always at the end, the simplest way I could think of is:
<?php
$string = "http://stellenanzeige.monster.de/COST-ENGINEER-AUTOMOTIVE-m-w-Job-Mainz-Rheinland-Pfalz-Deutschland-146370543.aspx";
$regex = '/([0-9]+)\.aspx$/';
preg_match($regex, $string, $results);
print $results[1];
?>
A short explanation:
$results contains an array of results. Its first element holds the text matched by the complete regex, which is 146370543.aspx in this example. The second element holds the group captured by the parentheses around [0-9]+.
You can get the opposite by using this regex:
(\D*)\d+(.*)
Working demo
MATCH 1
1. [0-100] `http://stellenanzeige.monster.de/COST-ENGINEER-AUTOMOTIVE-m-w-Job-Mainz-Rheinland-Pfalz-Deutschland-`
2. [109-114] `.aspx`
If you just want the number from that URL, you can even use this regex:
(\d+)
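The question mentions that parameters such as "&mobile=true" may follow the ".aspx", in which case a pattern anchored with $ would no longer match. One possible adjustment, sketched here with a hypothetical variant of the question's URL, is to accept either a parameter separator or the end of the string after ".aspx":

```php
<?php
// Hypothetical variant of the question's URL with a trailing parameter.
$url = 'http://stellenanzeige.monster.de/COST-ENGINEER-AUTOMOTIVE-m-w-Job-Mainz-Rheinland-Pfalz-Deutschland-146370543.aspx&mobile=true';

// Digits after a hyphen, then ".aspx", then ?, &, # or the end of the string.
if (preg_match('/-(\d+)\.aspx(?=[?&#]|$)/i', $url, $m)) {
    $jobId = $m[1];
    echo $jobId, "\n"; // prints 146370543
}
```

Tying the digits to the literal ".aspx" also avoids picking up other numbers that may appear earlier in the URL.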

Using preg_match_all to find patterns without including the pattern delimiter in the matches

I'm matching patterns with a regex, as in
$Structure = 'C:N:X:A:V:T:J:N:G:T:N:N:C:J:N:C:A:J:N:.:';
preg_match_all('/(T:|G:|L:|D:).*?(G:|i:|X:|\.:)/', $Structure, $arr, PREG_SET_ORDER);
the results I get are
T:J:N:G: , T:N:N:C:J:N:C:A:J:N:.:
How can I modify the pattern so that the delimiter (G:|i:|X:|.:) of the match is not included in the result but is still available to the next search? In other words, make the result look as below:
T:J:N: , G:T:N:N:C:J:N:C:A:J:N:
instead?
Is this possible?
Thanks
Yes, instead of making your 2nd capturing group consume the input, turn it into a positive lookahead:
/(T:|G:|L:|D:).*?(?=(?:G:|i:|X:|\.:))/
Now, instead of matching (and consuming) the delimiter, this:
(?=(?:G:|i:|X:|\.:))
States that the regex must assert that the delimiter is present from the current point forward, i.e. a positive lookahead.
This results in:
"T:J:N:, G:T:N:N:C:J:N:C:A:J:N:"
It is possible with lookaheads, using the following syntax:
(?=G:|i:|X:|\.:)
That will not consume the piece that matches the regex.
On a side note, "delimiter" usually means the slashes enclosing your regex, not the capturing group you have.
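To see the lookahead version in action, here is a short sketch that runs it on the string from the question. The variable $found is introduced only for this example and collects the full match of each set.

```php
<?php
$structure = 'C:N:X:A:V:T:J:N:G:T:N:N:C:J:N:C:A:J:N:.:';

// The terminator is only asserted, not consumed, so the next match
// can start on it.
preg_match_all('/(T:|G:|L:|D:).*?(?=G:|i:|X:|\.:)/', $structure, $arr, PREG_SET_ORDER);

// With PREG_SET_ORDER each entry is one match set; index 0 is the full match.
$found = array_column($arr, 0);
print_r($found);
```

This gives T:J:N: followed by G:T:N:N:C:J:N:C:A:J:N:, as requested in the question.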

preg_match returning weird results

I am searching a string for urls...and my preg_match is giving me an incorrect amount of matches for my demo string.
String:
Hey there, come check out my site at www.example.com
Function:
preg_match("#(^|[\n ])([\w]+?://[\w]+[^ \"\n\r\t<]*)#ise", $string, $links);
echo count($links);
The result comes out as 3.
Can anybody help me solve this? I'm new to REGEX.
$links is the array of sub matches:
If matches is provided, then it is filled with the results of search. $matches[0] will contain the text that matched the full pattern, $matches[1] will have the text that matched the first captured parenthesized subpattern, and so on.
The matches of the two groups plus the match of the full regular expression results in three array items.
Maybe you rather want all matches using preg_match_all.
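The difference between the two counts can be sketched as follows. The sample text is hypothetical, and the pattern is the question's with the e modifier dropped (it only applies to preg_replace) and [\w] shortened to \w.

```php
<?php
// Hypothetical sample text containing two URLs.
$string  = 'Visit http://a.example and https://b.example now';
$pattern = '#(^|[\n ])(\w+?://\w+[^ "\n\r\t<]*)#is';

// preg_match stops at the first match: full match + 2 groups = 3 items.
preg_match($pattern, $string, $links);
echo count($links), "\n"; // prints 3

// preg_match_all collects every match; $all[2] holds the URL group.
preg_match_all($pattern, $string, $all);
echo count($all[2]), "\n"; // prints 2
print_r($all[2]);
```

So count($links) reflects the number of groups in the pattern, not the number of URLs found.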
If you use preg_match_all (as Gumbo suggested), please note that if you run your regex against this string, it will match both the value of your anchor's "href" attribute and the link text, which in this case happens to contain a URL. This makes TWO matches.
It would be wise to run an array_unique on your resultset :)
In addition to the advice on how to use preg_match, I believe there is something seriously wrong with the regular expression you are using. You may want to try something like this instead:
preg_match("_([a-zA-Z]+://)?([0-9a-zA-Z$-\_.+!*'(),]+\.)?([0-9a-zA-Z]+)+\.([a-zA-Z]+)_", $string, $links);
This should handle most cases (although it wouldn't work if there was a query string after the top-level domain). In the future, when writing regular expressions, I recommend the following web-sites to help: http://www.regular-expressions.info/ and especially http://regexpal.com/ for testing them as you're writing them.
