Regexp for string which shouldn't contain two known chars - php

For example
I have a string like "12345%67890"
Regexp [^%]* gives me 12345.
How can I get the same result if the delimiter is not "%" but a two-character sequence such as "<%"? Thanks a lot.
A bit more information:
I have a huge text in which I replace placeholders delimited by %, such as %test%, using preg_match_all and preg_replace. If a % appears as ordinary text rather than as a separator, everything breaks, e.g. %test 90% test%. So I've decided to change % to something more distinctive, like <% test 90% test %>.

Based on your new information it sounds like you control the output, which makes this all kind of weird.
In any case, here's a regex that will capture the contents of the wrapper you've created:
<%(.+?)%>
Notice the ? for a lazy match.
Code sample:
$string = "asdfar <%test123%>farasr%<5 sara><%90% is cool%%><%ooooaaaah%>>>%<%>%%";
preg_match_all('/<%(.+?)%>/', $string, $matches);
var_dump($matches);
Output:
array(2) {
[0]=>
array(3) {
[0]=>
string(11) "<%test123%>"
[1]=>
string(16) "<%90% is cool%%>"
[2]=>
string(13) "<%ooooaaaah%>"
}
[1]=>
array(3) {
[0]=>
string(7) "test123"
[1]=>
string(12) "90% is cool%"
[2]=>
string(9) "ooooaaaah"
}
}
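Since the question mentions doing replacements, here is a minimal sketch of how the same lazy pattern could drive preg_replace_callback. The $replacements table and the sample text are made up for illustration, and leaving unknown placeholders untouched is an assumption, not something the asker specified.

```php
<?php
// Hypothetical lookup table; keys and values are illustrative only.
$replacements = [
    'name' => 'Alice',
    'rate' => '90% done',
];

$text = "Hello <%name%>, status: <%rate%>, unknown: <%missing%>, plain 50% stays.";

$result = preg_replace_callback(
    '/<%(.+?)%>/',
    function (array $m) use ($replacements) {
        // $m[0] is the whole placeholder, $m[1] the text between <% and %>.
        // Unknown keys are returned unchanged (an assumption for this sketch).
        return $replacements[$m[1]] ?? $m[0];
    },
    $text
);

echo $result, "\n";
```

A bare % in the text or in a replacement value is harmless here, because only the two-character sequences <% and %> act as delimiters.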

Seems to me you should be doing a split, not a match:
$subject = "12345<%67890";
$result = preg_split('/<%/', $subject);
print_r($result);
output:
Array
(
[0] => 12345
[1] => 67890
)
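If a match really is needed rather than a split, one common way to express "anything that does not contain the sequence <%" is a negative lookahead applied before every character. This is a sketch of that technique, not something taken from the answers above; the second sample string is made up.

```php
<?php
$subject = "12345<%67890";

// (?:(?!<%).)* consumes one character at a time, but only while the
// two-character sequence "<%" does not start at the current position.
preg_match('/^(?:(?!<%).)*/', $subject, $m);
echo $m[0], "\n"; // 12345

// A lone "<" or "%" is still allowed; only the pair stops the match.
preg_match('/^(?:(?!<%).)*/', "12<3%45<%678", $m2);
echo $m2[0], "\n"; // 12<3%45
```

Note that . does not match a newline unless the s modifier is added, so multi-line input would need /s.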

Related

How to get an array from a string, starting from # and ending with a space

How do I get an array from the below string?
I need what is between # and space.
Gold star #RPXIeDIWVuTHFWGkaWbJEvv0KFk2 nice James #72oCu3zBCHQzS5fiY3KNFCWkgA53 for #DoxBay
I am using the explode php function
$output=explode("#",$title);
var_dump($output);
Sounds like you will want to use preg_match_all() to create an array of all matches from your string.
This will take your string and compare it against your pattern. It will place the results you are looking for in the second element of the matches array.
Like so:
$str = 'Gold star #RPXIeDIWVuTHFWGkaWbJEvv0KFk2 nice James #72oCu3zBCHQzS5fiY3KNFCWkgA53 for #DoxBay ';
preg_match_all('/#(.*?)\s/', $str, $matches);
echo '<pre>';
print_r($matches[1]);
echo '</pre>';
This will output:
Array
(
[0] => RPXIeDIWVuTHFWGkaWbJEvv0KFk2
[1] => 72oCu3zBCHQzS5fiY3KNFCWkgA53
[2] => DoxBay
)
The easiest way is probably regex.
This captures what is between a # and a space.
Since the last word does not have a space after it, we can either add a space to the string or change the pattern. I chose to add a space.
$str = "Gold star #RPXIeDIWVuTHFWGkaWbJEvv0KFk2 nice James #72oCu3zBCHQzS5fiY3KNFCWkgA53 for #DoxBay";
preg_match_all("/\#(.*?)\s/", $str . " ", $match);
var_dump($match);
Output:
array(2) {
[0]=>
array(3) {
[0]=>
string(30) "#RPXIeDIWVuTHFWGkaWbJEvv0KFk2 "
[1]=>
string(30) "#72oCu3zBCHQzS5fiY3KNFCWkgA53 "
[2]=>
string(8) "#DoxBay "
}
[1]=>
array(3) {
[0]=>
string(28) "RPXIeDIWVuTHFWGkaWbJEvv0KFk2"
[1]=>
string(28) "72oCu3zBCHQzS5fiY3KNFCWkgA53"
[2]=>
string(6) "DoxBay"
}
}
https://3v4l.org/RNM2P
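The other option mentioned above, changing the pattern instead of padding the string, could look like the sketch below. It assumes a tag is simply every non-whitespace character after the #.

```php
<?php
$str = "Gold star #RPXIeDIWVuTHFWGkaWbJEvv0KFk2 nice James #72oCu3zBCHQzS5fiY3KNFCWkgA53 for #DoxBay";

// \S+ matches one or more non-whitespace characters, so each match ends at
// the next whitespace or at the end of the string, whichever comes first.
preg_match_all('/#(\S+)/', $str, $matches);

print_r($matches[1]);
```

Because the trailing space is no longer part of the pattern, the last tag is found without modifying the input.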

PHP preg_match_all same line

Having trouble with a regular expression (they are not my strong suit). I'm trying to match all strings between {{ and }}, but if a set of brackets occurs on the same line, it counts that as a single match... Example:
$string = "
Hello, kind sir
{{SHOULD_MATCH1}} {{SHOULD_MATCH2}}
welcome to
{{SHOULD_MATCH3}}
";
preg_match_all("/{{(.*)}}/", $string, $matches);
var_dump($matches); // returns arrays with 2 results instead of 3
returns:
array(2) {
[0]=>
array(2) {
[0]=>
string(35) "{{SHOULD_MATCH1}} {{SHOULD_MATCH2}}"
[1]=>
string(17) "{{SHOULD_MATCH3}}"
}
[1]=>
array(2) {
[0]=>
string(31) "SHOULD_MATCH1}} {{SHOULD_MATCH2"
[1]=>
string(13) "SHOULD_MATCH3"
}
}
Any help? Thanks!
Replace the * quantifier with its non-greedy form *?.
This will make it match as little as possible while still allowing the expression to match as a whole, which is different from its current behavior of matching as much as possible.
You can use one of the following patterns.
{{(.+?)}
{{([^}]+)
{{(\w+)
{{([[:digit:][:upper:]_]+)
{{([\p{Lu}\p{N}_]+)
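To see the difference between the greedy and the non-greedy quantifier side by side, here is a small sketch using a shortened version of the string from the question. The braces are escaped here as a precaution; the unescaped form from the question behaves the same way.

```php
<?php
$string = "{{SHOULD_MATCH1}} {{SHOULD_MATCH2}}\n{{SHOULD_MATCH3}}";

// Greedy: .* runs to the last "}}" on the line.
preg_match_all('/\{\{(.*)\}\}/', $string, $greedy);

// Lazy: .*? stops at the first "}}" it can.
preg_match_all('/\{\{(.*?)\}\}/', $string, $lazy);

print_r($greedy[1]); // 2 entries, the first spans both placeholders
print_r($lazy[1]);   // 3 entries, one per placeholder
```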

Why does preg_match_all() create the same answer multiple times?

The following code extracts #hashtags from a tweet and puts them in the variable $matches.
$tweet = "this has a #hashtag a #badhash-tag and a #goodhash_tag";
preg_match_all("/(#\w+)/", $tweet, $matches);
var_dump( $matches );
Can someone please explain to me why the following results have 2 identical arrays instead of just 1?
array(2) {
[0]=>
array(3) {
[0]=>
string(8) "#hashtag"
[1]=>
string(8) "#badhash"
[2]=>
string(13) "#goodhash_tag"
}
[1]=>
array(3) {
[0]=>
string(8) "#hashtag"
[1]=>
string(8) "#badhash"
[2]=>
string(13) "#goodhash_tag"
}
}
Because you use () to catch the sub group.
Try:
preg_match_all("/#\w+/", $tweet, $matches);
Why are you using () unless you want it to do exactly that. lol Sorry, that came out not so friendly :(
http://php.net/manual/en/function.preg-match-all.php Example 3
It's simple:
remove the () from your expression.
Hope it helps.
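A short sketch of what the parentheses change: index 0 of $matches always holds the full matches, and each capture group adds one more index. Moving the # outside the group here is only for illustration, so that the two arrays differ visibly.

```php
<?php
$tweet = "this has a #hashtag a #badhash-tag and a #goodhash_tag";

// With a capture group: [0] holds full matches, [1] holds group 1.
preg_match_all('/#(\w+)/', $tweet, $withGroup);
// $withGroup[0] => ['#hashtag', '#badhash', '#goodhash_tag']
// $withGroup[1] => ['hashtag', 'badhash', 'goodhash_tag']

// Without a capture group only index 0 is present.
preg_match_all('/#\w+/', $tweet, $noGroup);

var_dump(count($withGroup), count($noGroup)); // int(2) int(1)
```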

Regex quantified capture

php > preg_match("#/m(/[^/]+)+/t/?#", "/m/part/other-part/t", $m);
php > var_dump($m);
array(2) {
[0]=>
string(20) "/m/part/other-part/t"
[1]=>
string(11) "/other-part"
}
php > preg_match_all("#/m(/[^/]+)+/t/?#", "/m/part/other-part/t", $m);
php > var_dump($m);
array(2) {
[0]=>
array(1) {
[0]=>
string(20) "/m/part/other-part/t"
}
[1]=>
array(1) {
[0]=>
string(11) "/other-part"
}
}
With said example I would like the capture to match both /part and /other-part, unfortunately with regex /m(/[^/]+)+/t/? doesn't capture both, as I expect.
This capture should not be bound to only match this sample, it should capture an undefined number of repetitions of the capture group; e.g. /m/part/other-part/and-another/more/t
UPDATE:
Given that this is expected behavior, my question still stands: how can I capture every repetition?
Try this one out:
preg_match_all("#(?:/m)?/([^/]+)(?:/t)?#", "/m/part/other-part/another-part/t", $m);
var_dump($m);
It gives:
array(2) {
[0]=>
array(3) {
[0]=>
string(7) "/m/part"
[1]=>
string(11) "/other-part"
[2]=>
string(15) "/another-part/t"
}
[1]=>
array(3) {
[0]=>
string(4) "part"
[1]=>
string(10) "other-part"
[2]=>
string(12) "another-part"
}
}
//EDIT
IMO the best way to do what you want is to use the preg_match() pattern from @stema and explode the result by / to get the list of parts you want.
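A minimal sketch of that match-then-explode idea, assuming the whole subject should be a path of the form /m/.../t. The ^ and $ anchors and the longer sample path are assumptions added for this example.

```php
<?php
$subject = '/m/part/other-part/and-another/more/t';

$parts = [];
if (preg_match('#^/m((?:/[^/]+)+)/t/?$#', $subject, $m)) {
    // $m[1] is "/part/other-part/and-another/more"; drop the leading slash
    // before splitting so the first element is not an empty string.
    $parts = explode('/', ltrim($m[1], '/'));
}

print_r($parts);
```

The regex only validates the overall shape and isolates the middle section; the splitting itself needs no regex at all.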
That's how capturing groups work. A repeated capturing group stores only its last match once the regex has finished. In your test that is "/other-part".
Try this instead
/m((?:/[^/]+)+)/t/?
See it here on Regexr, while hovering over the match, you can see the content of the capturing group.
Just make your group non-capturing by adding ?: at the start, and put a new capturing group around the whole repetition.
In php
preg_match_all("#/m((?:/[^/]+)+)/t/?#", "/m/part/other-part/t", $m);
var_dump($m);
Output:
array(2) {
[0]=> array(1) {
[0]=>
string(20) "/m/part/other-part/t"
}
[1]=> array(1) {
[0]=>
string(16) "/part/other-part"
}
}
As already written in a comment, you can't do this in one step, because preg_match keeps only the last match of a repeated subgroup rather than every repetition (unlike .NET, for example; see Get repeated matches with preg_match_all()). So you can split the operation into two steps:
Match the subject and extract the part you're interested in.
Run a second match on that part only.
Code:
$subject = '/m/part/other-part/t';
$subpattern = '/[^/]+';
$pattern = sprintf('~/m(?<path>(?:%s)+)/t/?~', $subpattern);
$r = preg_match($pattern, $subject, $matches);
if (!$r) return;
$r = preg_match_all("~$subpattern~", $matches['path'], $matches);
var_dump($matches);
Output:
array(1) {
[0]=>
array(2) {
[0]=>
string(5) "/part"
[1]=>
string(11) "/other-part"
}
}

preg_match not returning expected results

I'm attempting to use a regexp to parse a search string that from time to time may contain special syntax. The syntax I'm looking for is [special keyword : value], and I want each match put into an array. Keep in mind that the search string will contain other text that is not intended to be parsed.
$searchString = "[StartDate:2010-11-01][EndDate:2010-11-31]";
$specialKeywords = array();
preg_match("/\[{1}.+\:{1}.+\]{1}/", $searchString, $specialKeywords);
var_dump($specialKeywords);
Output:
array(1) { [0]=> string(43) "[StartDate:2010-11-01] [EndDate:2010-11-31]" }
Desired Output:
array(2) { [0]=> string() "[StartDate:2010-11-01]"
[1]=> string() "[EndDate:2010-11-31]"}
Please let me know if i am not being clear enough.
Your .+ matches across the boundaries between the two [...] parts because it matches any character, and as many of them as possible. You could be more restrictive about which characters may be matched. Also {1} is redundant and can be dropped.
/\[[^:]*:[^\]]*\]/
should work more reliably.
Explanation:
\[ # match a [
[^:]* # match any number of characters except :
: # match a :
[^\]]* # match any number of characters except ]
\] # match a ]
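Here is a sketch of that pattern run through preg_match_all on a string that also contains ordinary text. The sample strings are made up, and the stricter variant at the end is an addition for illustration, not part of the answer above.

```php
<?php
// Hypothetical search string: free text mixed with [keyword:value] tokens.
$searchString = "shoes [StartDate:2010-11-01] red [EndDate:2010-11-31] sale";

preg_match_all('/\[[^:]*:[^\]]*\]/', $searchString, $matches);
print_r($matches[0]);

// Caveat: [^:]* may run across a bracket pair that contains no colon.
// Excluding "]" from the first class as well avoids that.
$mixed = "[foo] bar [a:b]";
preg_match_all('/\[[^:]*:[^\]]*\]/', $mixed, $loose);    // matches "[foo] bar [a:b]"
preg_match_all('/\[[^:\]]*:[^\]]*\]/', $mixed, $strict); // matches "[a:b]"
```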
This:
$searchString = "[StartDate:2010-11-01][EndDate:2010-11-31]";
preg_match_all('/\[.*?\]/', $searchString, $match);
print_r($match);
gives the expected result, though I'm not sure it meets all the constraints.
Try the following:
$searchString = "[StartDate:2010-11-01][EndDate:2010-11-31]";
$specialKeywords = array();
preg_match_all("/\[\w+:\d{4}-\d\d-\d\d\]/i", $searchString, $specialKeywords);
var_dump($specialKeywords[0]);
Outputs:
array(2) {
[0]=>
string(22) "[StartDate:2010-11-01]"
[1]=>
string(20) "[EndDate:2010-11-31]"
}
Use this regex: "/\[(.*?)\:(.*?)\]{1}/" and also use preg_match_all, it will return
array(3) {
[0]=>
array(2) {
[0]=>
string(22) "[StartDate:2010-11-01]"
[1]=>
string(20) "[EndDate:2010-11-31]"
}
[1]=>
array(2) {
[0]=>
string(9) "StartDate"
[1]=>
string(7) "EndDate"
}
[2]=>
array(2) {
[0]=>
string(10) "2010-11-01"
[1]=>
string(10) "2010-11-31"
}
}
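With the keyword and the value in separate groups, the two result arrays can be zipped into a keyword => value map. This is a sketch that assumes each keyword occurs at most once, since a repeated keyword would overwrite the earlier entry.

```php
<?php
$searchString = "[StartDate:2010-11-01][EndDate:2010-11-31]";

preg_match_all('/\[(.*?):(.*?)\]/', $searchString, $m);

// $m[1] holds the keywords, $m[2] the values, in matching order.
$specialKeywords = array_combine($m[1], $m[2]);

print_r($specialKeywords);
```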
/\[.+?\:.+?\]/
I suggest this pattern; it is less complex but handles the same input as Tim's.
