Don't match if string contains specific text - php

I have made this regex:
(?<=span class="ope">)?[a-z0-9]+?\.(pl|com|net\.pl|tk|org|org\.pl|eu)|$(?=<\/span>)$
It does match the strings like: example.pl, example12.com, something.eu but it will also match the dontwantthis.com.
My question is: how do I avoid matching a string if it contains the dontwantthis string?

You're probably following your regex with a loop to cycle through matches. In this case, it's probably easiest to just check for the presence of the dontwantthis substring and continue if it's there. Trying to implement it in regex is just asking for trouble.
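A minimal sketch of that approach, assuming the matches are collected with preg_match_all and that the unwanted text is the literal substring dontwantthis. The sample input, the variable names and the simplified pattern are made up for illustration and are not the asker's exact code:

```php
<?php
// Hypothetical input; the span markup mirrors the question.
$html = '<span class="ope">example.pl</span>'
      . '<span class="ope">dontwantthis.com</span>'
      . '<span class="ope">something.eu</span>';

// Simplified version of the question's pattern (assumption: one domain per span).
$pattern = '~(?<=<span class="ope">)[a-z0-9]+\.(?:net\.pl|org\.pl|pl|com|tk|org|eu)(?=</span>)~';

preg_match_all($pattern, $html, $found);

$domains = [];
foreach ($found[0] as $domain) {
    // Skip anything containing the unwanted substring.
    if (strpos($domain, 'dontwantthis') !== false) {
        continue;
    }
    $domains[] = $domain;
}

print_r($domains); // example.pl and something.eu remain
```

Keeping the exclusion outside the regex means the pattern only has to describe what a domain looks like, and the list of rejected substrings can grow without touching it.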

It seems that you are extracting content from span elements using a regular expression. Now, despite all the reasons why this is not such a good idea...
... just keep the expression you have. Then, if you have a match, filter out the matched entries that should be rejected.
$match = extractContentFromHtml($html); // use regex here, return false if no match
if ($match && validMatch($match)) {
    // do something
}
where validMatch(string) should check if the value exists in some array, for example.

Related

Regular expression which matches a URL and return desired value

I need regular expression which matches a URL and return the desired value
Example (if the URL matches to)
1. http://example.com/amp
2. http://example.com/amp/
3. http://example.com/amp~
THEN
it should return: ?amp=1
ELSE
it should return: false
You should be able to use preg_replace to append ?amp=1 to the end of a matching string. It already provides the if/else behavior you require:
If matches are found, the new subject will be returned, otherwise subject will be returned unchanged or NULL if an error occurred.
(unless I misread what it should return)
-http://php.net/manual/en/function.preg-replace.php
Something like
amp\K( |\/|~)$
Should do it
$string = 'http://example.com/amp~';
echo preg_replace('/amp\K( |\/|~)$/', '$1?amp=1', $string);
The $1 is optional, not sure if you wanted the found character included or not.
PHP Demo: https://eval.in/780432
Regex demo: https://regex101.com/r/JgcrLu/1/
$ is the end of the string. () is a capturing group that also holds the alternation, and the |s separate the alternatives. \K resets the start of the reported match, so the preceding amp is left out of it.
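Note that the pattern above requires a trailing space, slash, or tilde, so a bare /amp at the end of the URL is not matched by it. If the literal if/else behavior from the question is wanted, one possible sketch uses preg_match instead; the function name ampQuery is hypothetical and the optional trailing character is an assumption based on the three example URLs:

```php
<?php
// Hypothetical helper: returns '?amp=1' when the URL ends in /amp, /amp/ or /amp~,
// and false otherwise.
function ampQuery(string $url)
{
    // [/~]? makes the trailing slash or tilde optional, so a bare /amp also matches.
    return preg_match('#/amp[/~]?$#', $url) === 1 ? '?amp=1' : false;
}

var_dump(ampQuery('http://example.com/amp'));
var_dump(ampQuery('http://example.com/amp/'));
var_dump(ampQuery('http://example.com/amp~'));
var_dump(ampQuery('http://example.com/about'));
```

The first three calls yield '?amp=1' and the last one yields false.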
You didn't specify the programming language you're using but you probably need something like:
php:
$new = preg_replace('%/amp\b(?:/|~)?%si', '/?amp=1', $old);
python:
new_string = re.sub(r"/amp\b(?:/|~)?", "/?amp=1", old_string, flags=re.IGNORECASE)

How to preg_match '{95}1340{113}1488{116}1545{99}1364'

I want to match the following string as a whole with preg_match:
$this_string = '{95}1340{113}1488{116}1545{99}1364';
My best try was
preg_match('/^[\{\d+\}\d+]+$/', $this_string);
That matches
{95}1340{113}1488
but also
{95}1340{113}
which is wrong.
I know why it matches the last example: once {95}1340 has matched, the '+' is already satisfied. But I don't know how to say that whatever is inside '[…]' must always match as a complete unit.
I expect only matches like these:
{…}…
{…}…{…}…
{…}…{…}…{…}…
One of my attempts:
^(\{\d+\}\d+)+$
also returns
{99}1364
from the very end of the string as a second entry, so I get back an array with two elements:
Array[0] = {95}1340{113}1488{116}1545{99}1364 and
Array[1] = {99}1364
The problem is the unnecessary use of a character class, i.e. [ and ], in your regex.
You can use:
'/^(\{\d+\}\d+)+$/'
Written more clearly, your regex is equivalent to /^[\{\}0-9+]+$/: it matches any sequence made up solely of the characters {}0123456789+, in any order.
What you want is grouping, and grouping needs parentheses () rather than a character class [], so replace the [] with ().
Short answer: '/^(\{\d+\}\d+)+$/'
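A small sketch of how the grouped pattern behaves on the strings from the question. It uses a non-capturing group (?: ) instead of ( ), which is an addition of this sketch: with it the matches array holds only the full match, so no second element such as {99}1364 appears. The variable names are illustrative.

```php
<?php
$valid   = '{95}1340{113}1488{116}1545{99}1364';
$invalid = '{95}1340{113}';

// (?: ... ) groups without capturing, so $m holds only the full match.
$pattern = '/^(?:\{\d+\}\d+)+$/';

$validResult   = preg_match($pattern, $valid, $m);
$matchCount    = count($m);
$invalidResult = preg_match($pattern, $invalid);

var_dump($validResult);   // int(1): the whole string is a run of {n}n items
var_dump($matchCount);    // int(1): only the full match is returned
var_dump($invalidResult); // int(0): the trailing {113} has no digits after it
```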
What you are trying to do is a little unclear. Since your last edit, I assume that you want to check the global format of the string and to extract all items (i.e. {121}1231) one by one. To do that you can use this code:
$str = '{95}1340{113}1488{116}1545{99}1364';
$pattern = '~\G(?:{\d+}\d+|\z)~';
if (preg_match_all($pattern, $str, $matches) && empty(array_pop($matches[0])))
print_r($matches[0]);
\G is an anchor for the start of the string or the end of the previous match
\z is an anchor for the end of the string
The alternation with \z is only needed to check that the last match is at the end of the string. If the last match is empty, you are sure that the format is correct until the end.

alternative to if(preg_match() and preg_match())

I want to know if we can replace if(preg_match('/boo/', $anything) and preg_match('/poo/', $anything))
with a regex..
$anything = 'I contain both boo and poo!!';
for example..
From what I understand of your question, you're looking for a way to check if BOTH 'poo' and 'boo' exist within a string using only one regex. I can't think of a more elegant way than this:
preg_match('/(boo.*poo)|(poo.*boo)/', $anything);
This is the only way I can think of to ensure both patterns exist within a string regardless of order. Of course, if you knew they were always in the same order, that would make it simpler =]
EDIT
After reading through the post linked to by MisterJ in his answer, it would seem that a simpler regex could be:
preg_match('/(?=.*boo)(?=.*poo)/', $anything);
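A short sketch of the lookahead version wrapped in a function. The name containsBoth is hypothetical, and the ^ anchor and s flag are small additions so that the scan starts once at the beginning of the string and . can cross newlines:

```php
<?php
// Hypothetical helper: true only if both words occur, in any order.
function containsBoth(string $text): bool
{
    // Each lookahead scans forward from the start of the string without consuming it.
    return preg_match('/^(?=.*boo)(?=.*poo)/s', $text) === 1;
}

var_dump(containsBoth('I contain both boo and poo!!')); // bool(true)
var_dump(containsBoth('poo first, then boo'));          // bool(true)
var_dump(containsBoth('I contain only boo'));           // bool(false)
```

Each additional required word is one more lookahead, which scales better than listing every ordering in an alternation.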
By using a pipe:
if(preg_match('/boo|poo/', $anything))
You can use the logical OR, as mentioned by @sroes:
if(preg_match('/(boo)|(poo)/', $anything))
The problem there is that you don't know which one matched: this matches "I contain boo", "I contain poo" and "I contain boo and poo".
If you only want to match "I contain boo and poo", the problem is harder; see "Regular Expressions: Is there an AND operator?".
It seems that you will have to stick with the PHP test.
To take the condition literally (note that this also matches a string in which the same word occurs twice, such as "boo boo"):
if(preg_match('/[bp]oo.*[bp]oo/', $anything))
You can achieve this by altering your regular expression, as others have pointed out in other answers. However, if you want to use an array instead, so you do not have to list a long regex pattern, then use something like this:
// Assume every pattern matches until one fails
$matches = true;
// Set the pattern array
$pattern_array = array('boo','poo');
// Loop through the patterns to match
foreach($pattern_array as $pattern){
    // Test if the string is matched
    if(!preg_match('/'.$pattern.'/', $anything)){
        // One pattern is missing, so the check fails
        $matches = false;
        break;
    }
}
// Proceed if all patterns matched
if($matches){
    // Do your stuff here
}
Alternatively, if you are only trying to match strings then it would be much more efficient if you were to use strpos like so:
// Assume every string is present until one is missing
$matches = true;
// Set the strings to match
$strings_to_match = array('boo','poo');
foreach($strings_to_match as $string){
    if(strpos($anything, $string) === false){
        // One string is missing, so the check fails
        $matches = false;
        break;
    }
}
For plain substring checks, prefer string functions such as strpos over regular expressions, since they avoid the overhead of the regex engine.

need help creating a non matching regex pattern

I'm trying to create a regex pattern that returns true if a certain word is not found. I've tried using [^word], but that doesn't match against a word, just the individual characters as they appear.
I need preg_match (using PHP) to return true because there are other words that I want to match and return true for.
If you are looking for a string within a string (no pattern needed), then use strstr() or stristr().
if (!preg_match("/word/", $string)) { // <-- true, if not match
// code here
}
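If the check has to stay inside a single pattern, for example because another word must match at the same time, a negative lookahead is one option. The following is only a sketch: the function name and its parameters are hypothetical, and both words are treated as literal text via preg_quote.

```php
<?php
// Hypothetical helper: true when $wanted occurs and $forbidden does not.
function hasWantedWithoutForbidden(string $text, string $wanted, string $forbidden): bool
{
    // (?!.*forbidden) fails the match if the forbidden word appears anywhere;
    // the rest of the pattern then requires the wanted word.
    $pattern = '/^(?!.*' . preg_quote($forbidden, '/') . ').*' . preg_quote($wanted, '/') . '/s';
    return preg_match($pattern, $text) === 1;
}

var_dump(hasWantedWithoutForbidden('apples and pears', 'apples', 'word'));  // bool(true)
var_dump(hasWantedWithoutForbidden('apples and a word', 'apples', 'word')); // bool(false)
```

Unlike [^word], which is a character class excluding the single characters w, o, r and d, the lookahead tests for the whole sequence.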

preg_match returning weird results

I am searching a string for urls...and my preg_match is giving me an incorrect amount of matches for my demo string.
String:
Hey there, come check out my site at www.example.com
Function:
preg_match("#(^|[\n ])([\w]+?://[\w]+[^ \"\n\r\t<]*)#ise", $string, $links);
echo count($links);
The result comes out as 3.
Can anybody help me solve this? I'm new to REGEX.
$links is the array of sub matches:
If matches is provided, then it is filled with the results of search. $matches[0] will contain the text that matched the full pattern, $matches[1] will have the text that matched the first captured parenthesized subpattern, and so on.
The matches of the two groups plus the match of the full regular expression results in three array items.
Maybe you want all matches instead, using preg_match_all.
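The difference can be seen in a small sketch. The sample string and the simplified URL pattern with two capturing groups are assumptions for illustration, not the asker's regex:

```php
<?php
$string  = 'Visit http://example.com and https://example.org today';
// Simplified URL pattern with two capturing groups (scheme and the rest).
$pattern = '#(\w+)://([^\s"<]+)#i';

// preg_match stops at the first URL; the array holds the full match plus each group.
preg_match($pattern, $string, $first);
echo count($first), "\n";   // 3: full match + two groups

// preg_match_all collects every URL; index 0 holds the full matches.
preg_match_all($pattern, $string, $all);
echo count($all[0]), "\n";  // 2: one entry per URL found
```

So count($links) after preg_match reflects the number of groups in the pattern, not the number of URLs in the text.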
If you use preg_match_all (as Gumbo suggested), please note that if you run your regex against this string, it will match both the value of your anchor's href attribute and the link text, which in this case happens to contain a URL. This makes TWO matches.
It would be wise to run an array_unique on your resultset :)
In addition to the advice on how to use preg_match, I believe there is something seriously wrong with the regular expression you are using. You may want to try something like this instead:
preg_match("_([a-zA-Z]+://)?([0-9a-zA-Z$-\_.+!*'(),]+\.)?([0-9a-zA-Z]+)+\.([a-zA-Z]+)_", $string, $links);
This should handle most cases (although it wouldn't work if there was a query string after the top-level domain). In the future, when writing regular expressions, I recommend the following web-sites to help: http://www.regular-expressions.info/ and especially http://regexpal.com/ for testing them as you're writing them.
