Ok, so I have a form that contains textareas, and I want to verify that they don't contain any illegal characters.
HTML:
<textarea minlength="100" required name="Description" maxlength="800">
</textarea>
PHP:
if(!preg_match("/^[-\p{L}\p{N} #&()!*,.;'\/\\\\]+$/u",$_POST["Description"])){
//error
}
I have tried several completely legal texts, but the check still fails.
What am I missing?
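I guess your expression itself works fine; you might want to remove the u flag. Be aware, though, that without u the \p{...} classes are applied byte by byte, so non-ASCII letters such as é may stop matching. The s flag used below only affects . outside a character class, which this pattern does not use, so it should not change the result: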
if(!preg_match("/^[-\p{L}\p{N} #&()!*,.;'\/\\\\]+$/s",$_POST["Description"])){
//error
}
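Or, if you meant to reject those characters rather than allow them, you might be trying to do:
if(!preg_match("/^[^-\p{L}\p{N} #&()!*,.;'\/\\\\]+$/s",$_POST["Description"])){
//error
}
which only matches strings made up entirely of characters outside that set.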
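Relatedly, an unanchored negated class can tell you which character was rejected, which is handy for an error message. A minimal sketch reusing the question's character set; $disallowed and the sample text are illustrative assumptions:

```php
<?php
// Unanchored negated class: matches the first character that is NOT allowed
// (same set as the question: -, letters, digits, space # & ( ) ! * , . ; ' / \).
$disallowed = '/[^-\p{L}\p{N} #&()!*,.;\'\/\\\\]/u';

$text = 'Price: 5 EUR'; // sample input; ':' is not in the allowed set

if (preg_match($disallowed, $text, $m, PREG_OFFSET_CAPTURE)) {
    // $m[0][0] is the offending character, $m[0][1] its byte offset
    printf("Illegal character %s at byte %d\n", $m[0][0], $m[0][1]);
} else {
    echo "All characters are allowed\n";
}
```

Note that the offset is in bytes, not characters, when the text contains multibyte UTF-8.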
Test
$re = '/^[-\p{L}\p{N} #&()!*,.;\'\/\\\\]+$/m';
$str = 'abcd
abcd\\\\\\\\\\\\
&*&??
abc?
';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
var_dump($matches);
Output
array(2) {
[0]=>
array(1) {
[0]=>
string(4) "abcd"
}
[1]=>
array(1) {
[0]=>
string(10) "abcd\\\\\\"
}
}
If you wish to simplify, modify, or explore the expression, regex101.com explains it in its top-right panel and lets you test it against sample inputs.
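One more thing worth checking, as an assumption since the failing texts aren't shown: a multi-line textarea submits its line breaks as \r\n, and none of the character classes above allow \r or \n, so any description with more than one line is rejected. A minimal sketch comparing the original class with one that also permits line breaks (whether line breaks should be legal is your call):

```php
<?php
// Character class from the question: no \r or \n.
$original = '/^[-\p{L}\p{N} #&()!*,.;\'\/\\\\]+$/u';
// Same class plus \r and \n (assumption: line breaks should be legal).
$withBreaks = '/^[-\p{L}\p{N} #&()!*,.;\'\/\\\\\r\n]+$/u';

// What a browser typically submits for a two-line textarea.
$text = "Hello, world!\r\nSecond line.";

var_dump(preg_match($original, $text));   // int(0)
var_dump(preg_match($withBreaks, $text)); // int(1)
```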
Related
I am new to RegEx. I am parsing an HTML page, and because it is buggy I cannot use an XML or HTML parser, so I am using a regular expression.
My code looks like this:
$html = '<html><div data-id="ABC012" data-index="123" ...';
preg_match_all('/<div data-id="[A-Z\\d]+" data-index="\\d+"/', $html, $result);
var_dump($result);
The output looks good, so the code is working. Now I want to extract the matched values. I did it exactly as described in another answer, and now the code looks like this:
$html = '<html><div data-id="ABC012" data-index="123" ...';
preg_match_all('/<div data-id="#([A-Z\\d]+)" data-index="#(\\d+)"/', $html, $result);
var_dump($result);
But it outputs an empty array. What is wrong? Please don't improve the pattern by adding the closing '>' or making it robust against whitespace; I just need to get the code running.
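The problem is the # characters: they do not appear in the example data, so the pattern can never match. Remove them. You can also write \d with a single backslash, which is easier to read (in a single-quoted PHP string, \\d and \d both reach the regex engine as \d):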
$html = '<html><div data-id="ABC012" data-index="123" ...';
preg_match_all('/<div data-id="([A-Z\d]+)" data-index="(\d+)"/', $html, $result);
var_dump($result);
Output
array(3) {
[0]=>
array(1) {
[0]=>
string(38) "<div data-id="ABC012" data-index="123""
}
[1]=>
array(1) {
[0]=>
string(6) "ABC012"
}
[2]=>
array(1) {
[0]=>
string(3) "123"
}
}
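If you then want each data-id paired with its data-index, PREG_SET_ORDER groups the captures per match. A small sketch; the second <div> in $html is made-up sample data:

```php
<?php
// Sample input; the second <div> is made up for illustration.
$html = '<html><div data-id="ABC012" data-index="123"><div data-id="XYZ9" data-index="7">';

// PREG_SET_ORDER: one entry per match; [1] = data-id, [2] = data-index.
preg_match_all('/<div data-id="([A-Z\d]+)" data-index="(\d+)"/', $html, $result, PREG_SET_ORDER);

$pairs = [];
foreach ($result as $m) {
    $pairs[$m[1]] = (int) $m[2];
}
print_r($pairs); // maps ABC012 => 123 and XYZ9 => 7

// In single quotes, '\\d' and '\d' are the same two characters.
var_dump('\\d' === '\d'); // bool(true)
```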
I am trying to remove everything outside a specific pair of bracketed markers, including the markers themselves.
For example, I have this string:
$str = "[:de]Some german text[:en]Some English text[:]";
What I want is to get the text between [:de] and [:en] and remove everything else, so the result should be:
$str = "Some german text";
I guess it needs preg_match or some other regex solution, but everything I found removes the text in between, rather than keeping the text in between and removing everything else.
Any ideas are welcome.
I guess,
(?<=\[:de\]).*?(?=\[:en\])
might work OK here.
Test
$re = '/(?<=\[:de\]).*?(?=\[:en\])/s';
$str = '[:de]Some german text 1[:en]Some English text[:] [:de]Some german text 2[:en]Some English text[:]
[:de]Some german text 3[:en]Some English text[:]';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
var_dump($matches);
Output
array(3) {
[0]=>
array(1) {
[0]=>
string(18) "Some german text 1"
}
[1]=>
array(1) {
[0]=>
string(18) "Some german text 2"
}
[2]=>
array(1) {
[0]=>
string(18) "Some german text 3"
}
}
Thanks to Emma's answer, this is the solution I came up with:
$re = '/(?<=\[:de\]).*?(?=\[:en\])/s';
// preg_match() returns 1 on a match; fall back to '' when there is none.
$result = preg_match($re, $str, $match) ? $match[0] : '';
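A variation on the same idea, sketched under the assumption that every language segment ends at the next "[:" marker: a capture group (instead of the lookbehind) plus preg_quote() lets you pick any language code. extractLang() is a hypothetical helper name:

```php
<?php
// Hypothetical helper: text for one language code from a "[:de]...[:en]...[:]"
// string, or null if that code is absent. Assumes each segment ends at the next "[:".
function extractLang(string $str, string $lang): ?string
{
    $re = '/\[:' . preg_quote($lang, '/') . '\](.*?)(?=\[:)/s';
    return preg_match($re, $str, $m) === 1 ? $m[1] : null;
}

$str = "[:de]Some german text[:en]Some English text[:]";
var_dump(extractLang($str, 'de')); // string(16) "Some german text"
var_dump(extractLang($str, 'en')); // string(17) "Some English text"
var_dump(extractLang($str, 'fr')); // NULL
```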
The zeros can be incremented, but the number must have four digits, so it could be CEC0152-2005, with a "-" between the two parts.
I used www.txt2re.com to generate a pattern, but it didn't help.
Maybe,
^[A-Z]{3}[0-9]{4}-[0-9]{4}$
or,
^CEC[0-9]{4}-[0-9]{4}$
might work fine.
Test
$re = '/^[A-Z]{3}[0-9]{4}-[0-9]{4}$/m';
$str = 'CEC0152-2005
CEC0152-2019
CEC0152-1999
CEC0152-19991';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
var_dump($matches);
Output
array(3) {
[0]=>
array(1) {
[0]=>
string(12) "CEC0152-2005"
}
[1]=>
array(1) {
[0]=>
string(12) "CEC0152-2019"
}
[2]=>
array(1) {
[0]=>
string(12) "CEC0152-1999"
}
}
If the part after the dash is a four-digit year,
^[A-Z]{3}[0-9]{4}-[12][0-9]{3}$
^CEC[0-9]{4}-[12][0-9]{3}$
might also work fine, I guess.
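When validating a single submitted code rather than a multi-line string, you can drop the m flag; note that $ still accepts one trailing newline, whereas \z does not. A minimal sketch; isValidCode() is a hypothetical name, and the year rule follows the [12][0-9]{3} variant above:

```php
<?php
// Hypothetical validator for a single code (not a multi-line string).
// \z anchors at the very end, so a trailing "\n" is rejected; $ would allow it.
function isValidCode(string $code): bool
{
    return preg_match('/^CEC[0-9]{4}-[12][0-9]{3}\z/', $code) === 1;
}

foreach (['CEC0152-2005', 'CEC0152-19991', 'CEC152-2005', "CEC0152-2005\n"] as $code) {
    echo json_encode($code), ' => ', isValidCode($code) ? 'valid' : 'invalid', PHP_EOL;
}
```

Only the first sample is reported as valid.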
My HTML form code wraps some words in <-#word#-> using this code:
$string = preg_replace("/($p)/i", '<-#$1#->', $string);
The problem is that if the form has errors, the word gets wrapped again on every resubmission, so it becomes <-#<-#<-#word#->#->#->. Is it possible to do the replacement only if the word has not already been replaced?
This is what I tried using a NOT operator, but it does not work:
$string = preg_replace("/^(<-#)($p)^(#->)/i", '<-#$1#->', $string);
You could use negative lookarounds to assert that what is directly to the left is not <-# and what is directly to the right is not #->:
(?<!<-#)(word)(?!#->)
Your code could look like:
$string = preg_replace("/(?<!<-#)($p)(?!#->)/i", '<-#$1#->', $string);
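A quick way to check that the lookaround version is safe to re-run is to apply it twice and compare. In this sketch the word list (foo, bar) and the wrapOnce() helper are hypothetical; preg_quote() escapes any regex metacharacters in the words:

```php
<?php
// Hypothetical word list; preg_quote() escapes regex metacharacters.
$p = implode('|', array_map(fn(string $w): string => preg_quote($w, '/'), ['foo', 'bar']));

function wrapOnce(string $string, string $p): string
{
    // Skip any occurrence preceded by <-# or followed by #->.
    return preg_replace("/(?<!<-#)($p)(?!#->)/i", '<-#$1#->', $string);
}

$once  = wrapOnce('foo and bar', $p);
$twice = wrapOnce($once, $p); // simulate a resubmission

var_dump($once);            // string(23) "<-#foo#-> and <-#bar#->"
var_dump($once === $twice); // bool(true)
```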
Another method is to use preg_match_all() to check what your matches return, then rebuild a single wrapper from the captured text:
$string = '<-#<-#<-#Any alphanumeric input that user may wish#->#->#->';
preg_match_all("/(<-#)+([A-Za-z0-9_\s]+)(#->)+/s", $string, $matches);
$string = '<-#' . $matches[2][0] . '#->';
var_dump($string);
which outputs:
string(47) "<-#Any alphanumeric input that user may wish#->"
Meanwhile, var_dump($matches); returns:
array(4) {
[0]=>
array(1) {
[0]=>
string(59) "<-#<-#<-#Any alphanumeric input that user may wish#->#->#->"
}
[1]=>
array(1) {
[0]=>
string(3) "<-#"
}
[2]=>
array(1) {
[0]=>
string(41) "Any alphanumeric input that user may wish"
}
[3]=>
array(1) {
[0]=>
string(3) "#->"
}
}
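Alternatively, a single preg_replace() can collapse nested wrappers anywhere in the string, including when several words are wrapped. This is a different technique from the preg_match_all() approach above, sketched with a hypothetical collapseWrappers() helper:

```php
<?php
// Hypothetical helper: collapse runs of nested <-# ... #-> wrappers to one.
function collapseWrappers(string $string): string
{
    // (?:<-#)+ eats every opening marker, (.*?) lazily takes the inner text
    // up to the first #->, and (?:#->)+ eats the whole run of closing markers.
    return preg_replace('/(?:<-#)+(.*?)(?:#->)+/s', '<-#$1#->', $string);
}

echo collapseWrappers('<-#<-#<-#Any alphanumeric input that user may wish#->#->#->'), PHP_EOL;
// <-#Any alphanumeric input that user may wish#->
echo collapseWrappers('<-#<-#foo#->#-> and <-#bar#->'), PHP_EOL;
// <-#foo#-> and <-#bar#->
```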
Code:
$pattern = "/(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/";
$urls = array();
preg_match($pattern, $comment, $urls);
return $urls;
According to an online regex tester, this regex is correct and should be working:
http://regexr.com?35nf9
I am outputting the returned array using:
$linkItems = $model->getLinksInComment($model->comments);
//die(print_r($linkItems));
echo '<ul>';
foreach($linkItems as $link) {
echo '<li>'.$link.'</li>';
}
echo '</ul>';
The output looks like the following:
http://google.com
http
The $model->comments looks like the following:
destined for surplus
RT#83015
RT#83617
http://google.com
https://google.com
non-link
The generated list is only supposed to contain links, and there should be no empty lines. Is there something wrong with what I did? The regex seems to be correct.
If I'm understanding right, you should use preg_match_all in your getLinksInComment function instead:
preg_match_all($pattern, $comment, $matches);
if (isset($matches[0])) {
return $matches[0];
}
return array(); #in case there are no matches
preg_match_all gets all matches in a string (even if the string contains newlines) and puts them into the array you supply as the third argument. However, anything matched by your regex's capture groups (e.g. (http|https|ftp|ftps)) will also be put into your $matches array (as $matches[1] and so on). That's why you want to return just $matches[0] as your final array of matches.
I just ran this exact code:
$line = "destined for surplus\n
RT#83015\n
RT#83617\n
http://google.com\n
https://google.com\n
non-link";
$pattern = "/(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/";
preg_match_all($pattern, $line, $matches);
var_dump($matches);
and got this for my output:
array(3) {
[0]=>
array(2) {
[0]=>
string(17) "http://google.com"
[1]=>
string(18) "https://google.com"
}
[1]=>
array(2) {
[0]=>
string(4) "http"
[1]=>
string(5) "https"
}
[2]=>
array(2) {
[0]=>
string(0) ""
[1]=>
string(0) ""
}
}
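Since the groups in this pattern only exist for the alternation and the optional path, another option is to make them non-capturing with (?:...), so $matches contains nothing but full matches. A sketch with a hypothetical extractUrls() function (adapt it to your model method):

```php
<?php
// Hypothetical stand-in for getLinksInComment(): non-capturing groups (?:...)
// mean $matches[0] holds the full URLs and there are no extra group entries.
function extractUrls(string $comment): array
{
    $pattern = '/(?:https?|ftps?):\/\/[a-zA-Z0-9\-.]+\.[a-zA-Z]{2,3}(?:\/\S*)?/';
    preg_match_all($pattern, $comment, $matches);
    return $matches[0]; // an empty array when nothing matches
}

$comment = "destined for surplus\nRT#83015\nRT#83617\nhttp://google.com\nhttps://google.com\nnon-link";
print_r(extractUrls($comment));
```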
Your comment is structured as multiple lines, some of which contain the URLs in which you're interested and nothing else. This being the case, you need not use anything remotely resembling that disaster of a regex to try to pick URLs out of the full comment text; you can instead split by newline, and examine each line individually to see whether it contains a URL. You might therefore implement a much more reliable getLinksInComment() thus:
function getLinksInComment($comment) {
    $links = array();
    // Split on \n or \r\n and keep only the lines that start with "http".
    foreach (preg_split('/\r?\n/', $comment) as $line) {
        if (!preg_match('/^http/', $line)) {
            continue;
        }
        $links[] = $line;
    }
    return $links;
}
With suitable adjustment to serve as an object method instead of a bare function, this should solve your problem entirely and free you to go about your day.