PHP - Regex for a string of special characters - php

Morning SO. I'm trying to determine whether or not a string contains a list of specific characters.
I know i should be using preg_match for this, but my regex knowledge is woeful and i have been unable to glean any information from other posts around this site. Since most of them just want to limit strings to a-z, A-Z and 0-9. But i do want some special characters to be allowed, for example: ! # £ and others not in the below string.
Characters to be matched on: # $ % ^ & * ( ) + = - [ ] \ ' ; , . / { } | \ " : < > ? ~
private function containsIllegalChars($string)
{
return preg_match([REGEX_STRING_HERE], $string);
}
I originally wrote the matching in Javascript, which just looped through each letter in the string and then looped through every character in another string until it found a match. Looking back, i can't believe i even attempted to use such an archaic method. With the advent of json (and a rewrite of the application!), i'm switching the match to php, to return an error message via json.
I was hoping a regex guru could assist with converting the above string to a regex string, but any feedback would be appreciated!

Regexp for a "list of disallowed character" is not mandatory.
You may have a look at strpbrk. It should do the job you need.
Here's an example of usage
$tests = array(
"Hello I should be allowed",
"Aw! I'm not allowed",
"Geez [another] one",
"=)",
"<WH4T4NXSS474K>"
);
$illegal = "#$%^&*()+=-[]';,./{}|:<>?~";
foreach ($tests as $test) {
echo $test;
echo ' => ';
echo (false === strpbrk($test, $illegal)) ? 'Allowed' : "Disallowed";
echo PHP_EOL;
}
http://codepad.org/yaJJsOpT

return preg_match('/[#$%^&*()+=\-\[\]\';,.\/{}|":<>?~\\\\]/', $string);

$pattern = preg_quote('#$%^&*()+=-[]\';,./{}|\":<>?~', '#');
var_dump(preg_match("#[{$pattern}]#", 'hello world')); // false
var_dump(preg_match("#[{$pattern}]#", 'he||o wor|d')); // true
var_dump(preg_match("#[{$pattern}]#", '$uper duper')); // true
Likely, you can cache the $pattern, depending on your implementation.
(Though looking outside of regular expressions, you're best of with strpbrk as mentioned here too)

I think what you're looking for can be greatly simplified by including the characters that you want to allow like so:
preg_match('/[^\w!#£]/', $string)
Here's a quick breakdown of what's happening:
[^] = not included
\w = letters and numbers
! # £ = the list of characters you would also like to allow

Related

Create a function to find a specific word in the title

I have the following title formation on my website:
It's no use going back to yesterday, because at that time I was... Lewis Carroll
Always is: The phrase… (author).
I want to delete everything after the ellipsis (…), leaving only the sentence as the title. I thought of creating a function in php that would take the parts of the titles, throw them in an array and then I would work each part, identifying the only pattern I have in the title, which is the ellipsis… and then delete everything. But when I do that, in the X space of my array, it returns the following:
was...
In position 8 of the array comes the word and the ellipsis and I don't know how to find a pattern to delete the author of the title, my pattern was the ellipsis. Any idea?
<?php
$a = get_the_title(155571);
$search = '... ';
if(preg_match("/{$search}/i", $a)) {
echo 'true';
}
?>
I tried with the code above and found the ellipsis, but I needed to bring it into an array to delete the part I need. I tried something like this:
<?php
define('WP_USE_THEMES', false);
require('./wp-blog-header.php');
global $wpdb;
$title_array = explode(' ', get_the_title(155571));
$search = '... ';
if (array_key_exists("/{$search}/i",$title_array)) {
echo "true";
}
?>
I started doing it this way, but it doesn't work, any ideas?
Thanks,
If you use regex you need to escape the string as preg_quote() would do, because a dot belongs to the pattern.
But in your simple case, I would not use a regex and just search for the three dots from the end of the string.
Note: When the elipsis come from the browser, there's no way to detect in PHP.
$title = 'The phrase... (author).';
echo getPlainTitle($title);
function getPlainTitle(string $title) {
$rpos = strrpos($title, '...');
return ($rpos === false) ? $title : substr($title, 0, $rpos);
}
will output
The phrase
First of all, since you're working with regular expressions, you need to remember that . has a special meaning there: it means "any character". So /... / just means "any three characters followed by a space", which isn't what you want. To match a literal . you need to escape it as \.
Secondly, rather than searching or splitting, you could achieve what you want by replacing part of the string. For instance, you could find everything after the ellipsis, and replace it with an empty string. To do that you want a pattern of "dot dot dot followed by anything", where "anything" is spelled .*, so \.\.\..*
$title = preg_replace('/\.\.\..*/', '', $title);

Erasing C comments with preg_replace

I need to erase all comments in $string which contains data from some C file.
The thing I need to replace looks like this:
something before that shouldnt be replaced
/*
* some text in between with / or * on many lines
*/
something after that shouldnt be replaced
and the result should look like this:
something before that shouldnt be replaced
something after that shouldnt be replaced
I have tried many regular expressions but neither work the way I need.
Here are some latest ones:
$string = preg_replace("/\/\*(.*?)\*\//u", "", $string);
and
$string = preg_replace("/\/\*[^\*\/]*\*\//u", "", $string);
Note: the text is in UTF-8, the string can contain multibyte characters.
You would also want to add the s modifier to tell the regex that .* should include newlines. I always think of s to mean "treat the input text as a single line"
So something like this should work:
$string = preg_replace("/\\/\\*(.*?)\\*\\//us", "", $string);
Example: http://codepad.viper-7.com/XVo9Tp
Edit: Added extra escape slashes to the regex as Brandin suggested because he is right.
I don't think regexp fit good here. What about wrote a very small parse to remove this? I don't do PHP coding for a long time. So, I will try to just give you the idea (simple alogorithm) I haven't tested this, it's just to you get the idea, as I said:
buf = new String() // hold the source code without comments
pos = 0
while(string[pos] != EOF) {
if(string[pos] == '/') {
pos++;
while(string[pos] != EOF)
{
if(string[pos] == '*' && string[pos + 1] == '/') {
pos++;
break;
}
pos++;
}
}
buf[buf_index++] = string[pos++];
}
where:
string is the C source code
buf a dynamic allocated string which expands as needed
It is very hard to do this perfectly without ending up writing a full C parser.
Consider the following, for example:
// Not using /*-style comment here.
// This line has an odd number of " characters.
while (1) {
printf("Wheee!
(*\/*)
\\// - I'm an ant!
");
/* This is a multiline comment with a // in, and
// an odd number of " characters. */
}
So, from the above, we can see that our problems include:
multiline quote sequences should be ignored within doublequotes. Unless those doublequotes are part of a comment.
single-line comment sequences can be contained in double-quoted strings, and in multiline strings.
Here's one possibility to address some of those issues, but far from perfect.
// Remove "-strings, //-comments and /*block-comments*/, then restore "-strings.
// Based on regex by mauke of Efnet's #regex.
$file = preg_replace('{("[^"]*")|//[^\n]*|(/\*.*?\*/)}s', '\1', $file);
try this:
$string = preg_replace("#\/\*\n?(.*)\*\/\n?#ms", "", $string);
Use # as regexp boundaries; change that u modifier with the right ones: m (PCRE_MULTILINE) and s (PCRE_DOTALL).
Reference: http://php.net/manual/en/reference.pcre.pattern.modifiers.php
It is important to note that my regexp does not find more than one "comment block"... Use of "dot match all" is generally not a good idea.

Escaping # symbol in regular expression using preg_match

I am building an application that processes some transactions with descriptions such as:
cash bnkm aug09 : bt phone / a#19:10
I need to extract the time from such strings using a regular expression using preg_match.
I have written a regex that works on this website:
http://www.switchplane.com/awesome/preg-match-regular-expression-tester?pattern=%2F%40%5Cd%5Cd.%5Cd%5Cd%2F&subject=cash+bnkm++++aug09+%3A+bt+phone+%2F+a%4019%3A10
But not in my code, as a, Warning: preg_match(): Unknown modifier '#' error is given.
The regex I am using is: /#\d\d.\d\d/ and the call to preg_match is as below:
$matches = array();
if (preg_match('/'.'/#\d\d.\d\d/'.'/', 'cash bnkm aug09 : bt phone / a#19:10', $matches) === 1) {
//Carry out trigger
echo 'yes';
} else {
echo 'no';
}
Essentially it seems to choke on the # symbol - which I can't seem to escape.
I am quite new to regular expressions but can't find any description of # being used for as a special character so would have assumed it would be used to test the string rather than alter how the string was tested.
Any help on how to fix this would be appreciated.
try this:
if (preg_match('/#\d\d:\d\d/', 'cash bnkm aug09 : bt phone / a#19:10', $matches) === 1) {
//Carry out trigger
echo 'yes';
print_r($matches);//Array ( [0] => #19:10 )
} else {
echo 'no';
}
in your pattern there are two extra slashes:
'/'.'/#\d\d.\d\d/'.'/'
^-- ^--
If # not used as delimiter, then no need to escape it
preg_match("/\#\d\d:\d\d/ims", "<input string>", $matches)
Use \ to escape special characters, the # is a delimiter.

PHP Formatting Regex - BBCode

To be honest, I suck at regex so much, I would use RegexBuddy, but I'm working on my Mac and sometimes it doesn't help much (for me).
Well, for what I need to do is a function in php
function replaceTags($n)
{
$n = str_replace("[[", "<b>", $n);
$n = str_replace("]]", "</b>", $n);
}
Although this is a bad example in case someone didn't close the tag by using ]] or [[, anyway, could you help with regex of:
[[ ]] = Bold format
** ** = Italic format
(( )) = h2 heading
Those are all I need, thanks :)
P.S - Is there any software like RegexBuddy available for Mac (Snow Leopard)?
function replaceTags($n)
{
$n = preg_replace("/\[\[(.*?)\]\]/", "<strong>$1</strong>", $n);
$n = preg_replace("/\*\*(.*?)\*\*/", "<em>$1</em>", $n);
$n = preg_replace("/\(\((.*?)\)\)/", "<h2>$1</h2>", $n);
return $n;
}
I should probably provide a little explanation: Each special character is preceded by a backslash so it's not treated as regex instructions ("[", "(", etc.). The "(.*?)" captures all characters between your delimiters ("[[" and "]]", etc.). What's captured is then output in the replacements string in place of "$1".
The same reason you can't do this with str_replace() applies to preg_replace() as well. Tag-pair style parsing requires a lexer/parser if you want to yield 100% accuracy and cover for input errors.
Regular expressions can't handle unclosed tags, nested tags, that sort of thing.
That all being said, you can get 50% of the way there with very little effort.
$test = "this is [[some]] test [[content for **you** to try, ((does [[it]])) **work?";
echo convertTags( $test );
// only handles validly formatted, non-nested input
function convertTags( $content )
{
return preg_replace(
array(
"/\[\[(.*?)\]\]/"
, "/\*\*(.*?)\*\*/"
, "/\(\((.*?)\)\)/"
)
, array(
"<strong>$1</strong>"
, "<em>$1</em>"
, "<h2>$1</h2>"
)
, $content
);
}
Modifiers could help too :)
http://lv.php.net/manual/en/reference.pcre.pattern.modifiers.php
U (PCRE_UNGREEDY) This modifier
inverts the "greediness" of the
quantifiers so that they are not
greedy by default, but become greedy
if followed by ?. It is not compatible
with Perl. It can also be set by a
(?U) modifier setting within the
pattern or by a question mark behind a
quantifier (e.g. .*?).

Regular Expressions: how to do "option split" replaces

those reqular expressions drive me crazy. I'm stuck with this one:
test1:[[link]] test2:[[gold|silver]] test3:[[out1[[inside]]out2]] test4:this|not
Task:
Remove all [[ and ]] and if there is an option split choose the later one so output should be:
test1:link test2:silver test3:out1insideout2 test4:this|not
I came up with (PHP)
$text = preg_replace("/\\[\\[|\\]\\]/",'',$text); // remove [[ or ]]
this works for part1 of the task. but before that I think I should do the option split, my best solution:
$text = preg_replace("/\\[\\[(.*\|)(.*?)\\]\\]/",'$2',$text);
Result:
test1:silver test3:[[out1[[inside]]out2]] this|not
I'm stuck. may someone with some free minutes help me? Thanks!
I think the easiest way to do this would be multiple passes. Use a regular expression like:
\[\[(?:[^\[\]]*\|)?([^\[\]]+)\]\]
This will replace option strings to give you the last option from the group. If you run it repeatedly until it no longer matches, you should get the right result (the first pass will replace [[out1[[inside]]out2]] with [[out1insideout2]] and the second will ditch the brackets.
Edit 1: By way of explanation,
\[\[ # Opening [[
(?: # A non-matching group (we don't want this bit)
[^\[\]] # Non-bracket characters
* # Zero or more of anything but [
\| # A literal '|' character representing the end of the discarded options
)? # This group is optional: if there is only one option, it won't be present
( # The group we're actually interested in ($1)
[^\[\]] # All the non-bracket characters
+ # Must be at least one
) # End of $1
\]\] # End of the grouping.
Edit 2: Changed expression to ignore ']' as well as '[' (it works a bit better like that).
Edit 3: There is no need to know the number of nested brackets as you can do something like:
$oldtext = "";
$newtext = $text;
while ($newtext != $oldtext)
{
$oldtext = $newtext;
$newtext = preg_replace(regexp,replace,$oldtext);
}
$text = $newtext;
Basically, this keeps running the regular expression replace until the output is the same as the input.
Note that I don't know PHP, so there are probably syntax errors in the above.
This is impossible to do in one regular expression since you want to keep content in multiple "hierarchies" of the content. It would be possible otherwise, using a recursive regular expression.
Anyways, here's the simplest, most greedy regular expression I can think of. It should only replace if the content matches your exact requirements.
You will need to escape all backslashes when putting it into a string (\ becomes \\.)
\[\[((?:[^][|]+|(?!\[\[|]])[^|])++\|?)*]]
As others have already explained, you use this with multiple passes. Keep looping while there are matches, performing replacement (only keeping match group 1.)
Difference from other regular expressions here is that it will allow you to have single brackets in the content, without breaking:
test1:[[link]] test2:[[gold|si[lv]er]]
test3:[[out1[[in[si]de]]out2]] test4:this|not
becomes
test1:[[link]] test2:si[lv]er
test3:out1in[si]deout2 test4:this|not
Why try to do it all in one go. Remove the [[]] first and then deal with options, do it in two lines of code.
When trying to get something going favour clarity and simplicity.
Seems like you have all the pieces.
Why not just simply remove any brackets that are left?
$str = 'test1:[[link]] test2:[[gold|silver]] test3:[[out1[[inside]]out2]] test4:this|not';
$str = preg_replace('/\\[\\[(?:[^|\\]]+\\|)+([^\\]]+)\\]\\]/', '$1', $str);
$str = str_replace(array('[', ']'), '', $str);
Well, I didn't stick to just regex, because I'm of a mind that trying to do stuff like this with one big regex leads you to the old joke about "Now you have two problems". However, give something like this a shot:
$str = 'test1:[[link]] test2:[[gold|silver]] test3:[[out1[[inside]]out2]] test4:this|not'; $reg = '/(.*?):(.*?)( |$)/';
preg_match_all($reg, $str, $m);
foreach($m[2] as $pos => $match) {
if (strpos($match, '|') !== FALSE && strpos($match, '[[') !== FALSE ) {
$opt = explode('|', $match); $match = $opt[count($opt)-1];
}
$m[2][$pos] = str_replace(array('[', ']'),'', $match );
}
foreach($m[1] as $k=>$v) $result[$k] = $v.':'.$m[2][$k];
This is C# using only using non-escaped strings, hence you will have to double the backslashes in other languages.
String input = "test1:[[link]] " +
"test2:[[gold|silver]] " +
"test3:[[out1[[inside]]out2]] " +
"test4:this|not";
String step1 = Regex.Replace(input, #"\[\[([^|]+)\|([^\]]+)\]\]", #"[[$2]]");
String step2 = Regex.Replace(step1, #"\[\[|\]\]", String.Empty);
// Prints "test1:silver test3:out1insideout2 test4:this|not"
Console.WriteLine(step2);
$str = 'test1:[[link]] test2:[[gold|silver]] test3:[[out1[[inside]]out2]] test4:this|not';
$s = preg_split("/\s+/",$str);
foreach ($s as $k=>$v){
$v = preg_replace("/\[\[|\]\]/","",$v);
$j = explode(":",$v);
$j[1]=preg_replace("/.*\|/","",$j[1]);
print implode(":",$j)."\n";
}

Categories