PHP: Deal with delimiters in dynamic regular expression - php

Let's say I want to create a PHP tool to dynamically check a string against a regular expression pattern. There is one problem with that: delimiters.
I would like to be able to do the following (simplified example):
$pattern = $_POST['pattern'];
$matched = (preg_match($pattern, $_POST['string']) === 1);
I don't want the users to put delimiters in the input, just a pure pattern, like ^a[bc]+d. How to deal with delimiters? I could do this:
$pattern = '/' . $_POST['pattern'] . '/';
Or with any other possible delimiter, but what about escaping? Is it enough to place a \ before every character in the pattern that is the same as my chosen delimiter? Like this:
$pattern = '/' . str_replace('/', '\\/', $_POST['pattern']) . '/';
What is a neat way to deal with delimiters?

You have to check the input to see whether it contains delimiters and, if so, remove them. That way, if the user follows the rules you have nothing to worry about, and if they don't, the delimiters are removed anyway. A delimiter can be identified by comparing the first and last characters.
// with incorrect input.
$input = "/^a[bc]+d/"; // from $_POST['pattern']
$delim = "/";
if ($input[0] === $input[strlen($input) - 1]) {
    $delim = $input[0];
}
$sInput = str_replace($delim, "", $input);
echo $sInput; // ^a[bc]+d
With correct input, you don't have to worry.
$input = "^a[bc]+d"; // from $_POST['pattern']
$delim = "/";
if ($input[0] === $input[strlen($input) - 1]) {
    $delim = $input[0];
}
$sInput = str_replace($delim, "", $input);
echo $sInput; // ^a[bc]+d
$sInput is your sanitized pattern. preg_match() still requires delimiters, so wrap it again before testing your string. str_replace() removed every occurrence of $delim (including any inside the pattern), so reusing $delim is safe as long as it is a valid delimiter: not alphanumeric, not whitespace, and not a backslash.
$matched = (preg_match($delim . $sInput . $delim, $_POST['string']) === 1);
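If the pattern arrives without delimiters, another option is to add them yourself and escape only the slashes that are not already escaped. This is a minimal sketch, assuming / as the delimiter. wrap_pattern() is a hypothetical helper name, not a PHP built-in.

```php
<?php
// Hypothetical helper (not a PHP built-in): wrap a bare pattern in "/"
// delimiters, escaping only the slashes that are not already escaped.
function wrap_pattern(string $bare): string
{
    $out = '';
    $len = strlen($bare);
    for ($i = 0; $i < $len; $i++) {
        $ch = $bare[$i];
        if ($ch === '\\' && $i + 1 < $len) {
            // Copy an existing escape pair such as "\/" or "\d" untouched.
            $out .= $ch . $bare[$i + 1];
            $i++;
        } elseif ($ch === '/') {
            // A bare slash would end the pattern early, so escape it.
            $out .= '\\/';
        } else {
            $out .= $ch;
        }
    }
    return '/' . $out . '/';
}

$pattern = wrap_pattern('^a[bc]+d');
// preg_match() returns false for a pattern that fails to compile, so the
// "=== 1" comparison also rejects broken user input. "@" hides the warning.
$matched = (@preg_match($pattern, 'abcd') === 1);
```

A plain str_replace('/', '\\/', ...) would turn an already escaped \/ into \\/, which is why the sketch walks the string and skips existing escape pairs.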


PHP preg_replace all text changing

I want to make some changes to the html but I have to follow certain rules.
I have source text like this:
A beautiful sentence http://www.google.com/test, You can reach here http://www.google.com/test-mi or http://www.google.com/test/aliveli
I need to convert it into the following:
A beautiful sentence http://test.google.com/, You can reach here http://www.google.com/test-mi or http://test.google.com/aliveli
I tried using str_replace:
$html = str_replace('://www.google.com/test', '://test.google.com', $html);
When I use it like this, I get an incorrect result like:
A beautiful sentence http://test.google.com/, You can reach here http://test.google.com/-mi or http://test.google.com/aliveli
Wrong replace: http://test.google.com/-mi
How can I do this with preg_replace?
With regex you can use a word boundary and a lookahead to prevent replacing at -
$pattern = '~://www\.google\.com/test\b(?!-)~';
$html = preg_replace($pattern, "://test.google.com", $html);
Here is a regex demo at regex101 and a php demo at eval.in
Be aware that, when using regex, you need to escape certain characters with a backslash to remove their special meaning and match them literally.
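The asker's expected output also has a trailing slash after the new host (http://test.google.com/,). A variant of the pattern above can produce that. This is only a sketch, tested against the sample sentence from the question; the alternation is my own addition, not part of the answer above.

```php
<?php
$html = 'A beautiful sentence http://www.google.com/test, You can reach here '
      . 'http://www.google.com/test-mi or http://www.google.com/test/aliveli';

// Match "/test" followed either by a slash (which is consumed) or by a word
// boundary that is not followed by "-", so "/test-mi" is left alone.
$pattern = '~://www\.google\.com/test(?:/|\b(?!-))~';

// The replacement always ends in "/", which stands in for the consumed slash.
$result = preg_replace($pattern, '://test.google.com/', $html);
echo $result, "\n";
```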
It seems you want to turn the subdirectory test into a subdomain. The case is fairly complicated, and the logic below is only reliable as long as your string keeps the same structure, but you can give this code a try:
$html = "A beautiful sentence http://www.google.com/test, You can reach here http://www.google.com/test-mi or http://www.google.com/test/aliveli";
function set_subdomain_string($html, $subdomain_word) {
    $html = explode(' ', $html);
    foreach ($html as &$value) {
        $parse_html = parse_url($value);
        // only handle words that parsed as a URL with both a host and a path
        if (isset($parse_html['host'], $parse_html['path'])) {
            // "-" goes last in the class so it is a literal hyphen, not a range
            $path = preg_replace('/[^0-9a-zA-Z\/_-]/', '', $parse_html['path']);
            // remember the first stripped character (e.g. a trailing comma)
            preg_match('/[^0-9a-zA-Z\/_-]/', $parse_html['path'], $match);
            if (preg_match_all('/(test$|test\/)/', $path)) {
                $path = preg_replace('/(test$|test\/)/', '', $path);
                $host = preg_replace('/www/', 'test', $parse_html['host']);
                $parse_html['host'] = $host;
                if (!empty($match)) {
                    $parse_html['path'] = $path . $match[0];
                } else {
                    $parse_html['path'] = $path;
                }
                unset($parse_html['scheme']);
                $url_string = "http://" . implode('', $parse_html);
                $value = $url_string;
            }
        }
    }
    unset($value); // break the reference left over from the loop
    $html = implode(' ', $html);
    return $html;
}
echo "<p>{$html}</p>";
$modified_html = set_subdomain_string($html, 'test');
echo "<p>{$modified_html}</p>";
Hope it helps.
If this sentence is the only case you have to handle, you don't need to struggle with preg_replace.
Just change your str_replace() call to the following (note the ',' at the end of the search string):
$html = str_replace('://www.google.com/test,', '://test.google.com/,', $html);
This matches the first occurrence of the search string. For the last one in your target sentence, add this (note the '/' at the end):
$html = str_replace('://www.google.com/test/', '://test.google.com/', $html);
update:
Use these two:
$targetStr = preg_replace("/:\/\/www\.google\.com\/test[\s\/]/", "://test.google.com/", $targetStr);
It will match all occurrences except the ones with a comma at the end. For those, use the following:
$targetStr = preg_replace("/:\/\/www\.google\.com\/test,/", "://test.google.com/,", $targetStr);

PHP preg_replace "unknown modifier" [duplicate]

I'm trying to use an array of regular expressions to find and replace within a string in PHP, however I'm getting the error unknown modifier. I'm aware this appears to be a popular issue, however I don't understand how to fix it in my scenario.
Here is my original regex pattern:
{youtube((?!}).)*}
I run the following code against it to escape any characters:
$pattern = '/' . preg_quote($pattern) . '/';
That returns the following:
/\{youtube\(\(\?\!\}\)\.\)\*\}/
However, when I run this pattern through preg_replace I get the following error:
Warning: preg_replace() [function.preg-replace]: Unknown modifier 'y' ...
Any idea what needs to be changed, and at what stage of the code I've shown here?
Many thanks
Edit 1
As requested, here is the code I'm using:
$content = "{youtube}omg{/youtube}";
$find = array();
$replace = array();
$find[] = '{youtube((?!}).)*}';
$replace[] = '[embed]http://www.youtube.com/watch?v=';
$find[] = '{/youtube((?!}).)*}';
$replace[] = '[/embed]';
foreach ( $find as $key => $value ) {
    $find[$key] = '/' . preg_quote($value) . '/';
}
echo preg_replace($find, $replace, $content);
Here's a live example
You should pass the delimiter as the second parameter of preg_quote, like this:
$find[$key] = '/' . preg_quote ($value, '/') . '/';
Otherwise, delimiter will not be quoted and thus will cause problems.
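To see the difference, here is a small sketch using the closing tag from the question as a literal string. The variable names are just for illustration.

```php
<?php
$literal = '{/youtube}';

// Without the second argument "/" is left alone, so the pattern ends right
// after "\{" and the following "y" is parsed as a modifier.
$unsafe = '/' . preg_quote($literal) . '/';

// With the delimiter passed in, "/" is escaped as well.
$safe = '/' . preg_quote($literal, '/') . '/';

// "@" hides the "Unknown modifier" warning; preg_match() returns false
// when the pattern cannot be compiled.
var_dump(@preg_match($unsafe, 'a{/youtube}b'));
var_dump(preg_match($safe, 'a{/youtube}b'));
```

Keep in mind that preg_quote() makes the whole string literal, so it suits fixed text rather than a pattern whose metacharacters should stay active.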
Simply change your regex delimiter to something that's not used in the pattern. In this example I used #, which works fine.
preg_quote only escapes . \ + * ? [ ^ ] $ ( ) { } = ! < > | : - (and, as of PHP 7.3, #), so a character that preg_quote leaves unescaped but that is also your regex delimiter will not work as expected. Either change the delimiter as above, or pass it to preg_quote explicitly as the optional second argument: preg_quote($str, $delimiter).
$content = "{youtube}omg{/youtube}";
$find = array();
$replace = array();
$find[] = '{youtube((?!}).)*}';
$replace[] = '[embed]http://www.youtube.com/watch?v=';
$find[] = '{/youtube((?!}).)*}';
$replace[] = '[/embed]';
foreach ( $find as $key => $value ) {
    $find[$key] = '#' . preg_quote($value) . '#';
}
echo preg_replace($find, $replace, $content);
I may be sat in a hospital waiting room away from a computer, but what you're doing seems to have way over complicated the problem.
If I understand this correctly, you want to replace something like this:
{youtube something="maybe"}http://...{/youtube}
With:
[embed]http://...[/embed]
No?
If that's the case the solution is as simple as something along the lines of:
preg_replace('#{(/?)youtube[^}]*}#', '[\1embed]', $content);
The important considerations being the preservation of the open/closed-ness of the tags, and wrapping the regex in something that doesn't conflict quite so much with your target string, in this case, hashes.
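A quick sketch of that one-liner applied to the sample string from this answer:

```php
<?php
$content = '{youtube something="maybe"}http://...{/youtube}';

// (/?) captures the optional slash of a closing tag; \1 puts it back in
// front of "embed", so opening and closing tags stay distinct.
$result = preg_replace('#{(/?)youtube[^}]*}#', '[\1embed]', $content);
echo $result, "\n";
```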

Parsing string - with regex or something similar?

I'm writing a routing class and need help. I need to parse the $controller variable and assign parts of that string to other variables. Here are examples of $controller:
$controller = "admin/package/AdminClass::display";
//$path = "admin/package";
//$class = "AdminClass";
//$method = "display";
$controller = "AdminClass::display";
//$path = "";
//$class = "AdminClass";
//$method = "display";
$controller = "display";
//$path = "";
//$class = "";
//$method = "display";
These three situations are all I need. Yes, I could write a long procedure to handle them, but what I need is a simple solution with regex, using preg_match_all.
Any suggestion how to do this?
The following regex should accomplish this for you. You can then save the captured groups to $path, $class, and $method.
(?:(.+)/)?(?:(.+)::)?(.+)
Here is a Rubular:
http://www.rubular.com/r/1vPIhwPUub
Your php code might look something like this:
$regex = '/(?:(.+)\/)?(?:(.+)::)?(.+)/';
preg_match($regex, $controller, $matches);
$path = $matches[1];
$class = $matches[2];
$method = $matches[3];
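As a sketch, here is the same regex run over the three inputs from the question. I used ~ as the delimiter so the / needs no escaping; the $parsed array is only there for illustration.

```php
<?php
// "~" is used as the delimiter so the "/" in the pattern needs no escaping.
$regex = '~(?:(.+)/)?(?:(.+)::)?(.+)~';

$controllers = [
    'admin/package/AdminClass::display',
    'AdminClass::display',
    'display',
];

$parsed = [];
foreach ($controllers as $controller) {
    preg_match($regex, $controller, $matches);
    // Optional groups that did not take part in the match come back as ''.
    $parsed[$controller] = [
        'path'   => $matches[1],
        'class'  => $matches[2],
        'method' => $matches[3],
    ];
}
print_r($parsed);
```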
This supposes that the path elements, the class name, and the method name can only contain letters.
The full regex is the following:
^(?:((?:[a-zA-Z]+/)*)([a-zA-Z]+)::)?([a-zA-Z]+)$
Two non capturing groups: the first one which makes all the path and class optional, the second which avoids the capture of individual path elements.
Explanation:
a path element is one or more letters followed by a /: [a-zA-Z]+/;
there may be zero or more of them: we must apply the * quantifier to the above; but the regex is not an atom, we therefore need a group. As we do not want to capture individual path elements, we use a non capturing group: (?:[a-zA-Z]+/)*;
we want to capture the full path if it is there, we must use a capturing group over this ((?:[a-zA-Z]+/)*);
the class name is one or more letters, and we want to capture it: ([a-zA-Z]+);
if present, it follows the path, and is followed by two colons: ((?:[a-zA-Z]+/)*)([a-zA-Z]+)::;
but all this is optional: we must therefore put a group around all this, which again we do not want to capture: (?:((?:[a-zA-Z]+/)*)([a-zA-Z]+)::)?;
finally, it is followed by a method name, which is NOT optional this time, and which we want to capture: (?:((?:[a-zA-Z]+/)*)([a-zA-Z]+)::)?([a-zA-Z]+);
and we want this to match the whole line: we need to anchor it both at the beginning and at the end, which gives the final result: ^(?:((?:[a-zA-Z]+/)*)([a-zA-Z]+)::)?([a-zA-Z]+)$
Phew.
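In PHP, the anchored regex could be used roughly like this. It is only a sketch: parse_controller() is a hypothetical helper name, and trimming the trailing slash from the path is my assumption about what the caller wants.

```php
<?php
// Anchored pattern from the explanation above, with "~" as the delimiter.
$regex = '~^(?:((?:[a-zA-Z]+/)*)([a-zA-Z]+)::)?([a-zA-Z]+)$~';

// Hypothetical helper: returns the three parts, or null if nothing matches.
function parse_controller(string $regex, string $controller): ?array
{
    if (preg_match($regex, $controller, $m) !== 1) {
        return null; // input does not have one of the expected shapes
    }
    return [
        'path'   => rtrim($m[1], '/'), // group 1 keeps its trailing slash
        'class'  => $m[2],
        'method' => $m[3],
    ];
}

var_dump(parse_controller($regex, 'admin/package/AdminClass::display'));
```

Because the pattern is anchored, input with any other character (digits, underscores, spaces) is rejected instead of being partially matched.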
$pieces = explode('/', $controller);
// the last piece is "Class::method" or just "method"; the rest is the path
$last = array_pop($pieces);
$path = implode('/', $pieces);
$p2 = explode('::', $last);
if (count($p2) === 2) {
    $class = $p2[0];
    $method = $p2[1];
} else {
    $class = '';
    $method = $p2[0];
}

matching url to another url

I am faced with a rather unusual situation.
I will have a URL in any of these 3 formats:
http://example.com/?p=12
http://example.com/a-b/
http://example.com/a.html
Now, I need to match it against a URL like:
http://example.com/?p=12&t=1
http://example.com/a-b/?t=1
http://example.com/a.html?t=1
How can I achieve this? Please help.
I know I can use something like:
stristr('http://example.com/?p=12&t=1', 'http://example.com/?p=12')
but this will also match when the URL is
http://example.com/?p=123 (as it contains p=12)
Help guys, please.
A simple way to accomplish this would be to use PHP's parse_url() and parse_str().
http://www.php.net/manual/en/function.parse-url.php
http://www.php.net/manual/en/function.parse-str.php
Take your urls and run them through parse_url(), and take the resulting $result['query']. Run these through parse_str() and you'll end up with two associative arrays of the variable names and their values.
Basically, you'll want to return true if the hosts and paths match, and if every key that appears in both query arrays has the same value in each.
code example:
function urlMatch($url1, $url2)
{
    // parse the urls
    $r1 = parse_url($url1);
    $r2 = parse_url($url2);
    // get the variables out of the queries (a URL may have no query string)
    parse_str($r1['query'] ?? '', $v1);
    parse_str($r2['query'] ?? '', $v2);
    // match the domains and paths
    if (($r1['host'] ?? '') != ($r2['host'] ?? '') || ($r1['path'] ?? '') != ($r2['path'] ?? ''))
        return false;
    // match the arrays
    foreach ($v1 as $key => $value)
        if (array_key_exists($key, $v2) && $value != $v2[$key])
            return false;
    // if we haven't returned already, then the queries match
    return true;
}
A very quick (and somewhat dirty) way to achieve this is via the following regex:
$regex = '#^' . preg_quote($url, '#') . '[?&$]#';
Where $url is the URL you need to search for. In the above, we look for the URL at the beginning of the string being tested, followed by either a ? or a &. Note that inside a character class $ is a literal dollar sign, not the end-of-line anchor, so an exact match (with nothing after the URL) is not accepted by this pattern. This is not bullet-proof but may be sufficient (@Mala already posted the "right" approach).
Below, I've posted an example of use (and the result):
$urls = array(
'http://example.com/?p=12',
'http://example.com/a-b/',
'http://example.com/a.html'
);
$tests = array(
'http://example.com/?p=12&t=1',
'http://example.com/a-b/?t=1',
'http://example.com/a.html?t=1',
'http://example.com/?p=123'
);
foreach ($urls as $url) {
    $regex = '#^' . preg_quote($url, '#') . '[?&$]#';
    print $url . ' - ' . $regex . "\n";
    foreach ($tests as $test) {
        $match = preg_match($regex, $test);
        print ' ' . ($match ? '+' : '-') . ' ' . $test . "\n";
    }
}
Result:
http://example.com/?p=12 - #^http\://example\.com/\?p\=12[?&$]#
+ http://example.com/?p=12&t=1
- http://example.com/a-b/?t=1
- http://example.com/a.html?t=1
- http://example.com/?p=123
http://example.com/a-b/ - #^http\://example\.com/a-b/[?&$]#
- http://example.com/?p=12&t=1
+ http://example.com/a-b/?t=1
- http://example.com/a.html?t=1
- http://example.com/?p=123
http://example.com/a.html - #^http\://example\.com/a\.html[?&$]#
- http://example.com/?p=12&t=1
- http://example.com/a-b/?t=1
+ http://example.com/a.html?t=1
- http://example.com/?p=123
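A variant that also accepts an exact match can use an alternation instead of the character class. This is a minimal sketch; url_prefix_regex() is a hypothetical helper name.

```php
<?php
// Hypothetical helper: build a pattern that matches $url exactly, or $url
// followed by further query-string text.
function url_prefix_regex(string $url): string
{
    // (?:[?&]|$) means: a "?" or "&", or the end of the subject.
    return '#^' . preg_quote($url, '#') . '(?:[?&]|$)#';
}

$regex = url_prefix_regex('http://example.com/?p=12');
var_dump(preg_match($regex, 'http://example.com/?p=12&t=1'));
var_dump(preg_match($regex, 'http://example.com/?p=12'));
var_dump(preg_match($regex, 'http://example.com/?p=123'));
```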

Unable to find tokens in string

I am trying to write a small PHP application and I am facing a problem. It is supposed to take text like:
this is *noun but it is *name.
and add the words that start with a star to the string $tokens. However, this is not working.
// get list of fields (each should have words delimited by underscores
$storyArray = split(' ', $story);
$tokens = ""; // space-delimited list of fields
for ($i = 0; $i < count($storyArray); $i++) {
if ($storyArray[$i][0] == '*')
$tokens .= $storyArray[$i] + " ";
}
$tokensArray = split(' ', $tokens);
Wow, I can't believe I've been debugging this and missing the obvious fault!
This line here:
$tokens .= $storyArray[$i] + " ";
You must concatenate with a period (.), not a plus sign! What you have right now is basically the same as $tokens .= 0;
This worked for me (using explode(), since split() was removed in PHP 7):
$story = "this is *noun but it is *name";
$storyArray = explode(' ', $story);
$tokens = array();
for ($i = 0; $i < count($storyArray); $i++) {
    if ($storyArray[$i][0] == '*') {
        array_push($tokens, substr($storyArray[$i], 1));
    }
}
var_dump($tokens);
$tokenString = implode(" ", $tokens);
Note that I'm pushing the tokens directly into an array, then imploding it.
"+" is for addition, not string concatenation. It casts its arguments as numbers, which will always be 0 in your source.
On another note, splitting $tokens is unnecessary. Instead, append tokens to $tokensArray:
$story = "this is *noun but it is *name";
// get list of fields (each should have words delimited by underscores)
// explode() replaces split(), which was removed in PHP 7
$storyArray = explode(' ', $story);
$tokens = ""; // space-delimited list of fields
$tokensArray = array();
for ($i = 0; $i < count($storyArray); $i++) {
    if ($storyArray[$i][0] == '*') {
        $tokens .= $storyArray[$i] . " ";
        $tokensArray[] = $storyArray[$i];
    }
}
If you only needed $tokens for generating $tokensArray, you can get rid of it. Also, depending on whether you need $storyArray, preg_match_all(...) might be able to replace your code:
preg_match_all('/\*\w+/', $story, $tokensArray);
$tokensArray = $tokensArray[0];
You can also use a regular expression to achieve the same effect, without all the string manipulation you are doing right now. This would be the most elegant solution:
$string = "this is *noun but it is *name";
// Let's set up an empty array
$tokens = array();
preg_match_all('/\*\w+/m', $string, $tokens);
$tokens = $tokens[0]; // Only one sub-pattern, dropping unnecessary dimension.
var_dump($tokens);
Regular expressions exist mainly for exactly the kind of task you are trying to accomplish. They are usually faster than doing the string manipulation manually (the regular expression engine in PHP is compiled code).
To explain my regex:
/: opening delimiter
\*: an asterisk (*)
\w: any alphanumeric character or underscore
+: previous token, 1 or more times (match \w one or more times)
/: closing delimiter
m: multiline modifier (it only changes how ^ and $ behave, so it has no effect on this pattern)
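If you want the words without the leading asterisk, a capture group does that in the same call. A small sketch using the sentence from the question:

```php
<?php
$story = 'this is *noun but it is *name.';

// The capture group (\w+) keeps the word without its leading asterisk;
// the trailing "." is not a word character, so it is left out.
preg_match_all('/\*(\w+)/', $story, $matches);

$tokens = $matches[1];                 // captured words only
$tokenString = implode(' ', $tokens);  // space-delimited list, if needed
var_dump($tokens);
```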
Replace
$tokens .= $storyArray[$i] + " ";
with
$tokens .= $storyArray[$i]." ";
And
$tokensArray = split(' ', $tokens);
with
$tokensArray = split(' ', rtrim($tokens));
$tokens .= $storyArray[$i] + " ";
in this line, you should be using the . operator to concatenate strings.
