I want to use preg_match() in my code, but it returns nothing (or null, or an empty value?)
$domain = "stackoverflow.com";
$uriToTest = "http://stackoverflow.com/";
$pattern = "/^http(s)?://(([a-z]+)\.)*".$domain."/";
echo preg_match($pattern, $uriToTest);
What is the problem?
If you take a look at your pattern, it's this
/^http(s)?://(([a-z]+)\.)*stackoverflow.com/
The delimiter is used as a matching character, and if you had errors turned on, you'd get an "Unknown modifier" warning. So first tip: TURN ERROR REPORTING ON!
To fix it, try using a different delimiter, e.g. {}, as it's easier to read than loads of leaning toothpicks...
{^http(s)?://(([a-z]+)\.)*stackoverflow.com}
The other problem is the dot in the $domain becomes a wildcard match - anytime you insert unknown data into a regex, get in the habit of using preg_quote to escape it, e.g.
$pattern = "{^http(s)?://(([a-z]+)\.)*" . preg_quote($domain, '{') . "}";
(Note - nice catch from stema in the comments: if you use a different delimiter, you must pass that delimiter to preg_quote. With { as the delimiter nothing extra is needed for the closing }, because preg_quote always escapes both braces as regex metacharacters.)
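Putting the two fixes together (a delimiter that does not occur in the pattern, plus preg_quote on the inserted domain), a minimal sketch could look like the following. The function names buildDomainPattern and uriMatchesDomain are made up for illustration, and the explicit check for false is an addition, not something from the answer above.

```php
<?php
// Hypothetical helper: builds the pattern for any domain.
function buildDomainPattern(string $domain): string
{
    // '{' is passed as the delimiter; preg_quote also escapes the dot in the domain.
    return '{^https?://(([a-z]+)\.)*' . preg_quote($domain, '{') . '}';
}

// Hypothetical helper: true if the URI starts with http(s):// plus the domain
// (optionally preceded by lowercase subdomains).
function uriMatchesDomain(string $uri, string $domain): bool
{
    $result = preg_match(buildDomainPattern($domain), $uri);
    if ($result === false) {
        // false means the pattern did not compile or the regex engine failed.
        throw new RuntimeException('preg_match failed, error code ' . preg_last_error());
    }
    return $result === 1;
}

var_dump(uriMatchesDomain('http://stackoverflow.com/', 'stackoverflow.com'));
var_dump(uriMatchesDomain('http://stackoverflowXcom/', 'stackoverflow.com'));
```

Because the dot is escaped, the second call no longer matches, which it would have done with the unescaped domain.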
You're most likely getting an error and preg_match is returning false, as you are not escaping the forward slashes in your expression. Either use something else like a # as the expression delimiter, or escape any forward slashes to stop the parser from trying to end the expression (/ should be \/ - or change the / at either end to be #)
//Quick fix to yours
$pattern = "/^http(s)?:\/\/(([a-z]+)\.)*".preg_quote($domain,'/')."/";
//More legible fix
$pattern = '#^https?://(([a-z]+)\.)*'.preg_quote($domain,'#').'#';
Note that you don't need parentheses around the s in https (unless you're hoping to capture it)
You need to escape your forward slashes and the . in the domain name
$domain = "stackoverflow.com";
$uriToTest = "http://stackoverflow.com/";
$escapedDomain = str_replace('.', '\.', $domain);
$pattern = "/^http(s)?:\/\/(([a-z]+)\.)*".$escapedDomain."/";
echo preg_match($pattern, $uriToTest);
If you were using T-Regx, then this exception would be thrown immediately:
$domain = "stackoverflow.com";
$uriToTest = "http://stackoverflow.com/";
try
{
pattern("/^http(s)?://(([a-z]+)\.)*" . $domain . '/')->match($uriToTest);
}
catch (SafeRegexException $e) {
echo $e->getMessage(); // `Unknown modifier '/'`
}
But also!! T-Regx can automatically add delimiters, so you can go
pattern("^http(s)?://(([a-z]+)\.)*" . $domain)->match($uriToTest);
and it would automatically add a suitable delimiter for you.
$domain = "stackoverflow.com";
$uriToTest = "http://stackoverflow.com/";
$pattern = "^http(s)?://(([a-z]+)\.)*" . $domain . "^";
preg_match($pattern, $uriToTest, $matches);
print_r($matches);
I want to replace my last \ with / on this URL string
C:\wamp\www\chm-lib\sekhelp_out\HTML\AS_BUILD.htm
I have tried this link, but no changes, I am missing something, please correct me where I am wrong.
Here is a solution using PHP's string functions instead of regex.
Do this:
$url = 'C:\wamp\www\chm-lib\sekhelp_out\HTML\AS_BUILD.htm';
$pos = strrpos($url, '\\');
$url = substr_replace($url, '/', $pos, 1);
echo $url;
To get this:
C:\wamp\www\chm-lib\sekhelp_out\HTML/AS_BUILD.htm
Explanation:
Get the position of the last \ in the input string using strrpos()
Replace that with / using substr_replace()
Note
It is important to pass '\\' instead of '\' to strrpos() as the first \ escapes the second.
Also note that you can shorten the code above to a single line if you prefer, but I thought it would be easier to understand as is. Anyway, here is the code as a one-liner function:
function reverseLastBackslash($url) {
return substr_replace($url, '/', strrpos($url, '\\'), 1);
}
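One edge case the code above does not cover: if the string contains no backslash at all, strrpos() returns false, which under PHP's default (non-strict) typing is coerced to position 0, so the first character would be replaced. A guarded variant might look like this sketch (replaceLastBackslash is a hypothetical name):

```php
<?php
// Hypothetical variant of the one-liner that leaves the input untouched
// when it contains no backslash.
function replaceLastBackslash(string $path): string
{
    $pos = strrpos($path, '\\');
    if ($pos === false) {
        // No backslash found: nothing to replace.
        return $path;
    }
    return substr_replace($path, '/', $pos, 1);
}

echo replaceLastBackslash('C:\wamp\www\chm-lib\sekhelp_out\HTML\AS_BUILD.htm'), "\n";
echo replaceLastBackslash('AS_BUILD.htm'), "\n";
```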
You can try exploding the string as an array and imploding after popping off the last part, and connecting it back with a forward slash.
$array = explode('\\','C:\wamp\www\chm-lib\sekhelp_out\HTML\AS_BUILD.htm');
$last = array_pop($array);
$corrected = implode('\\',$array) . '/' . $last;
The backslash escaping is tricky:
preg_replace('/\\\\([^\\\\]*)$/', '/$1', "C:\\wamp\\www\\chm-lib\\sekhelp_out\\HTML\\AS_BUILD.htm")
You have to escape once for the literal string and once for the regular expression so a single \ needs to be \\\\ (1 x 2 x 2)
If it is fine to convert every backslash rather than only the last one, simply use this:
str_replace('\\','/','C:\wamp\www\chm-lib\sekhelp_out\HTML\AS_BUILD.htm');
I'm trying to use a regular expression to get just the file name from a URL, for example:
$link = "http://localhost/website/index.php";
$pattern = '/.*?\.php';
preg_match($pattern, $link, $matches);
but it returns "//localhost/website/index.php" instead of "index".
Does your code even run? You haven't used any delimiters...
With preg_match, you could use a negated character class instead. In your pattern, / matches the first / in the string, and .*? then has to match everything up to .php. If you only want index, the simplest way is a capture group, like so:
$link = "http://localhost/website/index.php";
$pattern = '~([^/]+)\.php~';
preg_match($pattern, $link, $matches);
echo $matches[1]; # Get the captured group from the array $matches
Or you can simply use the basename function:
echo basename($link, ".php");
I think you would be much better off using a function dedicated to the purpose, rather than a custom regular expression.
Since the example you provided is actually a URL, you could use the parse_url function:
http://php.net/manual/en/function.parse-url.php
You should also look at the pathinfo (well done PHP on the naming consistency there!):
http://php.net/manual/en/function.pathinfo.php
You could then do something like this:
$url = 'http://localhost/path/file.php';
$url_info = parse_url($url);
$full_path = $url_info['path'];
$path_info = pathinfo($full_path);
$file_name = $path_info['filename'] . '.' . $path_info['extension'];
print $file_name; // outputs "file.php"
This might seem more verbose than using regular expressions, but it is likely to be much faster and, more importantly, much more robust.
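Since the question asks for "index" rather than "index.php", the same two functions can return just that part. This is a minimal sketch; the helper name fileNameFromUrl is an assumption for illustration. It relies on the optional second arguments of parse_url (PHP_URL_PATH) and pathinfo (PATHINFO_FILENAME).

```php
<?php
// Hypothetical helper: returns the file name without extension, or null
// when the URL has no usable path.
function fileNameFromUrl(string $url): ?string
{
    // PHP_URL_PATH returns only the path, so query string and fragment are dropped.
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path) || $path === '') {
        return null;
    }
    // PATHINFO_FILENAME is the base name with the last extension removed.
    return pathinfo($path, PATHINFO_FILENAME);
}

echo fileNameFromUrl('http://localhost/website/index.php'), "\n";
```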
I need a function which will check for the existing URLs in a string.
function linkcleaner($url) {
$regex="(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))";
if(preg_match($regex, $url, $matches)) {
echo $matches[0];
}
}
The regular expression is taken from John Gruber's blog, where he addressed the problem of creating a regex that matches all URLs.
Unfortunately, I can't make it work. The problem seems to come from the double quotes inside the regex, or from the other punctuation symbols at the end of the expression.
Any help is appreciated.
Thank you!
You need to escape the " with a \
Apart from @tandu's answer, you also need delimiters for a regex in PHP.
The easiest would be to start and end your pattern with an # as that character does not appear in it:
$regex="#(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:\'\".,<>?«»“”‘’]))#";
Jack Maney's comment...EPIC :D
On a more serious note, it does not work because you terminated the string literal right in the middle.
To include a double quote (") in a string, you need to escape it using a \
So, the line will be
$regex="#(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:\'\".,<>?«»“”‘’]))#";
Notice I've escaped the (') as well. That is for when you define a string between 2 single quotes.
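One way to avoid the quote escaping altogether is to put the pattern in a nowdoc, which takes every character literally. The sketch below uses the same pattern with # as delimiter. The function names are hypothetical, and the u modifier is an addition of mine (it assumes the source file is saved as UTF-8, so that the typographic quotes in the character class are treated as characters rather than bytes).

```php
<?php
// Hypothetical helper: returns the URL pattern from a nowdoc, so neither
// ' nor " inside the character class needs escaping.
function gruberUrlPattern(): string
{
    return <<<'REGEX'
#(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))#u
REGEX;
}

// Hypothetical helper: first URL found in $text, or null if there is none.
function firstUrl(string $text): ?string
{
    return preg_match(gruberUrlPattern(), $text, $matches) === 1 ? $matches[0] : null;
}

var_dump(firstUrl('See http://example.com/foo, ok'));
```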
I am not sure how you guys read this regex, cause it's a real pain to read/modify... ;)
try this (this is not a one-liner, yes, but it is easy to understand and modify if needed):
<?php
$re_proto = "(?:https?|ftp|gopher|irc|whateverprotoyoulike)://";
$re_ipv4_segment = "[12]?[0-9]{1,2}";
$re_ipv4 = "(?:{$re_ipv4_segment}[.]){3}".$re_ipv4_segment;
$re_hostname = "[a-z0-9_]+(?:[.-][a-z0-9_]+){0,}";
$re_hostname_fqdn = "[a-z0-9_](?:[a-z0-9_-]*[.][a-z0-9]+){1,}";
$re_host = "(?:{$re_ipv4}|{$re_hostname})";
$re_host_fqdn = "(?:{$re_ipv4}|{$re_hostname_fqdn})";
$re_port = ":[0-9]+";
$re_uri = "(?:/[a-z0-9_.%-]*){0,}";
$re_querystring = "[?][a-z0-9_.%&=-]*";
$re_anchor = "#[a-z0-9_.%-]*";
$re_url = "(?:(?:{$re_proto})(?:{$re_host})|{$re_host_fqdn})(?:{$re_port})?(?:{$re_uri})?(?:{$re_querystring})?(?:{$re_anchor})?";
$text = <<<TEXT
http://www.example.com
http://www.example.com/some/path/to/file.php?f1=v1&f2=v2#foo
http://localhost.localdomain/
http://localhost/docs/???
www....wwhat?
www.example.com
ftp://ftp.mozilla.org/pub/firefox/latest/
Some new Mary-Kate Olsen pictures I found: the splendor of the Steiner Street Picture of href… http://t.co/tJ2NJjnf
TEXT;
$count = preg_match_all("\01{$re_url}\01is", $text, $matches);
var_dump($count);
var_dump($matches);
?>
I'd like to replace more than one forward slash with one forward slash.
Examples:
this/is//an//example -> this/is/an/example
///another//example//// -> /another/example/
example.com///another//example//// -> example.com/another/example/
Thanks!
EDIT: This will be used to fix URLs that have more than one forward slash.
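try
preg_replace('#/+#','/',$str);
or
preg_replace('#/{2,}#','/',$str);
Tip: str_replace replaces all occurrences of the search string with the replacement string, so it can handle a simple double slash. A single pass only halves longer runs, though (//// becomes //), so it has to be repeated until no // is left:
while (strpos($str, '//') !== false) $str = str_replace('//','/',$str);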
You might want to use regex:
$modifiedString = preg_replace('|/{2,}|','/',$strToModify);
I use the {2,} instead of + to avoid replacing single '/'.
Use a regex to replace one or more /-es with /:
$string = preg_replace('#/+#', '/', $string);
I see you want to create a valid url... you might want to check out realpath, or maybe even better the snippet in the first comment:
$path = '../gallery/index/../../advent11/app/';
$pattern = '/\w+\/\.\.\//';
while(preg_match($pattern, $path)) {
$path = preg_replace($pattern, '', $path);
}
// $path == '../advent11/app/'
As you can see this also solves ../-es :)
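Since the question mentions fixing URLs, note that the patterns above would also collapse the // in http://. A possible variation, assuming the scheme separator should be left alone, adds a negative lookbehind for a colon (collapseSlashes is a hypothetical name):

```php
<?php
// Hypothetical helper: collapses runs of two or more slashes into one,
// except a run that directly follows a colon (as in "http://").
function collapseSlashes(string $url): string
{
    // preg_replace returns null on failure; fall back to the input then.
    return preg_replace('#(?<!:)/{2,}#', '/', $url) ?? $url;
}

echo collapseSlashes('http://example.com//another///example////'), "\n";
```

The examples from the question behave as before, because none of them contains a colon.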
Those regular expressions drive me crazy. I'm stuck with this one:
test1:[[link]] test2:[[gold|silver]] test3:[[out1[[inside]]out2]] test4:this|not
Task:
Remove all [[ and ]], and where there is an option split (|), choose the last option, so the output should be:
test1:link test2:silver test3:out1insideout2 test4:this|not
I came up with (PHP)
$text = preg_replace("/\\[\\[|\\]\\]/",'',$text); // remove [[ or ]]
This works for part 1 of the task, but I think I should do the option split before that. My best solution:
$text = preg_replace("/\\[\\[(.*\|)(.*?)\\]\\]/",'$2',$text);
Result:
test1:silver test3:[[out1[[inside]]out2]] this|not
I'm stuck. Could someone with a few free minutes help me? Thanks!
I think the easiest way to do this would be multiple passes. Use a regular expression like:
\[\[(?:[^\[\]]*\|)?([^\[\]]+)\]\]
This will replace option strings to give you the last option from the group. If you run it repeatedly until it no longer matches, you should get the right result (the first pass will replace [[out1[[inside]]out2]] with [[out1insideout2]] and the second will ditch the brackets).
Edit 1: By way of explanation,
\[\[        # Opening [[
(?:         # A non-capturing group (we don't want this bit)
[^\[\]]     # Non-bracket characters
*           # Zero or more of anything but [ or ]
\|          # A literal '|' character representing the end of the discarded options
)?          # This group is optional: if there is only one option, it won't be present
(           # The group we're actually interested in ($1)
[^\[\]]     # All the non-bracket characters
+           # Must be at least one
)           # End of $1
\]\]        # Closing ]]
Edit 2: Changed expression to ignore ']' as well as '[' (it works a bit better like that).
Edit 3: There is no need to know the number of nested brackets as you can do something like:
$oldtext = "";
$newtext = $text;
while ($newtext != $oldtext)
{
$oldtext = $newtext;
$newtext = preg_replace('/\[\[(?:[^\[\]]*\|)?([^\[\]]+)\]\]/', '$1', $oldtext);
}
$text = $newtext;
Basically, this keeps running the regular expression replace until the output is the same as the input.
Note that I don't know PHP, so there are probably syntax errors in the above.
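For reference, the same repeat-until-stable idea can be written in PHP with the optional count argument of preg_replace, which reports how many replacements a pass made. This is only a sketch; the function name stripDoubleBrackets is made up for illustration.

```php
<?php
// Hypothetical helper: applies the multi-pass replacement until a pass
// makes no more replacements.
function stripDoubleBrackets(string $text): string
{
    // Innermost [[...]] with an optional "discarded|" prefix; $1 is the last option.
    $pattern = '/\[\[(?:[^\[\]]*\|)?([^\[\]]+)\]\]/';
    do {
        $result = preg_replace($pattern, '$1', $text, -1, $count);
        if ($result === null) {
            throw new RuntimeException('preg_replace failed, error code ' . preg_last_error());
        }
        $text = $result;
    } while ($count > 0);
    return $text;
}

echo stripDoubleBrackets('test1:[[link]] test2:[[gold|silver]] test3:[[out1[[inside]]out2]] test4:this|not'), "\n";
// prints: test1:link test2:silver test3:out1insideout2 test4:this|not
```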
This is impossible to do in one regular expression since you want to keep content in multiple "hierarchies" of the content. It would be possible otherwise, using a recursive regular expression.
Anyways, here's the simplest, most greedy regular expression I can think of. It should only replace if the content matches your exact requirements.
You will need to escape all backslashes when putting it into a string (\ becomes \\.)
\[\[((?:[^][|]+|(?!\[\[|]])[^|])++\|?)*]]
As others have already explained, you use this with multiple passes. Keep looping while there are matches, performing replacement (only keeping match group 1.)
Difference from other regular expressions here is that it will allow you to have single brackets in the content, without breaking:
test1:[[link]] test2:[[gold|si[lv]er]]
test3:[[out1[[in[si]de]]out2]] test4:this|not
becomes
test1:link test2:si[lv]er
test3:out1in[si]deout2 test4:this|not
Why try to do it all in one go. Remove the [[]] first and then deal with options, do it in two lines of code.
When trying to get something going favour clarity and simplicity.
Seems like you have all the pieces.
Why not just simply remove any brackets that are left?
$str = 'test1:[[link]] test2:[[gold|silver]] test3:[[out1[[inside]]out2]] test4:this|not';
$str = preg_replace('/\\[\\[(?:[^|\\]]+\\|)+([^\\]]+)\\]\\]/', '$1', $str);
$str = str_replace(array('[', ']'), '', $str);
Well, I didn't stick to just regex, because I'm of a mind that trying to do stuff like this with one big regex leads you to the old joke about "Now you have two problems". However, give something like this a shot:
$str = 'test1:[[link]] test2:[[gold|silver]] test3:[[out1[[inside]]out2]] test4:this|not';
$reg = '/(.*?):(.*?)( |$)/';
preg_match_all($reg, $str, $m);
foreach($m[2] as $pos => $match) {
if (strpos($match, '|') !== FALSE && strpos($match, '[[') !== FALSE ) {
$opt = explode('|', $match); $match = $opt[count($opt)-1];
}
$m[2][$pos] = str_replace(array('[', ']'),'', $match );
}
foreach($m[1] as $k=>$v) $result[$k] = $v.':'.$m[2][$k];
This is C# using only verbatim (non-escaped) strings, hence you will have to double the backslashes in other languages.
String input = "test1:[[link]] " +
"test2:[[gold|silver]] " +
"test3:[[out1[[inside]]out2]] " +
"test4:this|not";
String step1 = Regex.Replace(input, @"\[\[([^|]+)\|([^\]]+)\]\]", @"[[$2]]");
String step2 = Regex.Replace(step1, @"\[\[|\]\]", String.Empty);
// Prints "test1:silver test3:out1insideout2 test4:this|not"
Console.WriteLine(step2);
$str = 'test1:[[link]] test2:[[gold|silver]] test3:[[out1[[inside]]out2]] test4:this|not';
$s = preg_split("/\s+/",$str);
foreach ($s as $k=>$v){
$v = preg_replace("/\[\[|\]\]/","",$v);
$j = explode(":",$v);
$j[1]=preg_replace("/.*\|/","",$j[1]);
print implode(":",$j)."\n";
}