Strange behavior of preg_replace - php

I want to switch the protocol of a link. If it is http, it should become https, and https should become http. I'm using preg_replace but something is going wrong.
Could someone look at my code and tell me what I am missing in my thinking process?
Here is the code:
$pattern = array(
0 => '/^(http\:)/',
1 => '/^(https\:)/'
);
$replace = array(
0 => 'https:',
1 => 'http:'
);
ksort($pattern);
ksort($replace);
$url = 'http://someurl.com';
echo $url."<br />";
$url = preg_replace($pattern, $replace, trim($url),1);
die($url);

You do not need to escape :, it is not a special character.
You don't need a capture group ().
You don't need to call ksort(), your arrays are already sorted by key when you declare them.
You appear to have your code replacing 'http' with 'https' AND replacing 'https' with 'http'. Why?
$url = preg_replace('/^http:/', 'https:', trim($url)); will work just fine if you're simply looking to force to https.
edit
I still don't know why anyone would want to switch both http/https concurrently, but here you go:
function protocol_switcheroo($url) {
if( preg_match('/^http:/', $url) ) {
return preg_replace('/^http:/', 'https:', $url); // http to https
} else if( preg_match('/^https:/', $url) ) {
return preg_replace('/^https:/', 'http:', $url); // https to http
} else {
return $url; // for URIs with protocols other than http/https
}
}
You need to separate out the calls to replace so that you do not accidentally chain them like in the original code in the question.

The reason this isn't working for http -> https (but does work for https -> http) is that preg_replace() first changes http to https using the first pattern/replacement pair (key 0), and then immediately changes it back from https to http, because the second pair (key 1) now also matches the result.
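To make the chaining visible, here is a minimal sketch that applies the question's two pairs one after the other. The helper name apply_in_sequence() is made up for illustration; it only mimics how preg_replace() walks array arguments, it is not how PHP implements it internally.

```php
<?php
// Patterns and replacements from the question (escape and capture group dropped).
$patterns     = array('/^http:/', '/^https:/');
$replacements = array('https:', 'http:');

// Hypothetical helper: feeds the result of each pair into the next one,
// which is the order in which preg_replace() processes array arguments.
function apply_in_sequence(array $patterns, array $replacements, $subject) {
    foreach ($patterns as $i => $pattern) {
        $subject = preg_replace($pattern, $replacements[$i], $subject, 1);
    }
    return $subject;
}

$fromHttp  = apply_in_sequence($patterns, $replacements, 'http://someurl.com');
$fromHttps = apply_in_sequence($patterns, $replacements, 'https://someurl.com');

echo $fromHttp, "\n";  // http://someurl.com (switched to https, then back again)
echo $fromHttps, "\n"; // http://someurl.com (only the second pair matched)
```

Both inputs end up as http, which matches the behavior described in the question.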

//$url = 'http://example.com/https://www';
$url = 'https://example.com/http://www';
$url = (0 === strpos($url, 'http:'))
? substr_replace($url, 's:', 4, 1)
: substr_replace($url, '', 4, 1);
echo $url;
This will convert HTTP -> HTTPS and HTTPS -> HTTP
It does not use a regex, which would be slower, and does not use str_replace(), which can inadvertently replace other portions of the URL. It will only replace the prefix.
Breakdown: it looks to see if the URL begins with http:. If it does, it replaces the 5th character (:) with s:, making it HTTPS. Otherwise it replaces the 5th character (s) with nothing, making it HTTP.
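Note that the ternary above assumes the input always starts with http: or https:. A possible guarded variant, as a sketch only: the function name toggle_scheme() and the decision to return other schemes unchanged are assumptions, not part of the answer.

```php
<?php
// Hypothetical helper; the guard for non-HTTP input is an addition.
function toggle_scheme($url) {
    if (strpos($url, 'http:') === 0) {
        return substr_replace($url, 's:', 4, 1); // "http:" becomes "https:"
    }
    if (strpos($url, 'https:') === 0) {
        return substr_replace($url, '', 4, 1);   // drop the "s" at index 4
    }
    return $url; // any other scheme is returned unchanged
}

echo toggle_scheme('http://example.com/https://www'), "\n";
echo toggle_scheme('https://example.com/http://www'), "\n";
echo toggle_scheme('ftp://example.com'), "\n";
```

Only the prefix is touched, so the http:// or https:// that appears later in the path stays as it is.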

Your URL is getting replaced twice. First, the first expression matches and http://someurl.com becomes https://someurl.com. Then the second expression matches and https://someurl.com becomes http://someurl.com.
It's easier to see with this other example:
echo preg_replace(
array('/fox/', '/turtle/'),
array('turtle', 'sparrow'),
'fox', 1);
... that prints sparrow.

The problem you have is that preg_replace() does the two replacements one after the other, so that after the first one has run, the second one reverses the effect of the first one.
You need to specify both patterns in a single expression in order to have them run together.
I suggest using preg_replace_callback() instead of preg_replace(). With this, you can write a more complex output expression, which makes it easier to combine them into a single pattern. Something like this will do the trick:
$outputString = preg_replace_callback(
'/^(http|ftp)(s)?(:)/',
function($matches) {return $matches[1].($matches[2]=='s'?'':'s').':';},
$inputString
);
Hope that helps.
[EDIT] Edited the code so it works for ftp/ftps as well as http/https, after comment by OP.
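For a quick check of the single-pass approach, the callback can be wrapped in a function and run against a few inputs. This is just a sketch: the wrapper name toggle_secure() and the sample URLs are made up for the example.

```php
<?php
// Hypothetical wrapper around the preg_replace_callback() call shown above.
function toggle_secure($inputString) {
    return preg_replace_callback(
        '/^(http|ftp)(s)?(:)/',
        function ($matches) {
            // $matches[2] is 's' for https/ftps and an empty string otherwise.
            return $matches[1] . ($matches[2] == 's' ? '' : 's') . ':';
        },
        $inputString
    );
}

echo toggle_secure('http://someurl.com'), "\n";
echo toggle_secure('https://someurl.com'), "\n";
echo toggle_secure('ftp://someurl.com'), "\n";
echo toggle_secure('ftps://someurl.com'), "\n";
```

Because there is only one pattern and one pass, the result of a replacement is never matched again.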

Related

PHP regex: How to remove ?file in url?

My url like this:
http://mywebsite.com/movies/937-lan-kwai-fong-2?file=Rae-Ingram&q=
http://mywebsite.com/movies/937-big-daddy?file=something&q=
I want to get "lan-kwai-fong-2" and "big-daddy", so I use this code, but it doesn't work. Please help me fix it! If you can shorten it, that would be great!
$url= $_SERVER['REQUEST_URI'];
preg_replace('/\?file.*/','',$url);
preg_match('/[a-z][\w\-]+$/',$url,$matches);
$matches= str_replace("-"," ",$matches[0]);
First, there are issues with your code which I'm going to go over because they are general things:
preg_replace does not work by reference, so you are never actually modifying the URL. You need to assign the result of the replace to a variable:
// this would overwrite the current value of url with the replaced value
$url = preg_replace('/\?file.*/','',$url);
It is possible that preg_match will not find anything so you need to test the result
// it should also be noted that sometimes you may need a more exact test here
// because it can return false (if there's an error) or 0 (if there is no match)
if (preg_match('/[a-z][\w\-]+$/',$url,$matches)) {
// do stuff
}
Now, with that out of the way, you are making this more difficult than it needs to be. There are specific functions for working with URLs: parse_url and parse_str.
You can use these to easily work with the information:
$urlInfo = parse_url($_SERVER['REQUEST_URI']);
$movie = basename($urlInfo['path']); // yields 937-the-movie-title
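A slightly fuller sketch of the same idea, using a sample request URI from the question in place of $_SERVER['REQUEST_URI']. Stripping the leading numeric ID with a small regex assumes the path always has the "<id>-<title>" shape, which the question suggests but does not guarantee.

```php
<?php
// Sample value from the question; in a real request this would be
// $_SERVER['REQUEST_URI'].
$requestUri = '/movies/937-lan-kwai-fong-2?file=Rae-Ingram&q=';

$urlInfo = parse_url($requestUri);
$slug = basename($urlInfo['path']);          // last path segment

// Assumption: the segment always starts with a numeric id and a dash.
$title = preg_replace('/^\d+-/', '', $slug);

// parse_str() turns the query string into an associative array.
$query = array();
if (isset($urlInfo['query'])) {
    parse_str($urlInfo['query'], $query);
}

echo $slug, "\n";
echo $title, "\n";
echo str_replace('-', ' ', $title), "\n";    // dashes to spaces, as in the question
```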
Just replace
preg_replace('/\?file.*/','',$url);
with
$url= preg_replace('/\?file.*/','',$url);
Regex works, and parse_url is the right way to do it. But for something quick and dirty I would usually use explode. I think it's clearer.
$parts = explode("?", $url, 2); // separate path from query
$path = $parts[0];
$segments = explode("/", $path);
$match = array_pop($segments); // get last part of path
How about this:
$url = $_SERVER['REQUEST_URI'];
preg_match('/\/[^-]+-([^?]+)\?/', $url, $matches);
$str = isset($matches[1]) ? $matches[1] : false;
match last '/'
match anything besides '-' until '-'
capture anything besides '?' until (not including) '?'

Get vine video id using php

I need to get the Vine video ID from the URL,
so that for a link like this
https://vine.co/v/bXidIgMnIPJ
the output is
bXidIgMnIPJ
I tried to adapt code from another question here, which was written for Vimeo (not Vine):
Get img thumbnails from Vimeo?
This is what I tried, but it did not work:
$url = 'https://vine.co/v/bXidIgMnIPJ';
preg_replace('~^https://(?:www\.)?vine\.co/(?:clip:)?(\d+)~','$1',$url)
basename maybe?
<?php
$url = 'https://vine.co/v/bXidIgMnIPJ';
var_dump(basename($url));
http://codepad.org/vZiFP27y
Assuming it will always be in that format, you can just split the url by the / delimiter. Regex is not needed for a simple url such as this.
$parts = explode('/', $url); // end() needs a variable, not a function result
$id = end($parts);
Since the question asks about preg_replace, here is a solution using it:
$s = 'https://vine.co/v/bXidIgMnIPJ';
$new_s = preg_replace('/^.*\//','',$s);
echo $new_s;
// => bXidIgMnIPJ
or if you need to validate that an input string is indeed a link to vine.co :
$new_s = preg_replace('/^(https?:\/\/)?(www\.)?vine\.co.*\//','',$s);
I don't know if that /v/ part is always present or is it always v... if it is then it may also be added to regex for stricter validation:
$new_s = preg_replace('/^(https?:\/\/)?(www\.)?vine\.co\/v\//','',$s);
Here's what I am using:
function getVineId($url) {
preg_match('#(?<=vine\.co/v/)[0-9A-Za-z]+#', $url, $matches);
if (isset($matches[0])) {
return $matches[0];
}
return false;
}
I used a look-behind to ensure "vine.co/v/" always precedes the ID, while ignoring if the url is HTTP or HTTPS (or if it lacks a protocol altogether). It assumes the ID is alphanumeric, of any length. It will ignore any characters or parameters after the id (like Google campaign tracking parameters, etc).
I used the "#" delimiter so I wouldn't have to escape the forward slashes (/), for a cleaner look.
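A short usage sketch for the function above. The function is repeated here (with the dot escaped) only so the example runs on its own; the utm_source parameter and the Vimeo URL are made-up test inputs.

```php
<?php
// Copy of getVineId() so the sketch is self-contained.
function getVineId($url) {
    preg_match('#(?<=vine\.co/v/)[0-9A-Za-z]+#', $url, $matches);
    return isset($matches[0]) ? $matches[0] : false;
}

var_dump(getVineId('https://vine.co/v/bXidIgMnIPJ'));          // full URL
var_dump(getVineId('vine.co/v/bXidIgMnIPJ?utm_source=test'));  // no protocol, extra parameter
var_dump(getVineId('https://vimeo.com/12345'));                // not a Vine URL
```

The first two calls should yield the ID and the last one false, since the look-behind never finds "vine.co/v/".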
explode the string with '/' and the last string is what you are looking for :) Code:
$vars = explode("/",$url);
echo $vars[count($vars)-1];
$url = 'https://vine.co/v/b2PFre2auF5';
$regex = '/^http(?:s?):\/\/(?:www\.)?vine\.co\/v\/([a-zA-Z0-9]{1,13})$/';
preg_match($regex,$url,$m);
print_r($m);
The ID ends up in $m[1]: b2PFre2auF5

match part of a url

I would like to get all matches for any URLs that have index.php?route=forum/ in them
Example urls to filter are:
http://test.codetrove.com/index.php?route=forum/forum
http://test.codetrove.com/index.php?route=forum/forum_category&forum_path=2
So I need the match to be true if the URL contains index.php?route=forum/. The protocol and domain can be anything: http or https, and any domain.
Any ideas?
Rather than using a baseball bat to bludgeon a spider, take a look at strpos().
$string = "index.php?route=forum/";
if (strpos($url, $string) !== false) {
//we have a match
}
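The strict !== false comparison matters because strpos() returns 0 when the needle sits at the very start of the string, and 0 is falsy. A minimal sketch, with a made-up helper name and a made-up non-matching URL:

```php
<?php
// Hypothetical helper: true when the URL contains the forum route.
function contains_forum_route($url) {
    return strpos($url, 'index.php?route=forum/') !== false;
}

var_dump(contains_forum_route('http://test.codetrove.com/index.php?route=forum/forum'));
var_dump(contains_forum_route('index.php?route=forum/forum'));   // match at position 0
var_dump(contains_forum_route('http://test.codetrove.com/index.php?route=product/list'));

// A loose check would miss the position-0 match:
var_dump((bool) strpos('index.php?route=forum/forum', 'index.php?route=forum/'));
```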
You can use regex :
/index\.php\?route=forum\/.*/
Or with the $_GET variable
if(preg_match('/forum\/.*/', $_GET['route'])) {
echo 'yahoo';
}
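One possibility is to use the PHP strpos() function. It returns the position of the match, or false if there is none, so compare the result against false explicitly:
$IsMatch = strpos($url, "index.php?route=forum/") !== false;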

PHP : Cropping a selected text from a string

I want to get the string part of a URL cropping off http:// like from http://google.com I need to crop http:// and get google.com.
I used the following code and it gives me /google.com
strrchr("http://google.com" , "//");
How can I do this? How can I get only google.com?
Update: google.com is just an example. The URL can be a long one like http://artile.blogspot.com/article.htm, so I need artile.blogspot.com/article.htm
The function parse_url() is what you're looking for.
As Lawrence says, the exact code will be:
$host = parse_url('http://google.com',PHP_URL_HOST);
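PHP_URL_HOST returns only the host, so for the longer URL in the update the path has to be added back. A possible sketch; the helper name strip_scheme() is made up, and query strings or fragments are deliberately ignored here.

```php
<?php
// Hypothetical helper: host plus path, without the scheme.
function strip_scheme($url) {
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['host'])) {
        return $url; // not something parse_url() can split; leave it alone
    }
    return $parts['host'] . (isset($parts['path']) ? $parts['path'] : '');
}

echo strip_scheme('http://google.com'), "\n";
echo strip_scheme('http://artile.blogspot.com/article.htm'), "\n";
```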
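An alternative would be str_replace()
$host = str_replace("http://", "", "http://google.com");
Note that str_replace() replaces every occurrence of http://. Its fourth parameter (count) is passed by reference and only reports how many replacements were made, so it cannot be used as a limit. To remove only a leading http://, anchor a regex instead: preg_replace('#^http://#', '', $url).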
Why not just look to see if the string starts with http:// and then use a function to get the remaining sub-string?
$url = 'http://google.com';
if (strpos($url, 'http://') === 0) {
$url = substr($url, 7);
}
If there are other prefixes that you would like to remove, then perhaps it might be time to start looking into a quick regex to get the job done. For example:
$url = 'https://google.com';
$url = preg_replace('#^(?:https?|ftps?|news|feed|gopher)://#', '', $url);
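You could also use [^http://]+$, but be careful: this is a character class, so it matches a trailing run of characters other than h, t, p, : and /. It happens to work for http://google.com, but it fails as soon as the rest of the URL contains one of those characters, for example a path.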

Add root folder onto URL Using Regex

I'm trying to transform any URL in PHP and add a root folder onto it using regex.
Before:
http://domainNamehere.com/event/test-event-in-the-future/
After:
http://domainNamehere.com/es/event/test-event-in-the-future/
Any ideas?
A simple solution without regex:
$root_folder = 'es';
$url = "http://domainNamehere.com/event/test-event-in-the-future/";
$p = strpos($url, '/', 8);
$url_new = sprintf('%s/%s/%s', substr($url, 0, $p), $root_folder, substr($url, $p+1));
EDIT: JavaScript solution is pretty much the same:
var root_folder = 'es';
var url = "http://domainNamehere.com/event/test-event-in-the-future/";
var p = url.indexOf('/', 8);
var url_new = url.substring(0,p) + '/' + root_folder + url.substring(p);
Of course, for a live application you should also check if p actually gets a valid value assigned (that means if the slash is found) because you might have an invalid url in your input or empty string.
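A sketch of that check on the PHP side. The function name add_root_folder() is made up, and returning the input unchanged when no slash is found is an assumption about the desired behavior.

```php
<?php
// Hypothetical helper wrapping the snippet above; the failure handling
// (return the input as-is) is an assumption, not part of the original answer.
function add_root_folder($url, $root_folder) {
    // strpos() rejects an offset beyond the end of the string, so check the length first.
    if (strlen($url) < 8) {
        return $url;
    }
    $p = strpos($url, '/', 8); // first slash after the "scheme://" part
    if ($p === false) {
        return $url; // no path separator found
    }
    return sprintf('%s/%s/%s', substr($url, 0, $p), $root_folder, substr($url, $p + 1));
}

echo add_root_folder('http://domainNamehere.com/event/test-event-in-the-future/', 'es'), "\n";
echo add_root_folder('http://domainNamehere.com', 'es'), "\n"; // no slash after the host
```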
$url = 'http://domainNamehere.com/event/test-event-in-the-future/'; //or the function you use to grab it;
$url = str_replace('domainNamehere.com','domainNamehere.com/es', $url);
quite dirty but effective and without regex, assuming your "es" folder is always in that position (I think so)
If the domain name is always the same you can use:
$string = str_replace('domainNamehere.com', 'domainNamehere.com/es', $url);
Untested:
$url = preg_replace('#^([a-z]+://[^/]+/)#i', '${1}es/', $url);
Use '#' to delimit the regular expression so the slashes don't have to be escaped.
^([a-z]+://[^/]+/) captures a sequence of letters followed by :// followed by non-slashes, then a slash. This will handle all web protocols, particularly http and https.
A lookbehind such as (?<=^[a-z]+://[^/]+/) cannot be used here, because PCRE requires lookbehind assertions to have a fixed (or at least bounded) length.
The little i makes the search case-insensitive.
The replacement re-inserts the captured prefix (${1}) and appends es/ after it.
This is the most succinct way that I could think of how to do it.
$new_url = preg_replace('#(?<=\w)(?=/)#', '/en', $url, 1);
It will insert whatever you put in the 2nd parameter into the string just before the first slash that has a preceding alphanumeric character.
Tested with PHP 5.3.6
