Match regex in url - php

I am trying to strip out the page query parameter while building pagination on a page.
The url is
$url = 'www.example.com?category=computers&page=10';
Regex
$cleanUrl = preg_replace('/page=[0-9,]+&/', '', $url);
This isn't finding a match though?

Your regex is failing because you are searching for an ampersand after the page parameter with its value.
You can make the ampersand optional by appending a question mark after it.
$url = 'www.example.com?category=computers&page=10';
$cleanUrl = preg_replace('/page=[0-9,]+&?/', '', $url);
Example, https://regex101.com/r/uW8uV8/1
For a more thorough regex, try:
<?php
$url = 'www.example.com?&category=cat&page=10';
$cleanUrl = rtrim(preg_replace('/(&|\?)page=[0-9,]+&?/', '$1', $url), '&?');
echo $cleanUrl;
This requires rtrim() to remove any trailing parameter glue (& or ?). It accounts for:
Page by itself, www.example.com?page=10
Page as the last parameter, www.example.com?species=wolf&page=10
Page as the first parameter, www.example.com?page=10&species=wolf
Page as a middle parameter, www.example.com?cat=tom&mouse=jerry&page=10&species=wolf
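The four cases listed above can be checked with a small script. This is only a sketch: the helper name stripPageParam is made up for illustration, and it assumes the page value contains only digits and commas, as in the pattern above.

```php
<?php
// Hypothetical helper wrapping the pattern from the answer above.
function stripPageParam(string $url): string
{
    // Keep the glue character (? or &) that preceded page=..., drop the rest.
    $stripped = preg_replace('/(&|\?)page=[0-9,]+&?/', '$1', $url);
    // Remove any glue left dangling at the end of the URL.
    return rtrim($stripped, '&?');
}

$cases = [
    'www.example.com?page=10',                                  // www.example.com
    'www.example.com?species=wolf&page=10',                     // www.example.com?species=wolf
    'www.example.com?page=10&species=wolf',                     // www.example.com?species=wolf
    'www.example.com?cat=tom&mouse=jerry&page=10&species=wolf', // www.example.com?cat=tom&mouse=jerry&species=wolf
];

foreach ($cases as $case) {
    echo stripPageParam($case), PHP_EOL;
}
```

The comment next to each input is the string the helper returns for it.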

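You can also delimit your pattern with ~ instead of /. The delimiter is not what fixes the match, though: this pattern works because it looks for the & before page= rather than requiring one after the value: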
preg_replace('~&page=[0-9]+~', '', 'www.example.com?category=computers&page=10');

Related

Effective way to remove everything after last ? from string

I am trying to strip everything from the last ? onward (including the ? itself) in a given URL. I am currently using preg_replace, but with no luck. This is the regex I am using to single out the last ?: #\/[^?]*$#. Also, is there a faster way using substr?
Example link:
preg_replace('#\/[^?]*$#', '', $post="www.example.com?26sf213132aasdf1312sdf31")
Desired Output
www.example.com
Here's how to do it with substr and strrpos:
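$post = "www.example.com?26sf213132aasdf1312sdf31";
$pos = strrpos($post, '?');
// strrpos() returns false when there is no ?, so keep the string unchanged in that case
$result = ($pos === false) ? $post : substr($post, 0, $pos);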
Add a \? at start of regex instead of \/
\?[^?]*$
\? matches a ?
[^?]*$ matches anything other than a ? until the end of string anchored by $
Example http://regex101.com/r/sW6jE7/3
$post="www.example.com?26sf213132aasdf1312sdf31";
$res=preg_replace('/\?[^?]*$/', '', $post);
echo $res;
will give an output
www.example.com
EDIT
If you want to remove the entire query string from the URL, a slight modification of the regex will do:
\?.*$
which removes everything from the first question mark to the end of the string.
Simply match everything from the start up to the ? symbol.
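The difference between the two patterns only shows up when the string contains more than one question mark. A minimal sketch, using a made-up input with two question marks:

```php
<?php
// Made-up input with two question marks, purely for illustration.
$url = 'www.example.com?a=1?b=2';

// Removes only the part starting at the LAST question mark.
$afterLast = preg_replace('/\?[^?]*$/', '', $url);

// Removes everything starting at the FIRST question mark.
$afterFirst = preg_replace('/\?.*$/', '', $url);

echo $afterLast, PHP_EOL;  // www.example.com?a=1
echo $afterFirst, PHP_EOL; // www.example.com
```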
preg_match('/^[^?\n]*/', $post="www.example.com?26sf213132aasdf1312sdf31", $match);
echo $match[0];
Output:
www.example.com
Try this; it works fine:
$host_url = "www.example.com?26sf213132aasdf1312sdf31";
$part_url = strrpos($host_url, '?');
$result = substr($host_url, 0, $part_url);
echo $result;
Although the OP tagged this regex, surprisingly nobody has suggested explode(), which is much easier.
$post = "www.example.com?26sf213132aasdf1312sdf31";
$tokens = explode('?', $post);
echo $tokens[0]; // www.example.com

Get vine video id using php

I need to get the Vine video ID from the URL,
so that a link like this
https://vine.co/v/bXidIgMnIPJ
becomes
bXidIgMnIPJ
I tried to use code from another question here, which was for Vimeo (NOT VINE):
Get img thumbnails from Vimeo?
This is what I tried, but I did not succeed:
$url = 'https://vine.co/v/bXidIgMnIPJ';
preg_replace('~^https://(?:www\.)?vine\.co/(?:clip:)?(\d+)~','$1',$url)
basename maybe?
<?php
$url = 'https://vine.co/v/bXidIgMnIPJ';
var_dump(basename($url));
http://codepad.org/vZiFP27y
Assuming it will always be in that format, you can just split the url by the / delimiter. Regex is not needed for a simple url such as this.
$parts = explode('/', $url); // end() takes its argument by reference, so it needs a variable
$id = end($parts);
Since the question asks about preg_replace, here is a solution using it:
$s = 'https://vine.co/v/bXidIgMnIPJ';
$new_s = preg_replace('/^.*\//','',$s);
echo $new_s;
// => bXidIgMnIPJ
or if you need to validate that an input string is indeed a link to vine.co :
$new_s = preg_replace('/^(https?:\/\/)?(www\.)?vine\.co.*\//','',$s);
I don't know whether the /v/ part is always present, or whether it is always v. If it is, it can be added to the regex for stricter validation:
$new_s = preg_replace('/^(https?:\/\/)?(www\.)?vine\.co\/v\//','',$s);
Here's what I am using:
function getVineId($url) {
    // The dot is escaped so it only matches a literal "." in vine.co
    preg_match('#(?<=vine\.co/v/)[0-9A-Za-z]+#', $url, $matches);
    if (isset($matches[0])) {
        return $matches[0];
    }
    return false;
}
I used a look-behind to ensure "vine.co/v/" always precedes the ID, while ignoring if the url is HTTP or HTTPS (or if it lacks a protocol altogether). It assumes the ID is alphanumeric, of any length. It will ignore any characters or parameters after the id (like Google campaign tracking parameters, etc).
I used the "#" delimiter so I wouldn't have to escape the forward slashes (/), for a cleaner look.
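If you would rather check the host explicitly instead of relying on a look-behind, something along these lines might work. It is a sketch only: the function name extractVineId is made up, it assumes the URL includes a scheme (parse_url() does not report a host without one), and it assumes IDs are alphanumeric as described above.

```php
<?php
// Hypothetical helper: returns the ID, or null when the URL is not a vine.co/v/ link.
function extractVineId(string $url): ?string
{
    $host = parse_url($url, PHP_URL_HOST);
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($host) || !is_string($path)) {
        return null; // malformed URL, or no scheme/host/path present
    }
    // Accept vine.co with or without the www. prefix.
    if (!preg_match('~^(?:www\.)?vine\.co$~i', $host)) {
        return null;
    }
    // The path excludes the query string, so tracking parameters are ignored.
    if (preg_match('~^/v/([0-9A-Za-z]+)~', $path, $m)) {
        return $m[1];
    }
    return null;
}

echo extractVineId('https://vine.co/v/bXidIgMnIPJ?utm_source=newsletter'), PHP_EOL; // bXidIgMnIPJ
```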
Explode the string on '/'; the last element is what you are looking for :) Code:
$vars = explode("/",$url);
echo $vars[count($vars)-1];
$url = 'https://vine.co/v/b2PFre2auF5';
$regex = '/^http(?:s?):\/\/(?:www\.)?vine\.co\/v\/([a-zA-Z0-9]{1,13})$/';
preg_match($regex,$url,$m);
print_r($m);
$m[1] holds the ID: b2PFre2auF5

Strange behavior of preg_replace

I want to switch the protocol of a link. If it is http, it should become https, and https should become http. I'm using preg_replace but something is going wrong.
Could someone look at my code and tell me what I am missing in my thinking process?
Here is the code:
$pattern = array(
0 => '/^(http\:)/',
1 => '/^(https\:)/'
);
$replace = array(
0 => 'https:',
1 => 'http:'
);
ksort($pattern);
ksort($replace);
$url = 'http://someurl.com';
echo $url."<br />";
$url = preg_replace($pattern, $replace, trim($url),1);
die($url);
You do not need to escape :, it is not a special character.
You don't need a capture group ().
You don't need to call ksort(), your arrays are already sorted by key when you declare them.
You appear to have your code replacing 'http' with 'https' AND replacing 'https' with 'http'. Why?
$url = preg_replace('/^http:/', 'https:', trim($url)); will work just fine if you're simply looking to force https.
edit
I still don't know why anyone would want to switch both http/https concurrently, but here you go:
function protocol_switcheroo($url) {
    if( preg_match('/^http:/', $url) ) {
        return preg_replace('/^http:/', 'https:', $url); // http to https
    } else if( preg_match('/^https:/', $url) ) {
        return preg_replace('/^https:/', 'http:', $url); // https to http
    } else {
        return $url; // for URIs with protocols other than http/https
    }
}
You need to separate out the calls to replace so that you do not accidentally chain them like in the original code in the question.
The reason this isn't working for http -> https (but does work for https -> http) is that preg_replace() applies the pattern/replacement pairs in order. The first pair (key 0) changes http to https, and then the second pair (key 1) matches that result and immediately changes it back to http.
//$url = 'http://example.com/https://www';
$url = 'https://example.com/http://www';
$url = (0 === strpos($url, 'http:'))
? substr_replace($url, 's:', 4, 1)
: substr_replace($url, '', 4, 1);
echo $url;
This will convert HTTP -> HTTPS and HTTPS -> HTTP
It does not use a regex which would be slower, and does not use str_replace() which can inadvertently replace other portions of the URL. It will only replace the first prefix.
Breakdown: it checks whether the URL begins with http:. If it does, it replaces the 5th character (:) with s:, making it HTTPS. Otherwise it replaces the 5th character (s) with nothing, making it HTTP.
Your URL is getting replaced twice. First, first expression matches and http://someurl.com becomes https://someurl.com. Then, second expression matches and https://someurl.com becomes http://someurl.com.
It's easier to see with this other example:
echo preg_replace(
array('/fox/', '/turtle/'),
array('turtle', 'sparrow'),
'fox', 1);
... that prints sparrow.
The problem you have is that preg_replace() does the two replacements one after the other, so that after the first one has run, the second one reverses the effect of the first one.
You need to specify both patterns in a single expression in order to have them run together.
I suggest using preg_replace_callback() instead of preg_replace(). With this, you can write a more complex output expression, which makes it easier to combine them into a single pattern. Something like this will do the trick:
$outputString = preg_replace_callback(
'/^(http|ftp)(s)?(:)/',
function($matches) {return $matches[1].($matches[2]=='s'?'':'s').':';},
$inputString
);
Hope that helps.
[EDIT] Edited the code so it works for ftp/ftps as well as http/https, after comment by OP.
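For reuse, the single-pattern idea can be wrapped in a function. This is a sketch rather than the code above verbatim: the name toggleScheme is made up, and the pattern is written as (s?) so that the second group is always present, either as 's' or as an empty string.

```php
<?php
// Hypothetical wrapper around the single-pattern approach.
function toggleScheme(string $url): string
{
    return preg_replace_callback(
        '/^(http|ftp)(s?):/',
        function (array $m): string {
            // $m[2] is 's' for https/ftps and '' for http/ftp.
            return $m[1] . ($m[2] === 's' ? '' : 's') . ':';
        },
        $url
    );
}

echo toggleScheme('http://someurl.com'), PHP_EOL;  // https://someurl.com
echo toggleScheme('https://someurl.com'), PHP_EOL; // http://someurl.com
```

Because there is only one pattern and one pass, the result of the first switch can never be matched again and switched back.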

regular expression to remove ID=1234 from a string (PHP)

I am trying to create a regular expression to do the following (within a preg_replace)
$str = 'http://www.site.com&ID=1620'; // before
$str = 'http://www.site.com'; // after
How would I write a preg_replace to simply remove the &ID=1620 from the string (taking into account that the ID could be of variable length)?
Thanks in advance.
You could use...
$str = preg_replace('/[?&;]ID=\d+/', '', $str);
I'm assuming this is meant to be a normal URL, hence the [?&;]. If that's the case, the & should be a ?.
If it's part of a larger list of GET params, you are probably better off using...
parse_str($str, $params);
unset($params['ID']);
$str = http_build_query($params);
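Note that parse_str() expects just the query string, not a whole URL. If $str is a full URL, a sketch along these lines might work. The function name removeQueryParam is made up, the code ignores #fragments, and http_build_query() may re-encode values differently from the original string.

```php
<?php
// Hypothetical helper: drops one parameter from the query string of a URL.
function removeQueryParam(string $url, string $name): string
{
    $queryStart = strpos($url, '?');
    if ($queryStart === false) {
        return $url; // no query string, nothing to remove
    }
    $base  = substr($url, 0, $queryStart);
    $query = substr($url, $queryStart + 1);

    parse_str($query, $params);   // query string -> associative array
    unset($params[$name]);        // drop the unwanted parameter
    $rebuilt = http_build_query($params);

    // Only re-attach the ? when parameters remain.
    return $rebuilt === '' ? $base : $base . '?' . $rebuilt;
}

echo removeQueryParam('http://www.site.com?spam=eggs&ID=1620&foo=bar', 'ID'), PHP_EOL;
// http://www.site.com?spam=eggs&foo=bar
```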
I'm guessing that & is not allowed as a character in the ID attribute. In that case, you can use
$result = preg_replace('/&ID=[^&]+/', '', $subject);
or (possibly better, thanks to PaulP.R.O.):
$result = preg_replace('/[?&]ID=[^&]+/', '', $subject);
This will remove &ID= (the second version would also remove ?ID=) plus any number of characters that follow, up to the next & or the end of the string. This approach makes sure that any following attributes will be left alone:
$str = 'http://www.site.com?spam=eggs&ID=1620&foo=bar';
will be changed into
$str = 'http://www.site.com?spam=eggs&foo=bar';
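One edge case worth knowing about with the [?&] version: when ID is the first parameter, the leading ? is removed along with it. A quick sketch with made-up URLs shows both situations:

```php
<?php
$pattern = '/[?&]ID=[^&]+/';

// ID in the middle: the neighbouring parameters survive intact.
$middle = preg_replace($pattern, '', 'http://www.site.com?spam=eggs&ID=1620&foo=bar');
echo $middle, PHP_EOL; // http://www.site.com?spam=eggs&foo=bar

// ID first: the ? goes too, so the next parameter is left hanging on an &.
$first = preg_replace($pattern, '', 'http://www.site.com?ID=1620&foo=bar');
echo $first, PHP_EOL;  // http://www.site.com&foo=bar
```

If ID can appear first, the result needs a follow-up fix (or a parse_str()-based approach) to restore the ?.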
You can just use parse_url
(that is if the URL is of the form: http://something.com?id1=1&id2=2):
$url = parse_url($str);
echo "http://{$url['host']}";

Add root folder onto URL Using Regex

I'm trying to transform any URL in PHP and add a root folder onto it using regex.
Before:
http://domainNamehere.com/event/test-event-in-the-future/
After:
http://domainNamehere.com/es/event/test-event-in-the-future/
Any ideas?
A simple solution without regex:
$root_folder = 'es';
$url = "http://domainNamehere.com/event/test-event-in-the-future/";
$p = strpos($url, '/', 8);
$url_new = sprintf('%s/%s/%s', substr($url, 0, $p), $root_folder, substr($url, $p+1));
EDIT: JavaScript solution is pretty much the same:
var root_folder = 'es';
var url = "http://domainNamehere.com/event/test-event-in-the-future/";
var p = url.indexOf('/', 8);
var url_new = url.substring(0,p) + '/' + root_folder + url.substring(p);
Of course, for a live application you should also check if p actually gets a valid value assigned (that means if the slash is found) because you might have an invalid url in your input or empty string.
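A version of the PHP snippet with that check added could look like the sketch below. The function name addRootFolder is made up, and it assumes the URL contains :// followed by a host; anything else is returned unchanged.

```php
<?php
// Hypothetical helper: inserts $folder as the first path segment of $url.
function addRootFolder(string $url, string $folder): string
{
    $schemeEnd = strpos($url, '://');
    if ($schemeEnd === false) {
        return $url; // empty string or no scheme: leave untouched
    }
    // First slash after the host part.
    $p = strpos($url, '/', $schemeEnd + 3);
    if ($p === false) {
        // URL has no path at all, e.g. http://domainNamehere.com
        return $url . '/' . $folder . '/';
    }
    return substr($url, 0, $p) . '/' . $folder . substr($url, $p);
}

echo addRootFolder('http://domainNamehere.com/event/test-event-in-the-future/', 'es'), PHP_EOL;
// http://domainNamehere.com/es/event/test-event-in-the-future/
```

Searching for :// instead of using a fixed offset of 8 is a design choice here, so that the scheme length does not matter.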
$url = 'http://domainNamehere.com/event/test-event-in-the-future/'; //or the function you use to grab it;
$url = str_replace('domainNamehere.com','domainNamehere.com/es', $url);
Quite dirty but effective, and without regex, assuming your "es" folder is always in that position (I think so).
If the domain name is always the same you can use:
$string = str_replace('domainNamehere.com', 'domainNamehere.com/es', $url);
Untested:
$url = preg_replace('#^([a-z]+://[^/]+/)#i', '$1es/', $url);
Use '#' to delimit the regular expression so the slashes don't have to be escaped.
([a-z]+://[^/]+/) captures a sequence of letters followed by ://, followed by non-slashes, then a slash. This will handle all web protocols, particularly http and https.
A look-behind such as (?<=^[a-z]+://[^/]+/) will not compile here, because PCRE look-behinds cannot contain unbounded quantifiers like +; hence the capture group.
The little i makes the search case-insensitive.
The replacement puts the captured prefix back ($1) and appends es/ after it.
This is the most succinct way that I could think of how to do it.
$new_url = preg_replace('#(?<=\w)(?=/)#', '/en', $url, 1);
It will insert whatever you pass as the 2nd parameter (the replacement) just before the first slash that is preceded by a word character (a letter, digit, or underscore).
Tested with PHP 5.3.6
