Regex for add GET param to URL - php

I want to add GET parameter to all URLs in special string (like html content of a website) .
For example :
Before:
$content = '... register ... login ...';
After:
$content = '... register ... login ...';
I think that this is only done with a regular expression , for this reason I wrote this function :
function makeLinks($str)
{
$str = preg_replace('#((https?://)?([-\w]+\.[-\w\.]+)+\w(:\d+)?(/([-\w/_\.]*(\?\S+)?)?)*)#', '$1?wid=${wid}', $str);
return $str;
}
But this pattern having problems! for example :
http://google.com?foo=bar => http://google.com?wid=${wid}?foo=bar
Please help me.

Try This:
function makeLinks($str)
{
$str = preg_replace_callback('/\b((?:https?|ftp):\/\/(?:[-A-Z0-9.]+)(?:\/[-A-Z0-9+&##\/%=~_|!:,.;]*)?)(?:\?([A-Z0-9+&##\/%=~_|!:,.;]*))?/i', 'modify_url', $str);
return $str;
}
function modify_url($matches) {
$query = isset($matches[2]) ? $matches[2]:'';
$result = $matches[1].'?'.$query;
if(!empty($query)) $result .= '&';
return $result.'wid=${wid}';
}
Optionally you can just add # without affecting the ending result. I hate to use them, but here it is in case you want to use them:
function modify_url($matches) {
$result = $matches[1].'?'.#$matches[2];
if(!#empty($matches[2])) $result .= '&';
return $result.'wid=${wid}';
}
Idealy, you should extract the urls and parse them, but this solution should work.

I think there may be a short way. My solution :
function makeLinks($str) {
preg_match_all('|(https?:\/\/(www\.)?[-a-zA-Z0-9#:%._\+~#=]{2,256}\.[a-z]{2,6}\b([-a-zA-Z0-9#:%_\+.~#?&//=]*))|', $str, $urls);
if ($urls && isset($urls[1])) {
foreach ($urls[1] as $url) {
$new_url = $url . (strpos($url, '?') ? '&' : '?') . 'wid=${wid}';
$str = str_replace($url, $new_url, $str);
}
}
return $str;
}

Related

Is it possible to convert "x.y.z" to "x[y][z]" using regexp?

What is the most efficient pattern to replace dots in dot-separated string to an array-like string e.g x.y.z -> x[y][z]
Here is my current code, but I guess there should be a shorter method using regexp.
function convert($input)
{
if (strpos($input, '.') === false) {
return $input;
}
$input = str_replace_first('.', '[', $input);
$input = str_replace('.', '][', $input);
return $input . ']';
}
In your particular case "an array-like string" can be easily obtained using preg_replace function:
$input = "x.d.dsaf.d2.d";
print_r(preg_replace("/\.([^.]+)/", "[$1]", $input)); // "x[d][dsaf][d2][d]"
From what I can understand from your question; "x.y.z" is a String and so should "x[y][z]" be, right?
If that is the case, you may want to give the following code snippet a try:
<?php
$dotSeparatedString = "x.y.z";
$arrayLikeString = "";
//HERE IS THE REGEX YOU ASKED FOR...
$arrayLikeString = str_replace(".", "", preg_replace("#(\.[a-z0-9]*[^.])#", "[$1]", $dotSeparatedString));
var_dump($arrayLikeString); //DUMPS: 'x[y][z]'
Hope it helps you, though....
Using a fairly simple preg_replace_callback() that simply returns a different replacement for the first occurrence of . compared to the other occurrences.
$in = "x.y.z";
function cb($matches) {
static $first = true;
if (!$first)
return '][';
$first = false;
return '[';
}
$out = preg_replace_callback('/(\.)/', 'cb', $in) . ((strpos('.', $in) !== false) ? ']' : ']');
var_dump($out);
The ternary append is to handle the case of no . to replace
already answered but you could simply explode on the period delimiter then reconstruct a string.
$in = 'x.y.z';
$array = explode('.', $in);
$out = '';
foreach ($array as $key => $part){
$out .= ($key) ? '[' . $part . ']' : $part;
}
echo $out;

Regex not quite right

I have a site crawler which displays a list of urls, but the problem is I cannot for the life of me get the last regex quite right.
all urls end up listed as:
http://www.website.org/page1.html&--EFTTIUGJ4ITCyh0Frzb_LFXe_eHw
http://website.net/page2/&--EyqBLeFeCkSfmvA7p0cLrsy1Zm1g
http://foobar.website.com/page3.php&--E5WRBxuTOQikDIyBczaVXveOdRFg
The Urls can all be different and the only thing which seems static is the & symbol.
How would go abouts getting rid of the & symbol and everything beyond it to the right?
Here is what I have tried with the above results:
function getresults($sterm) {
$html = file_get_html($sterm);
$result = "";
// find all span tags with class=gb1
foreach($html->find('h3[class="r"]') as $ef)
{
$result .= $ef->outertext . '<br>';
}
return $result;
}
function geturl($url) {
$var = $url;
$result = "";
preg_match_all ("/a[\s]+[^>]*?href[\s]?=[\s\"\/url?q=\']+".
"(.*?)[\"\']+.*?>"."([^<]+|.*?)?<\/a>/",
$var, $matches);
$matches = $matches[1];
foreach($matches as $var)
{
$result .= $var."<br>";
}
echo preg_replace('/sa=U.*?usg=.*?AFQjCN/', "--" , $result);
}
if url are ALWAYS in the same format, use explode :
<?php
$tmp = explode("&", "http://foobar.website.com/page3.php&--E5WRBxuTOQikDIyBczaVXveOdRFg");
?>
$tmp[0] should content "http://foobar.website.com/page3.php" and
$tmp[1] should content "--E5WRBxuTOQikDIyBczaVXveOdRFg"
A simple way to remove everything after the & character:
$result = substr($result, 0, strpos($result, '&'));

Strip off specific parameter from URL's querystring

I have some links in a Powerpoint presentation, and for some reason, when those links get clicked, it adds a return parameter to the URL. Well, that return parameter is causing my Joomla site's MVC pattern to get bungled.
What's an efficient way to strip off this return parameter using PHP?
Example:
http://mydomain.example/index.php?id=115&Itemid=283&return=aHR0cDovL2NvbW11bml0
The safest "correct" method would be:
Parse the url into an array with parse_url()
Extract the query portion, decompose that into an array using parse_str()
Delete the query parameters you want by unset() them from the array
Rebuild the original url using http_build_query()
Quick and dirty is to use a string search/replace and/or regex to kill off the value.
In a different thread Justin suggests that the fastest way is to use strtok()
$url = strtok($url, '?');
See his full answer with speed tests as well here: https://stackoverflow.com/a/1251650/452515
This is to complement Marc B's answer with an example, while it may look quite long, it's a safe way to remove a parameter. In this example we remove page_number
<?php
$x = 'http://url.example/search/?location=london&page_number=1';
$parsed = parse_url($x);
$query = $parsed['query'];
parse_str($query, $params);
unset($params['page_number']);
$string = http_build_query($params);
var_dump($string);
function removeParam($url, $param) {
$url = preg_replace('/(&|\?)'.preg_quote($param).'=[^&]*$/', '', $url);
$url = preg_replace('/(&|\?)'.preg_quote($param).'=[^&]*&/', '$1', $url);
return $url;
}
parse_str($queryString, $vars);
unset($vars['return']);
$queryString = http_build_query($vars);
parse_str parses a query string, http_build_query creates a query string.
Procedural Implementation of Marc B's Answer after refining Sergey Telshevsky's Answer.
function strip_param_from_url($url, $param)
{
$base_url = strtok($url, '?'); // Get the base URL
$parsed_url = parse_url($url); // Parse it
// Add missing {
if(array_key_exists('query',$parsed_url)) { // Only execute if there are parameters
$query = $parsed_url['query']; // Get the query string
parse_str($query, $parameters); // Convert Parameters into array
unset($parameters[$param]); // Delete the one you want
$new_query = http_build_query($parameters); // Rebuilt query string
$url =$base_url.'?'.$new_query; // Finally URL is ready
}
return $url;
}
// Usage
echo strip_param_from_url( 'http://url.example/search/?location=london&page_number=1', 'location' )
You could do a preg_replace like:
$new_url = preg_replace('/&?return=[^&]*/', '', $old_url);
Here is the actual code for what's described above as the "the safest 'correct' method"...
function reduce_query($uri = '') {
$kill_params = array('gclid');
$uri_array = parse_url($uri);
if (isset($uri_array['query'])) {
// Do the chopping.
$params = array();
foreach (explode('&', $uri_array['query']) as $param) {
$item = explode('=', $param);
if (!in_array($item[0], $kill_params)) {
$params[$item[0]] = isset($item[1]) ? $item[1] : '';
}
}
// Sort the parameter array to maximize cache hits.
ksort($params);
// Build new URL (no hosts, domains, or fragments involved).
$new_uri = '';
if ($uri_array['path']) {
$new_uri = $uri_array['path'];
}
if (count($params) > 0) {
// Wish there was a more elegant option.
$new_uri .= '?' . urldecode(http_build_query($params));
}
return $new_uri;
}
return $uri;
}
$_SERVER['REQUEST_URI'] = reduce_query($_SERVER['REQUEST_URI']);
However, since this will likely exist prior to the bootstrap of your application, you should probably put it into an anonymous function. Like this...
call_user_func(function($uri) {
$kill_params = array('gclid');
$uri_array = parse_url($uri);
if (isset($uri_array['query'])) {
// Do the chopping.
$params = array();
foreach (explode('&', $uri_array['query']) as $param) {
$item = explode('=', $param);
if (!in_array($item[0], $kill_params)) {
$params[$item[0]] = isset($item[1]) ? $item[1] : '';
}
}
// Sort the parameter array to maximize cache hits.
ksort($params);
// Build new URL (no hosts, domains, or fragments involved).
$new_uri = '';
if ($uri_array['path']) {
$new_uri = $uri_array['path'];
}
if (count($params) > 0) {
// Wish there was a more elegant option.
$new_uri .= '?' . urldecode(http_build_query($params));
}
// Update server variable.
$_SERVER['REQUEST_URI'] = $new_uri;
}
}, $_SERVER['REQUEST_URI']);
NOTE: Updated with urldecode() to avoid double encoding via http_build_query() function.
NOTE: Updated with ksort() to allow params with no value without an error.
This one of many ways, not tested, but should work.
$link = 'http://mydomain.example/index.php?id=115&Itemid=283&return=aHR0cDovL2NvbW11bml0';
$linkParts = explode('&return=', $link);
$link = $linkParts[0];
Wow, there are a lot of examples here. I am providing one that does some error handling. It rebuilds and returns the entire URL with the query-string-param-to-be-removed, removed. It also provides a bonus function that builds the current URL on the fly. Tested, works!
Credit to Mark B for the steps. This is a complete solution to tpow's "strip off this return parameter" original question -- might be handy for beginners, trying to avoid PHP gotchas. :-)
<?php
function currenturl_without_queryparam( $queryparamkey ) {
$current_url = current_url();
$parsed_url = parse_url( $current_url );
if( array_key_exists( 'query', $parsed_url )) {
$query_portion = $parsed_url['query'];
} else {
return $current_url;
}
parse_str( $query_portion, $query_array );
if( array_key_exists( $queryparamkey , $query_array ) ) {
unset( $query_array[$queryparamkey] );
$q = ( count( $query_array ) === 0 ) ? '' : '?';
return $parsed_url['scheme'] . '://' . $parsed_url['host'] . $parsed_url['path'] . $q . http_build_query( $query_array );
} else {
return $current_url;
}
}
function current_url() {
$current_url = 'http' . (isset($_SERVER['HTTPS']) ? 's' : '') . '://' . "{$_SERVER['HTTP_HOST']}{$_SERVER['REQUEST_URI']}";
return $current_url;
}
echo currenturl_without_queryparam( 'key' );
?>
$var = preg_replace( "/return=[^&]+/", "", $var );
$var = preg_replace( "/&{2,}/", "&", $var );
Second line will just replace && to &
very simple
$link = "http://example.com/index.php?id=115&Itemid=283&return=aHR0cDovL2NvbW11bml0"
echo substr($link, 0, strpos($link, "return") - 1);
//output : http://example.com/index.php?id=115&Itemid=283
#MarcB mentioned that it is dirty to use regex to remove an url parameter. And yes it is, because it's not as easy as it looks:
$urls = array(
'example.com/?foo=bar',
'example.com/?bar=foo&foo=bar',
'example.com/?foo=bar&bar=foo',
);
echo 'Original' . PHP_EOL;
foreach ($urls as $url) {
echo $url . PHP_EOL;
}
echo PHP_EOL . '#AaronHathaway' . PHP_EOL;
foreach ($urls as $url) {
echo preg_replace('#&?foo=[^&]*#', null, $url) . PHP_EOL;
}
echo PHP_EOL . '#SergeS' . PHP_EOL;
foreach ($urls as $url) {
echo preg_replace( "/&{2,}/", "&", preg_replace( "/foo=[^&]+/", "", $url)) . PHP_EOL;
}
echo PHP_EOL . '#Justin' . PHP_EOL;
foreach ($urls as $url) {
echo preg_replace('/([?&])foo=[^&]+(&|$)/', '$1', $url) . PHP_EOL;
}
echo PHP_EOL . '#kraftb' . PHP_EOL;
foreach ($urls as $url) {
echo preg_replace('/(&|\?)foo=[^&]*&/', '$1', preg_replace('/(&|\?)foo=[^&]*$/', '', $url)) . PHP_EOL;
}
echo PHP_EOL . 'My version' . PHP_EOL;
foreach ($urls as $url) {
echo str_replace('/&', '/?', preg_replace('#[&?]foo=[^&]*#', null, $url)) . PHP_EOL;
}
returns:
Original
example.com/?foo=bar
example.com/?bar=foo&foo=bar
example.com/?foo=bar&bar=foo
#AaronHathaway
example.com/?
example.com/?bar=foo
example.com/?&bar=foo
#SergeS
example.com/?
example.com/?bar=foo&
example.com/?&bar=foo
#Justin
example.com/?
example.com/?bar=foo&
example.com/?bar=foo
#kraftb
example.com/
example.com/?bar=foo
example.com/?bar=foo
My version
example.com/
example.com/?bar=foo
example.com/?bar=foo
As you can see only #kraftb posted a correct answer using regex and my version is a little bit smaller.
Remove Get Parameters From Current Page
<?php
$url_dir=$_SERVER['REQUEST_URI'];
$url_dir_no_get_param= explode("?",$url_dir)[0];
echo $_SERVER['HTTP_HOST'].$url_dir_no_get_param;
This should do it:
public function removeQueryParam(string $url, string $param): string
{
$parsedUrl = parse_url($url);
if (isset($parsedUrl[$param])) {
$baseUrl = strtok($url, '?');
parse_str(parse_url($url)['query'], $query);
unset($query[$param]);
return sprintf('%s?%s',
$baseUrl,
http_build_query($query)
);
}
return $url;
}
Simple solution that will work for every url
With this solution $url format or parameter position doesn't matter, as an example I added another parameter and anchor at the end of $url:
https://example.com/index.php?id=115&Itemid=283&return=aHR0cDovL2NvbW11bml0&bonus=test#test2
Here is the simple solution:
$url = 'https://example.com/index.php?id=115&Itemid=283&return=aHR0cDovL2NvbW11bml0&bonus=test#test2';
$url_query_stirng = parse_url($url, PHP_URL_QUERY);
parse_str( $url_query_stirng, $url_parsed_query );
unset($url_parsed_query['return']);
$url = str_replace( $url_query_stirng, http_build_query( $url_parsed_query ), $url );
echo $url;
Final result for $url string is:
https://example.com/index.php?id=115&Itemid=283&bonus=test#test2
Some of the examples posted are so extensive. This is what I use on my projects.
function removeQueryParameter($url, $param){
list($baseUrl, $urlQuery) = explode('?', $url, 2);
parse_str($urlQuery, $urlQueryArr);
unset($urlQueryArr[$param]);
if(count($urlQueryArr))
return $baseUrl.'?'.http_build_query($urlQueryArr);
else
return $baseUrl;
}
function remove_attribute($url,$attribute)
{
$url=explode('?',$url);
$new_parameters=false;
if(isset($url[1]))
{
$params=explode('&',$url[1]);
$new_parameters=ra($params,$attribute);
}
$construct_parameters=($new_parameters && $new_parameters!='' ) ? ('?'.$new_parameters):'';
return $new_url=$url[0].$construct_parameters;
}
function ra($params,$attr)
{ $attr=$attr.'=';
$new_params=array();
for($i=0;$i<count($params);$i++)
{
$pos=strpos($params[$i],$attr);
if($pos===false)
$new_params[]=$params[$i];
}
if(count($new_params)>0)
return implode('&',$new_params);
else
return false;
}
//just copy the above code and just call this function like this to get new url without particular parameter
echo remove_attribute($url,'delete_params'); // gives new url without that parameter
I know this is an old question but if you only want to remove one or few named url parameter you can use this function:
function RemoveGet_Regex($variable, $rewritten_url): string {
$rewritten_url = preg_replace("/(\?)$/", "", preg_replace("/\?&/", "?", preg_replace("/((?<=\?)|&){$variable}=[\w]*/i", "", $rewritten_url)));
return $rewritten_url;
}
function RemoveGet($name): void {
$rewritten_url = "https://$_SERVER[HTTP_HOST]$_SERVER[REQUEST_URI]";
if(is_array($name)) {
for($i = 0; $i < count($name); $i++) {
$rewritten_url = RemoveGet_Regex($name[$i], $rewritten_url);
$is_set[] = isset($_GET[$name[$i]]);
}
$array_filtered = array_filter($is_set);
if (!empty($array_filtered)) {
header("Location: ".$rewritten_url);
}
}
else {
$rewritten_url = RemoveGet_Regex($name, $rewritten_url);
if(isset($_GET[$name])) {
header("Location: ".$rewritten_url);
}
}
}
In the first function preg_replace("/((?<=\?)|&){$variable}=[\w]*/i", "", $rewritten_url) will remove the get parameter, and the others will tidy it up. The second function will then redirect.
RemoveGet("id"); will remove the id=whatever from the url. The function can also work with arrays. For your example,
Remove(array("id","Item","return"));
To strip any parameter from the url using PHP script you need to follow this script:
function getNewArray($array,$k){
$dataArray = $array;
unset($array[$k]);
$dataArray = $array;
return $dataArray;
}
function getFullURL(){
return (isset($_SERVER['HTTPS']) && $_SERVER['HTTPS'] === 'on' ? "https" : "http") . "://$_SERVER[HTTP_HOST]$_SERVER[REQUEST_URI]";
}
$url = getFullURL();
$url_components = parse_url($url);
// Use parse_str() function to parse the
// string passed via URL
parse_str($url_components['query'], $params);
print_r($params);
<ul>
<?php foreach($params as $k=>$v){?>
<?php
$newArray = getNewArray($params,$k);
$parameters = http_build_query($newArray);
$newURL = $_SERVER['PHP_SELF']."?".$parameters;
?>
<li><?=$v;?> X
<?php }?>
</ul>
here is functions optimized for speed. But this functions DO NOT remove arrays like a[]=x&a[1]bb=y&a[2]=z by array name.
function removeQueryParam($query, $param)
{
$quoted_param = preg_quote($param, '/');
$pattern = "/(^$quoted_param=[^&]*&?)|(&$quoted_param=[^&]*)/";
return preg_replace($pattern, '', $query);
}
function removeQueryParams($query, array $params)
{
if ($params)
{
$pattern = '/';
foreach ($params as $param)
{
$quoted_param = preg_quote($param, '/');
$pattern .= "(^$quoted_param=[^&]*&?)|(&$quoted_param=[^&]*)|";
}
$pattern[-1] = '/';
return preg_replace($pattern, '', $query);
}
return $query;
}
<? if(isset($_GET['i'])){unset($_GET['i']); header('location:/');} ?>
This will remove the 'i' parameter from the URL. Change the 'i's to whatever you need.

is there a PHP library that handles URL parameters adding, removing, or replacing?

when we add a param to the URL
$redirectURL = $printPageURL . "?mode=1";
it works if $printPageURL is "http://www.somesite.com/print.php", but if $printPageURL is changed in the global file to "http://www.somesite.com/print.php?newUser=1", then the URL becomes badly formed. If the project has 300 files and there are 30 files that append param this way, we need to change all 30 files.
the same if we append using "&mode=1" and $printPageURL changes from "http://www.somesite.com/print.php?new=1" to "http://www.somesite.com/print.php", then the URL is also badly formed.
is there a library in PHP that will automatically handle the "?" and "&", and even checks that existing param exists already and removed that one because it will be replaced by the later one and it is not good if the URL keeps on growing longer?
Update: of the several helpful answers, there seems to be no pre-existing function addParam($url, $newParam) so that we don't need to write it?
Use a combination of parse_url() to explode the URL, parse_str() to explode the query string and http_build_query() to rebuild the querystring. After that you can rebuild the whole url from its original fragments you get from parse_url() and the new query string you built with http_build_query(). As the querystring gets exploded into an associative array (key-value-pairs) modifying the query is as easy as modifying an array in PHP.
EDIT
$query = parse_url('http://www.somesite.com/print.php?mode=1&newUser=1', PHP_URL_QUERY);
// $query = "mode=1&newUser=1"
$params = array();
parse_str($query, $params);
/*
* $params = array(
* 'mode' => '1'
* 'newUser' => '1'
* )
*/
unset($params['newUser']);
$params['mode'] = 2;
$params['done'] = 1;
$query = http_build_query($params);
// $query = "mode=2&done=1"
Use this:
http://hu.php.net/manual/en/function.http-build-query.php
http://www.addedbytes.com/php/querystring-functions/
is a good place to start
EDIT: There's also http://www.php.net/manual/en/class.httpquerystring.php
for example:
$http = new HttpQueryString();
$http->set(array('page' => 1, 'sort' => 'asc'));
$url = "yourfile.php" . $http->toString();
None of these solutions work when the url is of the form:
xyz.co.uk?param1=2&replace_this_param=2
param1 gets dropped all the time
.. which means it never works EVER!
If you look at the code given above:
function addParam($url, $s) {
return adjustParam($url, $s);
}
function delParam($url, $s) {
return adjustParam($url, $s);
}
These functions are IDENTICAL - so how can one add and one delete?!
using WishCow and sgehrig's suggestion, here is a test:
(assuming no anchor for the URL)
<?php
echo "<pre>\n";
function adjustParam($url, $s) {
if (preg_match('/(.*?)\?/', $url, $matches)) $urlWithoutParams = $matches[1];
else $urlWithoutParams = $url;
parse_str(parse_url($url, PHP_URL_QUERY), $params);
if (strpos($s, '=') !== false) {
list($var, $value) = split('=', $s);
$params[$var] = urldecode($value);
return $urlWithoutParams . '?' . http_build_query($params);
} else {
unset($params[$s]);
$newQueryString = http_build_query($params);
if ($newQueryString) return $urlWithoutParams . '?' . $newQueryString;
else return $urlWithoutParams;
}
}
function addParam($url, $s) {
return adjustParam($url, $s);
}
function delParam($url, $s) {
return adjustParam($url, $s);
}
echo "trying add:\n";
echo addParam("http://www.somesite.com/print.php", "mode=3"), "\n";
echo addParam("http://www.somesite.com/print.php?", "mode=3"), "\n";
echo addParam("http://www.somesite.com/print.php?newUser=1", "mode=3"), "\n";
echo addParam("http://www.somesite.com/print.php?newUser=1&fee=0", "mode=3"), "\n";
echo addParam("http://www.somesite.com/print.php?newUser=1&fee=0&", "mode=3"), "\n";
echo addParam("http://www.somesite.com/print.php?mode=1", "mode=3"), "\n";
echo "\n", "now trying delete:\n";
echo delParam("http://www.somesite.com/print.php?mode=1", "mode"), "\n";
echo delParam("http://www.somesite.com/print.php?mode=1&newUser=1", "mode"), "\n";
echo delParam("http://www.somesite.com/print.php?mode=1&newUser=1", "newUser"), "\n";
?>
and the output is:
trying add:
http://www.somesite.com/print.php?mode=3
http://www.somesite.com/print.php?mode=3
http://www.somesite.com/print.php?newUser=1&mode=3
http://www.somesite.com/print.php?newUser=1&fee=0&mode=3
http://www.somesite.com/print.php?newUser=1&fee=0&mode=3
http://www.somesite.com/print.php?mode=3
now trying delete:
http://www.somesite.com/print.php
http://www.somesite.com/print.php?newUser=1
http://www.somesite.com/print.php?mode=1
You can try this:
function removeParamFromUrl($query, $paramToRemove)
{
$params = parse_url($query);
if(isset($params['query']))
{
$queryParams = array();
parse_str($params['query'], $queryParams);
if(isset($queryParams[$paramToRemove])) unset($queryParams[$paramToRemove]);
$params['query'] = http_build_query($queryParams);
}
$ret = $params['scheme'].'://'.$params['host'].$params['path'];
if(isset($params['query']) && $params['query'] != '' ) $ret .= '?'.$params['query'];
return $ret;
}

PHP Remove URL from string

If I have a string that contains a url (for examples sake, we'll call it $url) such as;
$url = "Here is a funny site http://www.tunyurl.com/34934";
How do i remove the URL from the string?
Difficulty is, urls might also show up without the http://, such as ;
$url = "Here is another funny site www.tinyurl.com/55555";
There is no HTML present. How would i start a search if http or www exists, then remove the text/numbers/symbols until the first space?
I re-read the question, here is a function that would work as intended:
function cleaner($url) {
$U = explode(' ',$url);
$W =array();
foreach ($U as $k => $u) {
if (stristr($u,'http') || (count(explode('.',$u)) > 1)) {
unset($U[$k]);
return cleaner( implode(' ',$U));
}
}
return implode(' ',$U);
}
$url = "Here is another funny site www.tinyurl.com/55555 and http://www.tinyurl.com/55555 and img.hostingsite.com/badpic.jpg";
echo "Cleaned: " . cleaner($url);
Edit #2/#3 (I must be bored). Here is a version that verifies there is a TLD within the URL:
function containsTLD($string) {
preg_match(
"/(AC($|\/)|\.AD($|\/)|\.AE($|\/)|\.AERO($|\/)|\.AF($|\/)|\.AG($|\/)|\.AI($|\/)|\.AL($|\/)|\.AM($|\/)|\.AN($|\/)|\.AO($|\/)|\.AQ($|\/)|\.AR($|\/)|\.ARPA($|\/)|\.AS($|\/)|\.ASIA($|\/)|\.AT($|\/)|\.AU($|\/)|\.AW($|\/)|\.AX($|\/)|\.AZ($|\/)|\.BA($|\/)|\.BB($|\/)|\.BD($|\/)|\.BE($|\/)|\.BF($|\/)|\.BG($|\/)|\.BH($|\/)|\.BI($|\/)|\.BIZ($|\/)|\.BJ($|\/)|\.BM($|\/)|\.BN($|\/)|\.BO($|\/)|\.BR($|\/)|\.BS($|\/)|\.BT($|\/)|\.BV($|\/)|\.BW($|\/)|\.BY($|\/)|\.BZ($|\/)|\.CA($|\/)|\.CAT($|\/)|\.CC($|\/)|\.CD($|\/)|\.CF($|\/)|\.CG($|\/)|\.CH($|\/)|\.CI($|\/)|\.CK($|\/)|\.CL($|\/)|\.CM($|\/)|\.CN($|\/)|\.CO($|\/)|\.COM($|\/)|\.COOP($|\/)|\.CR($|\/)|\.CU($|\/)|\.CV($|\/)|\.CX($|\/)|\.CY($|\/)|\.CZ($|\/)|\.DE($|\/)|\.DJ($|\/)|\.DK($|\/)|\.DM($|\/)|\.DO($|\/)|\.DZ($|\/)|\.EC($|\/)|\.EDU($|\/)|\.EE($|\/)|\.EG($|\/)|\.ER($|\/)|\.ES($|\/)|\.ET($|\/)|\.EU($|\/)|\.FI($|\/)|\.FJ($|\/)|\.FK($|\/)|\.FM($|\/)|\.FO($|\/)|\.FR($|\/)|\.GA($|\/)|\.GB($|\/)|\.GD($|\/)|\.GE($|\/)|\.GF($|\/)|\.GG($|\/)|\.GH($|\/)|\.GI($|\/)|\.GL($|\/)|\.GM($|\/)|\.GN($|\/)|\.GOV($|\/)|\.GP($|\/)|\.GQ($|\/)|\.GR($|\/)|\.GS($|\/)|\.GT($|\/)|\.GU($|\/)|\.GW($|\/)|\.GY($|\/)|\.HK($|\/)|\.HM($|\/)|\.HN($|\/)|\.HR($|\/)|\.HT($|\/)|\.HU($|\/)|\.ID($|\/)|\.IE($|\/)|\.IL($|\/)|\.IM($|\/)|\.IN($|\/)|\.INFO($|\/)|\.INT($|\/)|\.IO($|\/)|\.IQ($|\/)|\.IR($|\/)|\.IS($|\/)|\.IT($|\/)|\.JE($|\/)|\.JM($|\/)|\.JO($|\/)|\.JOBS($|\/)|\.JP($|\/)|\.KE($|\/)|\.KG($|\/)|\.KH($|\/)|\.KI($|\/)|\.KM($|\/)|\.KN($|\/)|\.KP($|\/)|\.KR($|\/)|\.KW($|\/)|\.KY($|\/)|\.KZ($|\/)|\.LA($|\/)|\.LB($|\/)|\.LC($|\/)|\.LI($|\/)|\.LK($|\/)|\.LR($|\/)|\.LS($|\/)|\.LT($|\/)|\.LU($|\/)|\.LV($|\/)|\.LY($|\/)|\.MA($|\/)|\.MC($|\/)|\.MD($|\/)|\.ME($|\/)|\.MG($|\/)|\.MH($|\/)|\.MIL($|\/)|\.MK($|\/)|\.ML($|\/)|\.MM($|\/)|\.MN($|\/)|\.MO($|\/)|\.MOBI($|\/)|\.MP($|\/)|\.MQ($|\/)|\.MR($|\/)|\.MS($|\/)|\.MT($|\/)|\.MU($|\/)|\.MUSEUM($|\/)|\.MV($|\/)|\.MW($|\/)|\.MX($|\/)|\.MY($|\/)|\.MZ($|\/)|\.NA($|\/)|\.NAME($|\/)|\.NC($|\/)|\.NE($|\/)|\.NET($|\/)|\.NF($|\/)|\.NG($|\/)|\.NI($|\/)|\.NL($|\/)|\.NO($|\/)|\.NP($|\/)|\.NR($|\/)|\.NU($|\/)|\.NZ($|\/)|\.OM($|\/)|\.ORG($|\/)|\.PA($|\/)|\.PE($|\/)|\.PF($|\/)|\.PG($|\/)|\.PH($|\/)|\.PK($|\/)|\.PL($|\/)|\.PM($|\/)|\.PN($|\/)|\.PR($|\/)|\.PRO($|\/)|\.PS($|\/)|\.PT($|\/)|\.PW($|\/)|\.PY($|\/)|\.QA($|\/)|\.RE($|\/)|\.RO($|\/)|\.RS($|\/)|\.RU($|\/)|\.RW($|\/)|\.SA($|\/)|\.SB($|\/)|\.SC($|\/)|\.SD($|\/)|\.SE($|\/)|\.SG($|\/)|\.SH($|\/)|\.SI($|\/)|\.SJ($|\/)|\.SK($|\/)|\.SL($|\/)|\.SM($|\/)|\.SN($|\/)|\.SO($|\/)|\.SR($|\/)|\.ST($|\/)|\.SU($|\/)|\.SV($|\/)|\.SY($|\/)|\.SZ($|\/)|\.TC($|\/)|\.TD($|\/)|\.TEL($|\/)|\.TF($|\/)|\.TG($|\/)|\.TH($|\/)|\.TJ($|\/)|\.TK($|\/)|\.TL($|\/)|\.TM($|\/)|\.TN($|\/)|\.TO($|\/)|\.TP($|\/)|\.TR($|\/)|\.TRAVEL($|\/)|\.TT($|\/)|\.TV($|\/)|\.TW($|\/)|\.TZ($|\/)|\.UA($|\/)|\.UG($|\/)|\.UK($|\/)|\.US($|\/)|\.UY($|\/)|\.UZ($|\/)|\.VA($|\/)|\.VC($|\/)|\.VE($|\/)|\.VG($|\/)|\.VI($|\/)|\.VN($|\/)|\.VU($|\/)|\.WF($|\/)|\.WS($|\/)|\.XN--0ZWM56D($|\/)|\.XN--11B5BS3A9AJ6G($|\/)|\.XN--80AKHBYKNJ4F($|\/)|\.XN--9T4B11YI5A($|\/)|\.XN--DEBA0AD($|\/)|\.XN--G6W251D($|\/)|\.XN--HGBK6AJ7F53BBA($|\/)|\.XN--HLCJ6AYA9ESC7A($|\/)|\.XN--JXALPDLP($|\/)|\.XN--KGBECHTV($|\/)|\.XN--ZCKZAH($|\/)|\.YE($|\/)|\.YT($|\/)|\.YU($|\/)|\.ZA($|\/)|\.ZM($|\/)|\.ZW)/i",
$string,
$M);
$has_tld = (count($M) > 0) ? true : false;
return $has_tld;
}
function cleaner($url) {
$U = explode(' ',$url);
$W =array();
foreach ($U as $k => $u) {
if (stristr($u,".")) { //only preg_match if there is a dot
if (containsTLD($u) === true) {
unset($U[$k]);
return cleaner( implode(' ',$U));
}
}
}
return implode(' ',$U);
}
$url = "Here is another funny site badurl.badone somesite.ca/worse.jpg but this badsite.com www.tinyurl.com/55555 and http://www.tinyurl.com/55555 and img.hostingsite.com/badpic.jpg";
echo "Cleaned: " . cleaner($url);
returns:
Cleaned: Here is another funny site badurl.badone but this and and
$string = preg_replace('/\b(https?|ftp|file):\/\/[-A-Z0-9+&##\/%?=~_|$!:,.;]*[A-Z0-9+&##\/%=~_|$]/i', '', $string);
Parsing text for URLs is hard and looking for pre-existing, heavily tested code that already does this for you would be better than writing your own code and missing edge cases. For example, I would take a look at the process in Django's urlize, which wraps URLs in anchors. You could port it over to PHP, and--instead of wrapping URLs in an anchor--just delete them from the text.
thanks mike,
update a bit, it return notice error,
'/\b(https?|ftp|file):\/\/[-A-Z0-9+&##\/%?=~_|$!:,.;]*[A-Z0-9+&##\/%=~_|$]/i'
$string = preg_replace('/\b(https?|ftp|file):\/\/[-A-Z0-9+&##\/%?=~_|$!:,.;]*[A-Z0-9+&##\/%=~_|$]/i', '', $string);
$url = "Here is a funny site http://www.tunyurl.com/34934";
$replace = 'http www .com .org .net';
$with = '';
$clean_url = clean($url,$replace,$with);
echo $clean_url;
function clean($url,$replace,$with) {
$replace = explode(" ",$replace);
$new_string = '';
$check = explode(" ",$url);
foreach($check AS $key => $value) {
foreach($replace AS $key2 => $value2 ) {
if (-1 < strpos( strtolower($value), strtolower($value2) ) ) {
$value = $with;
break;
}
}
$new_string .= " ".$value;
}
return $new_string;
}
You would need to write a regular expression to extract out the urls.

Categories