How To Output User Submitted Links On Your Webpage Securely? - php

I want to allow my website visitors (any Tom, Dick & Harry) submit their links to my webpage for output on my page.
I need to parse user submitted urls before echoing their submitted urls on my page. Need to parse the urls as I won't know what urls they will be submitting nor the structures of their urls.
A user could theoretically visit my page and inject some Javascript code using, for example:
?search=<script>alert('hacked')</script>
You understand my point.
I got to write php script that when users submit their urls, then my php script parses their urls and encodes them by adding urlencode, rawurlencode, intval in the appropriate places before outputting them via htmlspecialchars.
Another wrote this following script. Problem is, it outputs like so:
http%3A%2F%2Fexample.com%2Fcat%2Fsubcat?var_1=value+1&var2=2&this_other=thing&number_is=13
It should output like this:
http://example.com/cat/subcat?var_1=value+1&var2=2&this_other=thing&number_is=13
This is their code ....
Third Party Code:
<?php
function encodedUrl($url){
$query_strings_array = [];
$query_string_parts = [];
// parse URL & get query
$scheme = parse_url($url, PHP_URL_SCHEME);
$host = parse_url($url, PHP_URL_HOST);
$path = parse_url($url, PHP_URL_PATH);
$query_strings = parse_url($url, PHP_URL_QUERY);
// parse query into array
parse_str($query_strings, $query_strings_array);
// separate keys & values
$query_strings_keys = array_keys($query_strings_array);
$query_strings_values = array_values($query_strings_array);
// loop query
for($i = 0; $i < count($query_strings_array); $i++){
$k = urlencode($query_strings_keys[$i]);
$v = $query_strings_values[$i];
$val = is_numeric($v) ? intval($v) : urlencode($v);
$query_string_parts[] = "{$k}={$val}";
}
// re-assemble URL
$encodedHostPath = rawurlencode("{$scheme}://{$host}{$path}");
return $encodedHostPath . '?' . implode('&', $query_string_parts);
}
$url1 = 'http://example.com/cat/subcat?var 1=value 1&var2=2&this other=thing&number is=13';
$url2 = 'http://example.com/autos/cars/list.php?state=california&max_price=50000';
// run urls thru function & echo
// run urls thru function & echo
echo $encoded_url1 = encodedUrl($url1); echo '<br>';
echo $encoded_url2 = encodedUrl($url2); echo '<br>';
?>
So, I changed this of their's:
$encodedHostPath = rawurlencode("{$scheme}://{$host}{$path}");
to this of mine (my amendment):
$encodedHostPath = rawurlencode("{$scheme}").'://'.rawurlencode("{$host}").$path;
And it seems to be working. As it's outputting:
http://example.com/cat/subcat?var_1=value+1&var2=2&this_other=thing&number_is=13
QUESTION 1:
But I am not sure if I put the raw_urlencode() in the right places or not and so best you check.
Also, should not the $path be inside raw_urlencode like so ?
raw_urlencode($path)
Note however that:
raw_urlencode($path)
doesn't output right.
QUESTION 2:
I FURTHER updated their code to a new VERSION and it's not outputting right. Why is that ? Where am I going wrong ?
All I did was add a few lines.
This is my update (NEW VERSION) which outputs wrong. Outputs like this:
http%3A%2F%2Fexample.com%2Fcat%2Fsubcat?var_1=value+1&var2=2&this_other=thing&number_is=13
I added a few lines of my own at the bottom of their code.
MY UPDATE (NEW VERSION):
<?php
function encodedUrledited($url){
$query_strings_array = [];
$query_string_parts = [];
// parse URL & get query
$scheme = parse_url($url, PHP_URL_SCHEME);
$host = parse_url($url, PHP_URL_HOST);
$path = parse_url($url, PHP_URL_PATH);
$query_strings = parse_url($url, PHP_URL_QUERY);
// parse query into array
parse_str($query_strings, $query_strings_array);
// separate keys & values
$query_strings_keys = array_keys($query_strings_array);
$query_strings_values = array_values($query_strings_array);
// loop query
for($i = 0; $i < count($query_strings_array); $i++){
$k = urlencode($query_strings_keys[$i]);
$v = $query_strings_values[$i];
$val = is_numeric($v) ? intval($v) : urlencode($v);
$query_string_parts[] = "{$k}={$val}";
}
// re-assemble URL
$encodedHostPath = rawurlencode("{$scheme}").'://'.rawurlencode("{$host}").$path;
return $encodedHostPath . '?' .implode('&', $query_string_parts);
}
if(!ISSET($_POST['url1']) && empty($_POST['url1']) && !ISSET($_POST['url2']) && empty($_POST['url2']))
{
//Default Values for Substituting empty User Inputs.
$url1 = 'http://example.com/cat/subcat?var 1=value 1&var2=2&this other=thing&number is=138';
$url2 = 'http://example.com/autos/cars/list.php?state=california&max_price=500008';
}
else
{
//User has made following inputs...
$url1 = $_POST['url1'];
$url2 = $_POST['url2'];
//Encode User's Url inputs. (Add rawurlencode(), urlencode() and intval() in user's submitted url where appropriate).
$encoded_url1 = encodedUrledited($url1);
$encoded_url2 = encodedUrledited($url2);
}
echo $link1 = '<a href=' .htmlspecialchars($encoded_url1) .'>' .htmlspecialchars($encoded_url1) .'</a>';
echo '<br/>';
echo $link2 = '<a href=' .htmlspecialchars($encoded_url2) .'>' .htmlspecialchars($encoded_url2) . '</a>';
echo '<br>';
?>
This thread is really about the 2nd code. My update.
Thank You!

I fixed my code.
Answering my own question.
Fixed Code:
function encodedUrledited($url){
$query_strings_array = [];
$query_string_parts = [];
// parse URL & get query
$scheme = parse_url($url, PHP_URL_SCHEME);
$host = parse_url($url, PHP_URL_HOST);
$path = parse_url($url, PHP_URL_PATH);
$query_strings = parse_url($url, PHP_URL_QUERY);
// parse query into array
parse_str($query_strings, $query_strings_array);
// separate keys & values
$query_strings_keys = array_keys($query_strings_array);
$query_strings_values = array_values($query_strings_array);
// loop query
for($i = 0; $i < count($query_strings_array); $i++){
$k = $query_strings_keys[$i];
$key = is_numeric($k) ? intval($k) : urlencode($k);
$v = $query_strings_values[$i];
$val = is_numeric($v) ? intval($v) : urlencode($v);
$query_string_parts[] = "{$key}={$val}";
}
// re-assemble URL
$encodedHostPath = rawurlencode($scheme).'://'.rawurlencode($host).$path;
$encodedHostPath .= '?' .implode('&', $query_string_parts);
return $encodedHostPath;
}
if(!ISSET($_POST['url1']) && empty($_POST['url1']) && !ISSET($_POST['url2']) && empty($_POST['url2']))
{
//Default Values for Substituting empty User Inputs.
$url1 = 'http://example.com/cat/subcat?var 1=value 1&var2=2&this other=thing&number is=138';
$url2 = 'http://example.com/autos/cars/list.php?state=california&max_price=500008';
}
else
{
//User has made following inputs...
$url1 = $_POST['url1'];
$url2 = $_POST['url2'];
//Encode User's Url inputs. (Add rawurlencode(), urlencode() and intval() in user's submitted url where appropriate).
}
$encoded_url1 = encodedUrledited($url1);
$encoded_url2 = encodedUrledited($url2);
$link1 = '<a href=' .htmlspecialchars($encoded_url1) .'>' .htmlspecialchars($encoded_url1) .'</a>';
$link2 = '<a href=' .htmlspecialchars($encoded_url2) .'>' .htmlspecialchars($encoded_url2) . '</a>';
echo $link1; echo '<br/>';
echo $link2; echo '<br/>';
?>
These 2 following lines were supposed to be outside the ELSE. They weren't. Hence all the issue. Moved them outside the ELSE and now script working fine.
$encoded_url1 = encodedUrledited($url1);
$encoded_url2 = encodedUrledited($url2);

Related

Check every URL in string to remove links of certain sites

I want to remove URLs of certain sites within a string
I used this:
<?php
$URLContent = '<p>Google</p><p>AnotherSite</p>';
$LinksToRemove = array('google.com', 'yahoo.com', 'msn.com');
$LinksToCheck = in_array('google.com' , $LinksToRemove);
if (strpos($URLContent, $LinksToCheck) !== 0) {
$URLContent = preg_replace('#<a.*?>([^>]*)</a>#i', '$1', $URLContent);
}
echo $URLContent;
?>
In this example, I want to remove URLs of google.com, yahoo.com and msn.com websites only if any of them found in string $URLContent, but keep any other links.
The result of the previous code is:
<p>Google</p><p>AnotherSite</p>
but I want it to be:
<p>Google</p><p>AnotherSite</p>
One solution would be to explode your $URLContent and compare for each value in $LinksToCheck.
It could be like this :
<?php
$URLContent = '<p>Google</p><p>AnotherSite</p>';
$urlList = explode('</p>', $URLContent);
$LinksToRemove = array('google.com', 'yahoo.com', 'msn.com');
$urlFormat = [];
foreach ($urlList as $url) {
foreach ($LinksToRemove as $link) {
if (str_contains($url, $link)) {
$url = '<p>' . ucfirst(str_replace('.com', '', $link)) . '</p>';
break;
}
}
$urlFormat[] = $url;
}
$result = implode('', $urlFormat);

PHP - Search array in array

I have tried googling for the past one hour straight now and tried many ways to search for an array, in an array.
My objective is, to find a keyword in the URL, and the keywords are in a txt file.
This is what i have so far - but doesn't work.
$file = "keywords.txt";
$open = fopen($file,'r');
$data = fread($open,filesize($file));
$data = explode(" ",$data);
$url = (!empty($_SERVER['HTTPS'])) ? "https://".$_SERVER['SERVER_NAME'].$_SERVER['REQUEST_URI'] : "http://".$_SERVER['SERVER_NAME'].$_SERVER['REQUEST_URI'];
$url = parse_url($url); //parse the URL into an array
foreach($data as $d)
{
if(strstr($d,$url))
{
echo "yes";
}
}
This works WITHOUT the text file, or array - but that's not what i want.
I'd appreciate it if anyone can assist me.
This is the way I'd do it:
$file = "keywords.txt";
$open = fopen($file,'r');
$data = fread($open,filesize($file));
$data = explode(" ",$data);
$url = (!empty($_SERVER['HTTPS'])) ? "https://".$_SERVER['SERVER_NAME'].$_SERVER['REQUEST_URI'] : "http://".$_SERVER['SERVER_NAME'].$_SERVER['REQUEST_URI'];
$url = parse_url($url); //parse the URL into an array
foreach($data as $d){
if(in_array($d,$url)){
echo "yes";
}
}

Strip off specific parameter from URL's querystring

I have some links in a Powerpoint presentation, and for some reason, when those links get clicked, it adds a return parameter to the URL. Well, that return parameter is causing my Joomla site's MVC pattern to get bungled.
What's an efficient way to strip off this return parameter using PHP?
Example:
http://mydomain.example/index.php?id=115&Itemid=283&return=aHR0cDovL2NvbW11bml0
The safest "correct" method would be:
Parse the url into an array with parse_url()
Extract the query portion, decompose that into an array using parse_str()
Delete the query parameters you want by unset() them from the array
Rebuild the original url using http_build_query()
Quick and dirty is to use a string search/replace and/or regex to kill off the value.
In a different thread Justin suggests that the fastest way is to use strtok()
$url = strtok($url, '?');
See his full answer with speed tests as well here: https://stackoverflow.com/a/1251650/452515
This is to complement Marc B's answer with an example, while it may look quite long, it's a safe way to remove a parameter. In this example we remove page_number
<?php
$x = 'http://url.example/search/?location=london&page_number=1';
$parsed = parse_url($x);
$query = $parsed['query'];
parse_str($query, $params);
unset($params['page_number']);
$string = http_build_query($params);
var_dump($string);
function removeParam($url, $param) {
$url = preg_replace('/(&|\?)'.preg_quote($param).'=[^&]*$/', '', $url);
$url = preg_replace('/(&|\?)'.preg_quote($param).'=[^&]*&/', '$1', $url);
return $url;
}
parse_str($queryString, $vars);
unset($vars['return']);
$queryString = http_build_query($vars);
parse_str parses a query string, http_build_query creates a query string.
Procedural Implementation of Marc B's Answer after refining Sergey Telshevsky's Answer.
function strip_param_from_url($url, $param)
{
$base_url = strtok($url, '?'); // Get the base URL
$parsed_url = parse_url($url); // Parse it
// Add missing {
if(array_key_exists('query',$parsed_url)) { // Only execute if there are parameters
$query = $parsed_url['query']; // Get the query string
parse_str($query, $parameters); // Convert Parameters into array
unset($parameters[$param]); // Delete the one you want
$new_query = http_build_query($parameters); // Rebuilt query string
$url =$base_url.'?'.$new_query; // Finally URL is ready
}
return $url;
}
// Usage
echo strip_param_from_url( 'http://url.example/search/?location=london&page_number=1', 'location' )
You could do a preg_replace like:
$new_url = preg_replace('/&?return=[^&]*/', '', $old_url);
Here is the actual code for what's described above as the "the safest 'correct' method"...
function reduce_query($uri = '') {
$kill_params = array('gclid');
$uri_array = parse_url($uri);
if (isset($uri_array['query'])) {
// Do the chopping.
$params = array();
foreach (explode('&', $uri_array['query']) as $param) {
$item = explode('=', $param);
if (!in_array($item[0], $kill_params)) {
$params[$item[0]] = isset($item[1]) ? $item[1] : '';
}
}
// Sort the parameter array to maximize cache hits.
ksort($params);
// Build new URL (no hosts, domains, or fragments involved).
$new_uri = '';
if ($uri_array['path']) {
$new_uri = $uri_array['path'];
}
if (count($params) > 0) {
// Wish there was a more elegant option.
$new_uri .= '?' . urldecode(http_build_query($params));
}
return $new_uri;
}
return $uri;
}
$_SERVER['REQUEST_URI'] = reduce_query($_SERVER['REQUEST_URI']);
However, since this will likely exist prior to the bootstrap of your application, you should probably put it into an anonymous function. Like this...
call_user_func(function($uri) {
$kill_params = array('gclid');
$uri_array = parse_url($uri);
if (isset($uri_array['query'])) {
// Do the chopping.
$params = array();
foreach (explode('&', $uri_array['query']) as $param) {
$item = explode('=', $param);
if (!in_array($item[0], $kill_params)) {
$params[$item[0]] = isset($item[1]) ? $item[1] : '';
}
}
// Sort the parameter array to maximize cache hits.
ksort($params);
// Build new URL (no hosts, domains, or fragments involved).
$new_uri = '';
if ($uri_array['path']) {
$new_uri = $uri_array['path'];
}
if (count($params) > 0) {
// Wish there was a more elegant option.
$new_uri .= '?' . urldecode(http_build_query($params));
}
// Update server variable.
$_SERVER['REQUEST_URI'] = $new_uri;
}
}, $_SERVER['REQUEST_URI']);
NOTE: Updated with urldecode() to avoid double encoding via http_build_query() function.
NOTE: Updated with ksort() to allow params with no value without an error.
This one of many ways, not tested, but should work.
$link = 'http://mydomain.example/index.php?id=115&Itemid=283&return=aHR0cDovL2NvbW11bml0';
$linkParts = explode('&return=', $link);
$link = $linkParts[0];
Wow, there are a lot of examples here. I am providing one that does some error handling. It rebuilds and returns the entire URL with the query-string-param-to-be-removed, removed. It also provides a bonus function that builds the current URL on the fly. Tested, works!
Credit to Mark B for the steps. This is a complete solution to tpow's "strip off this return parameter" original question -- might be handy for beginners, trying to avoid PHP gotchas. :-)
<?php
function currenturl_without_queryparam( $queryparamkey ) {
$current_url = current_url();
$parsed_url = parse_url( $current_url );
if( array_key_exists( 'query', $parsed_url )) {
$query_portion = $parsed_url['query'];
} else {
return $current_url;
}
parse_str( $query_portion, $query_array );
if( array_key_exists( $queryparamkey , $query_array ) ) {
unset( $query_array[$queryparamkey] );
$q = ( count( $query_array ) === 0 ) ? '' : '?';
return $parsed_url['scheme'] . '://' . $parsed_url['host'] . $parsed_url['path'] . $q . http_build_query( $query_array );
} else {
return $current_url;
}
}
function current_url() {
$current_url = 'http' . (isset($_SERVER['HTTPS']) ? 's' : '') . '://' . "{$_SERVER['HTTP_HOST']}{$_SERVER['REQUEST_URI']}";
return $current_url;
}
echo currenturl_without_queryparam( 'key' );
?>
$var = preg_replace( "/return=[^&]+/", "", $var );
$var = preg_replace( "/&{2,}/", "&", $var );
Second line will just replace && to &
very simple
$link = "http://example.com/index.php?id=115&Itemid=283&return=aHR0cDovL2NvbW11bml0"
echo substr($link, 0, strpos($link, "return") - 1);
//output : http://example.com/index.php?id=115&Itemid=283
#MarcB mentioned that it is dirty to use regex to remove an url parameter. And yes it is, because it's not as easy as it looks:
$urls = array(
'example.com/?foo=bar',
'example.com/?bar=foo&foo=bar',
'example.com/?foo=bar&bar=foo',
);
echo 'Original' . PHP_EOL;
foreach ($urls as $url) {
echo $url . PHP_EOL;
}
echo PHP_EOL . '#AaronHathaway' . PHP_EOL;
foreach ($urls as $url) {
echo preg_replace('#&?foo=[^&]*#', null, $url) . PHP_EOL;
}
echo PHP_EOL . '#SergeS' . PHP_EOL;
foreach ($urls as $url) {
echo preg_replace( "/&{2,}/", "&", preg_replace( "/foo=[^&]+/", "", $url)) . PHP_EOL;
}
echo PHP_EOL . '#Justin' . PHP_EOL;
foreach ($urls as $url) {
echo preg_replace('/([?&])foo=[^&]+(&|$)/', '$1', $url) . PHP_EOL;
}
echo PHP_EOL . '#kraftb' . PHP_EOL;
foreach ($urls as $url) {
echo preg_replace('/(&|\?)foo=[^&]*&/', '$1', preg_replace('/(&|\?)foo=[^&]*$/', '', $url)) . PHP_EOL;
}
echo PHP_EOL . 'My version' . PHP_EOL;
foreach ($urls as $url) {
echo str_replace('/&', '/?', preg_replace('#[&?]foo=[^&]*#', null, $url)) . PHP_EOL;
}
returns:
Original
example.com/?foo=bar
example.com/?bar=foo&foo=bar
example.com/?foo=bar&bar=foo
#AaronHathaway
example.com/?
example.com/?bar=foo
example.com/?&bar=foo
#SergeS
example.com/?
example.com/?bar=foo&
example.com/?&bar=foo
#Justin
example.com/?
example.com/?bar=foo&
example.com/?bar=foo
#kraftb
example.com/
example.com/?bar=foo
example.com/?bar=foo
My version
example.com/
example.com/?bar=foo
example.com/?bar=foo
As you can see only #kraftb posted a correct answer using regex and my version is a little bit smaller.
Remove Get Parameters From Current Page
<?php
$url_dir=$_SERVER['REQUEST_URI'];
$url_dir_no_get_param= explode("?",$url_dir)[0];
echo $_SERVER['HTTP_HOST'].$url_dir_no_get_param;
This should do it:
public function removeQueryParam(string $url, string $param): string
{
$parsedUrl = parse_url($url);
if (isset($parsedUrl[$param])) {
$baseUrl = strtok($url, '?');
parse_str(parse_url($url)['query'], $query);
unset($query[$param]);
return sprintf('%s?%s',
$baseUrl,
http_build_query($query)
);
}
return $url;
}
Simple solution that will work for every url
With this solution $url format or parameter position doesn't matter, as an example I added another parameter and anchor at the end of $url:
https://example.com/index.php?id=115&Itemid=283&return=aHR0cDovL2NvbW11bml0&bonus=test#test2
Here is the simple solution:
$url = 'https://example.com/index.php?id=115&Itemid=283&return=aHR0cDovL2NvbW11bml0&bonus=test#test2';
$url_query_stirng = parse_url($url, PHP_URL_QUERY);
parse_str( $url_query_stirng, $url_parsed_query );
unset($url_parsed_query['return']);
$url = str_replace( $url_query_stirng, http_build_query( $url_parsed_query ), $url );
echo $url;
Final result for $url string is:
https://example.com/index.php?id=115&Itemid=283&bonus=test#test2
Some of the examples posted are so extensive. This is what I use on my projects.
function removeQueryParameter($url, $param){
list($baseUrl, $urlQuery) = explode('?', $url, 2);
parse_str($urlQuery, $urlQueryArr);
unset($urlQueryArr[$param]);
if(count($urlQueryArr))
return $baseUrl.'?'.http_build_query($urlQueryArr);
else
return $baseUrl;
}
function remove_attribute($url,$attribute)
{
$url=explode('?',$url);
$new_parameters=false;
if(isset($url[1]))
{
$params=explode('&',$url[1]);
$new_parameters=ra($params,$attribute);
}
$construct_parameters=($new_parameters && $new_parameters!='' ) ? ('?'.$new_parameters):'';
return $new_url=$url[0].$construct_parameters;
}
function ra($params,$attr)
{ $attr=$attr.'=';
$new_params=array();
for($i=0;$i<count($params);$i++)
{
$pos=strpos($params[$i],$attr);
if($pos===false)
$new_params[]=$params[$i];
}
if(count($new_params)>0)
return implode('&',$new_params);
else
return false;
}
//just copy the above code and just call this function like this to get new url without particular parameter
echo remove_attribute($url,'delete_params'); // gives new url without that parameter
I know this is an old question but if you only want to remove one or few named url parameter you can use this function:
function RemoveGet_Regex($variable, $rewritten_url): string {
$rewritten_url = preg_replace("/(\?)$/", "", preg_replace("/\?&/", "?", preg_replace("/((?<=\?)|&){$variable}=[\w]*/i", "", $rewritten_url)));
return $rewritten_url;
}
function RemoveGet($name): void {
$rewritten_url = "https://$_SERVER[HTTP_HOST]$_SERVER[REQUEST_URI]";
if(is_array($name)) {
for($i = 0; $i < count($name); $i++) {
$rewritten_url = RemoveGet_Regex($name[$i], $rewritten_url);
$is_set[] = isset($_GET[$name[$i]]);
}
$array_filtered = array_filter($is_set);
if (!empty($array_filtered)) {
header("Location: ".$rewritten_url);
}
}
else {
$rewritten_url = RemoveGet_Regex($name, $rewritten_url);
if(isset($_GET[$name])) {
header("Location: ".$rewritten_url);
}
}
}
In the first function preg_replace("/((?<=\?)|&){$variable}=[\w]*/i", "", $rewritten_url) will remove the get parameter, and the others will tidy it up. The second function will then redirect.
RemoveGet("id"); will remove the id=whatever from the url. The function can also work with arrays. For your example,
Remove(array("id","Item","return"));
To strip any parameter from the url using PHP script you need to follow this script:
function getNewArray($array,$k){
$dataArray = $array;
unset($array[$k]);
$dataArray = $array;
return $dataArray;
}
function getFullURL(){
return (isset($_SERVER['HTTPS']) && $_SERVER['HTTPS'] === 'on' ? "https" : "http") . "://$_SERVER[HTTP_HOST]$_SERVER[REQUEST_URI]";
}
$url = getFullURL();
$url_components = parse_url($url);
// Use parse_str() function to parse the
// string passed via URL
parse_str($url_components['query'], $params);
print_r($params);
<ul>
<?php foreach($params as $k=>$v){?>
<?php
$newArray = getNewArray($params,$k);
$parameters = http_build_query($newArray);
$newURL = $_SERVER['PHP_SELF']."?".$parameters;
?>
<li><?=$v;?> X
<?php }?>
</ul>
here is functions optimized for speed. But this functions DO NOT remove arrays like a[]=x&a[1]bb=y&a[2]=z by array name.
function removeQueryParam($query, $param)
{
$quoted_param = preg_quote($param, '/');
$pattern = "/(^$quoted_param=[^&]*&?)|(&$quoted_param=[^&]*)/";
return preg_replace($pattern, '', $query);
}
function removeQueryParams($query, array $params)
{
if ($params)
{
$pattern = '/';
foreach ($params as $param)
{
$quoted_param = preg_quote($param, '/');
$pattern .= "(^$quoted_param=[^&]*&?)|(&$quoted_param=[^&]*)|";
}
$pattern[-1] = '/';
return preg_replace($pattern, '', $query);
}
return $query;
}
<? if(isset($_GET['i'])){unset($_GET['i']); header('location:/');} ?>
This will remove the 'i' parameter from the URL. Change the 'i's to whatever you need.

Strip array of url and other characters, show only post name

The xml is like this: (wordpress url's) I want to strip them and get only the posts words.
http://www.site1.com/dir/this-is-page/
http://www.site2.com/this-is-page
How do i strip the url's and get only "this is page" (without the rest of the urls, and the "-") if i have two diffrent types of urls; one with dir and one without dir? Sample code bellow:
$feeds = array('http://www.site1.com/dir/feed.xml', 'http://www.site2.com/feed.xml');
foreach($feeds as $feed)
{
$xml = simplexml_load_file($feed);
foreach( $xml->url as $url )
{
$loc = $url->loc;
echo $loc;
$locstrip = explode("/",$loc);
$locstripped = $locstrip[4];
echo '<br />';
echo $locstripped;
echo '<br />';
mysql_query("TRUNCATE TABLE interlinks");
mysql_query("INSERT INTO interlinks (title, url) VALUES ('$locstripped', '$loc')");
}
}
?>
TY
Ty guys, did it like this:
$urlstrip = basename($loc);
$linestrip = str_replace(array('-','_'), ' ', $urlstrip);
You want only the last segment of the URL?
Try something like this.
$url = trim('http://www.site1.com/dir/this-is-page/', '/');
$url = explode('/', $url);
$url = array_pop($url);
$url = str_replace(array('-','_'), ' ', $url);
It's not very elegant... but it works.
replace
$locstripped = $locstrip[4];
with
$locstripped = $locstrip[count($loc) - 1];
if(!$locstripped)
$locstripped = $locstrip[count($loc) - 2];
$locstripped = str_replace('-', ' ', $locstripped);

is there a PHP library that handles URL parameters adding, removing, or replacing?

when we add a param to the URL
$redirectURL = $printPageURL . "?mode=1";
it works if $printPageURL is "http://www.somesite.com/print.php", but if $printPageURL is changed in the global file to "http://www.somesite.com/print.php?newUser=1", then the URL becomes badly formed. If the project has 300 files and there are 30 files that append param this way, we need to change all 30 files.
the same if we append using "&mode=1" and $printPageURL changes from "http://www.somesite.com/print.php?new=1" to "http://www.somesite.com/print.php", then the URL is also badly formed.
is there a library in PHP that will automatically handle the "?" and "&", and even checks that existing param exists already and removed that one because it will be replaced by the later one and it is not good if the URL keeps on growing longer?
Update: of the several helpful answers, there seems to be no pre-existing function addParam($url, $newParam) so that we don't need to write it?
Use a combination of parse_url() to explode the URL, parse_str() to explode the query string and http_build_query() to rebuild the querystring. After that you can rebuild the whole url from its original fragments you get from parse_url() and the new query string you built with http_build_query(). As the querystring gets exploded into an associative array (key-value-pairs) modifying the query is as easy as modifying an array in PHP.
EDIT
$query = parse_url('http://www.somesite.com/print.php?mode=1&newUser=1', PHP_URL_QUERY);
// $query = "mode=1&newUser=1"
$params = array();
parse_str($query, $params);
/*
* $params = array(
* 'mode' => '1'
* 'newUser' => '1'
* )
*/
unset($params['newUser']);
$params['mode'] = 2;
$params['done'] = 1;
$query = http_build_query($params);
// $query = "mode=2&done=1"
Use this:
http://hu.php.net/manual/en/function.http-build-query.php
http://www.addedbytes.com/php/querystring-functions/
is a good place to start
EDIT: There's also http://www.php.net/manual/en/class.httpquerystring.php
for example:
$http = new HttpQueryString();
$http->set(array('page' => 1, 'sort' => 'asc'));
$url = "yourfile.php" . $http->toString();
None of these solutions work when the url is of the form:
xyz.co.uk?param1=2&replace_this_param=2
param1 gets dropped all the time
.. which means it never works EVER!
If you look at the code given above:
function addParam($url, $s) {
return adjustParam($url, $s);
}
function delParam($url, $s) {
return adjustParam($url, $s);
}
These functions are IDENTICAL - so how can one add and one delete?!
using WishCow and sgehrig's suggestion, here is a test:
(assuming no anchor for the URL)
<?php
echo "<pre>\n";
function adjustParam($url, $s) {
if (preg_match('/(.*?)\?/', $url, $matches)) $urlWithoutParams = $matches[1];
else $urlWithoutParams = $url;
parse_str(parse_url($url, PHP_URL_QUERY), $params);
if (strpos($s, '=') !== false) {
list($var, $value) = split('=', $s);
$params[$var] = urldecode($value);
return $urlWithoutParams . '?' . http_build_query($params);
} else {
unset($params[$s]);
$newQueryString = http_build_query($params);
if ($newQueryString) return $urlWithoutParams . '?' . $newQueryString;
else return $urlWithoutParams;
}
}
function addParam($url, $s) {
return adjustParam($url, $s);
}
function delParam($url, $s) {
return adjustParam($url, $s);
}
echo "trying add:\n";
echo addParam("http://www.somesite.com/print.php", "mode=3"), "\n";
echo addParam("http://www.somesite.com/print.php?", "mode=3"), "\n";
echo addParam("http://www.somesite.com/print.php?newUser=1", "mode=3"), "\n";
echo addParam("http://www.somesite.com/print.php?newUser=1&fee=0", "mode=3"), "\n";
echo addParam("http://www.somesite.com/print.php?newUser=1&fee=0&", "mode=3"), "\n";
echo addParam("http://www.somesite.com/print.php?mode=1", "mode=3"), "\n";
echo "\n", "now trying delete:\n";
echo delParam("http://www.somesite.com/print.php?mode=1", "mode"), "\n";
echo delParam("http://www.somesite.com/print.php?mode=1&newUser=1", "mode"), "\n";
echo delParam("http://www.somesite.com/print.php?mode=1&newUser=1", "newUser"), "\n";
?>
and the output is:
trying add:
http://www.somesite.com/print.php?mode=3
http://www.somesite.com/print.php?mode=3
http://www.somesite.com/print.php?newUser=1&mode=3
http://www.somesite.com/print.php?newUser=1&fee=0&mode=3
http://www.somesite.com/print.php?newUser=1&fee=0&mode=3
http://www.somesite.com/print.php?mode=3
now trying delete:
http://www.somesite.com/print.php
http://www.somesite.com/print.php?newUser=1
http://www.somesite.com/print.php?mode=1
You can try this:
function removeParamFromUrl($query, $paramToRemove)
{
$params = parse_url($query);
if(isset($params['query']))
{
$queryParams = array();
parse_str($params['query'], $queryParams);
if(isset($queryParams[$paramToRemove])) unset($queryParams[$paramToRemove]);
$params['query'] = http_build_query($queryParams);
}
$ret = $params['scheme'].'://'.$params['host'].$params['path'];
if(isset($params['query']) && $params['query'] != '' ) $ret .= '?'.$params['query'];
return $ret;
}

Categories