PHP normalize remote URLs [duplicate] - php

This question already has an answer here:
How do I apply URL normalization rules in PHP?
(1 answer)
Closed 9 years ago.
Is there any quick function that will convert: HtTp://www.ExAmPle.com/blah to http://www.example.com/blah
Basically I want to lower case the case-insensitive parts of a url.

No, you'll have to write code for it on your own.
But you can use parse_url() to split the URL into its parts.
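As a small illustration of that suggestion (not the answerer's own code), the sketch below splits the URL from the question with parse_url() and lowercases only the scheme and host. It assumes an absolute URL that has both a scheme and a host; the variable names are made up for the example.

```php
<?php
// Illustrative only: split the URL, then lowercase the case-insensitive parts.
$parts = parse_url('HtTp://www.ExAmPle.com/blah');

// parse_url() only splits the string; it does not change the case of any part.
$scheme = strtolower($parts['scheme']); // "http"
$host   = strtolower($parts['host']);   // "www.example.com"

// The path is case-sensitive on many servers, so it is left untouched.
echo $scheme . '://' . $host . $parts['path']; // http://www.example.com/blah
```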

Since you asked for "quick," here's a one-liner that does the job:
$url = 'HtTp://User:Pass@www.ExAmPle.com:80/Blah';
echo preg_replace_callback(
    '#(^[a-z]+://)(.+@)?([^/]+)(.*)$#i',
    // A closure is used here because create_function() was removed in PHP 8.0
    function ($m) {
        return strtolower($m[1]).$m[2].strtolower($m[3]).$m[4];
    },
    $url);
Outputs:
http://User:Pass@www.example.com:80/Blah
EDIT/ADD:
I've tested, and this version is about 55% faster than using preg_replace_callback with an anonymous function. Note that it relies on the /e modifier, which was removed in PHP 7.0, so it only runs on older versions:
echo preg_replace(
    '#(^[a-z]+://)(.+@)?([^/]+)(.*)$#ei',
    "strtolower('\\1').'\\2'.strtolower('\\3').'\\4'",
    $url);

I believe this class will do what you're looking for http://www.glenscott.co.uk/blog/2011/01/09/normalize-urls-with-php/

Here's a solution, expanding on what @ThiefMaster already mentioned:
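function urltolower($url){
    // parse_url() must succeed and give us at least a scheme and a host
    if (($_url = parse_url($url)) !== false && isset($_url['scheme'], $_url['host'])){
        $newUrl = strtolower($_url['scheme']) . "://";
        if (isset($_url['user'], $_url['pass']))
            $newUrl .= $_url['user'] . ":" . $_url['pass'] . "@";
        $newUrl .= strtolower($_url['host']);
        if (isset($_url['port']))
            $newUrl .= ":" . $_url['port']; // keep an explicit port
        if (isset($_url['path']))
            $newUrl .= $_url['path'];
        if (isset($_url['query']))
            $newUrl .= "?" . $_url['query'];
        if (isset($_url['fragment']))
            $newUrl .= "#" . $_url['fragment'];
        return $newUrl;
    }
    return $url; // could return false if you'd like
}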
Note: Not battle-tested but it should get you going.

Related

get main url part of big link [duplicate]

This question already has an answer here:
Closed 10 years ago.
Possible Duplicate:
Get part of url in php
I want to get http://aoup.net/manage/preForm/test.php
from this URL:
http://aoup.net/manage/preForm/test.php?op=Results&form_id=1&form_name=%D8%A7%D9%86%D8%AA%D8%AE%D8%A7%D8%A8%20%D8%AD%D9%88%D8%B2%D9%87%20%D8%A7%D9%85%D8%AA%D8%AD%D8%A7%D9%86%DB%8C%28%D8%AF%D9%88%D8%B1%D9%87%20228%29&hash=406ce38266577b8dff3102e476fdf587
This is my PHP code, which does not work correctly:
echo 'http://'.$_SERVER['HTTP_HOST'].dirname($_SERVER['PHP_SELF']);
You can use the parse_url() function to achieve this: http://es2.php.net/manual/en/function.parse-url.php
For example:
<?php
$url = 'http://aoup.net/manage/preForm/test.php?op=Results&form_id=1&form_name=%D8%A7%D9%86%D8%AA%D8%AE%D8%A7%D8%A8%20%D8%AD%D9%88%D8%B2%D9%87%20%D8%A7%D9%85%D8%AA%D8%AD%D8%A7%D9%86%DB%8C%28%D8%AF%D9%88%D8%B1%D9%87%20228%29&hash=406ce38266577b8dff3102e476fdf587';
$parsed_url = parse_url($url);
$new_url = $parsed_url['scheme'] . '://' . $parsed_url['host'] . $parsed_url['path'];
echo $new_url;
Will print http://aoup.net/manage/preForm/test.php
echo $_SERVER['HTTP_HOST'] . $_SERVER['SCRIPT_NAME'];

PHP: Adding parameters to a url?

Say I have the URL mysite.com/test.php?id=1. The id is set when the page loads and can be anything. There could also be other parameters, such as ?id=1&sort=new. Is there a way to add another parameter to the end without first finding out what the others are and then building a new URL? Thanks.
As an alternative to Kolink's answer, I think I would utilize http_build_query(). This way, if there is nothing in the query string, you don't get an extra &. Although, it won't really make a difference at all. Kolink's answer is perfectly fine. I'm posting this mainly to introduce you to http_build_query(), as you will likely need it later:
http_build_query(array_merge($_GET, array('newvar'=>'123')))
Basically, we use http_build_query() to take everything in $_GET, and merge it with an array of any other parameters we want. In this example, I just create an array on the fly, using your example parameter. In practice, you'll likely have an array like this somewhere already.
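A minimal sketch of the same idea as a reusable function. The name with_params is hypothetical, and the first argument stands in for $_GET so the snippet can run outside a web request:

```php
<?php
// Hypothetical helper: merge extra parameters into an existing set and
// return the resulting query string. $current stands in for $_GET.
function with_params(array $current, array $extra): string
{
    // For string keys, later arrays win in array_merge(),
    // so values in $extra override duplicates in $current.
    return http_build_query(array_merge($current, $extra));
}

echo '?' . with_params(['id' => '1', 'sort' => 'new'], ['newvar' => '123']), "\n";
// ?id=1&sort=new&newvar=123

echo '?' . with_params([], ['newvar' => '123']), "\n";
// ?newvar=123 (no stray "&" when the query string was empty)
```

One caveat with this sketch: array_merge() renumbers purely numeric keys, so it suits named parameters such as id and sort rather than numeric ones.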
"?".$_SERVER['QUERY_STRING']."&newvar=123";
Something like that.
Use this function: https://github.com/patrykparcheta/misc/blob/master/addQueryArgs.php
function addQueryArgs(array $args, string $url)
{
if (filter_var($url, FILTER_VALIDATE_URL)) {
$urlParts = parse_url($url);
if (isset($urlParts['query'])) {
parse_str($urlParts['query'], $urlQueryArgs);
$urlParts['query'] = http_build_query(array_merge($urlQueryArgs, $args));
$newUrl = $urlParts['scheme'] . '://' . $urlParts['host'] . $urlParts['path'] . '?' . $urlParts['query'];
} else {
$newUrl = $url . '?' . http_build_query($args);
}
return $newUrl;
} else {
return $url;
}
}
$newUrl = addQueryArgs(array('add' => 'this', 'and' => 'this'), 'http://example.com/?have=others');

native php function to highlight javascript?

Is there a native PHP function like highlight_string(), but for JavaScript?
Or, if not, is there any (homemade) PHP function that does it?
EDIT: I want to use a PHP function to COLORIZE JavaScript.
I have had great success with GeSHi. Easy to use and integrate in your app and it supports a lot of languages.
I understand you want a Syntax Highligher written in PHP. This one (Geshi) has worked for me in the past:
http://qbnz.com/highlighter/
Yes, the PHP function highlight_string() is a native PHP function for PHP.
No.
But there are a lot of javascript libraries that do syntax-highlight on several languages,
from bash-scripting to php and javascript.
eg, like snippet (JQuery) or jQuery.Syntax (my favorite)
Over here you can find an excellent library which enables syntax highlighting in a large amount of languages using javascripts and a css class.
There is no native php function to do this, so either you have to use existing libraries or you have to write something yourself.
Fastest way - you can use also PHP function "highlight_string" with a little trick
(capture function output and remove leading/trailing PHP tags):
$source = '... some javascript ...';
// option 1 - pure JS code
$htmlJs = highlight_string('<?php '.$source.' ?>', true);
$htmlJs = str_replace(array('<?php ', ' ?>'), array('', ''), $htmlJs);
// option 2 - when mixing up with PHP code inside of JS script
$htmlJs = highlight_string('START<?php '.$source.' ?>END', true);
$htmlJs = str_replace(array('START<span style="color: #0000BB"><?php </span>', ' ?>END'), array('', ''), $htmlJs);
// check PHP INI setting for "highlight.keyword" (#0000BB) - http://www.php.net/manual/en/misc.configuration.php#ini.syntax-highlighting
No native function, but rather than using a full stack library just to highlight some javascript you can use this single function :
function format_javascript($data, $options = false, $c_string = "#DD0000", $c_comment = "#FF8000", $c_keyword = "#007700", $c_default = "#0000BB", $c_html = "#0000BB", $flush_on_closing_brace = false)
{
if (is_array($options)) { // check for alternative usage
extract($options, EXTR_OVERWRITE); // extract the variables from the array if so
} else {
$advanced_optimizations = $options; // otherwise carry on as normal
}
@ini_set('highlight.string', $c_string); // Set each colour for each part of the syntax
@ini_set('highlight.comment', $c_comment); // Suppression has to happen as some hosts deny access to ini_set and there is no way of detecting this
@ini_set('highlight.keyword', $c_keyword);
@ini_set('highlight.default', $c_default);
@ini_set('highlight.html', $c_html);
if ($advanced_optimizations) { // if the function has been allowed to perform potential (although unlikely) code-destroying or erroneous edits
$data = preg_replace('/([$a-zA-z09]+) = \((.+)\) \? ([^]*)([ ]+)?\:([ ]+)?([^=\;]*)/', 'if ($2) {' . "\n" . ' $1 = $3; }' . "\n" . 'else {' . "\n" . ' $1 = $5; ' . "\n" . '}', $data); // expand all BASIC ternary statements into full if/elses
}
$data = str_replace(array(') { ', ' }', ";", "\r\n"), array(") {\n", "\n}", ";\n", "\n"), $data); // Newlinefy all braces and change Windows linebreaks to Linux (much nicer!)
$data = preg_replace("/(^[\r\n]*|[\r\n]+)[\s\t]*[\r\n]+/", "\n", $data); // Regex identifies all extra empty lines produced by the str_replace above. It is quicker to do it like this than deal with a more complicated regular expression above.
$data = str_replace("<?php", "<script>", highlight_string("<?php \n" . $data . "\n?>", true));
$data = explode("\n", str_replace(array("<br />"), array("\n"), $data));
# experimental tab level highlighting
$tab = 0;
$output = '';
foreach ($data as $line) {
$lineecho = $line;
if (substr_count($line, "\t") != $tab) {
$lineecho = str_replace("\t", "", trim($lineecho));
$lineecho = str_repeat("\t", $tab) . $lineecho;
}
$tab = $tab + substr_count($line, "{") - substr_count($line, "}");
if ($flush_on_closing_brace && trim($line) == "}") {
$output .= '}';
} else {
$output .= str_replace(array("{}", "[]"), array("<span style='color:" . $c_string . "!important;'>{}</span>", "<span style='color:" . $c_string . " !important;'>[]</span>"), $lineecho . "\n"); // Main JS specific thing that is not matched in the PHP parser
}
}
$output = str_replace(array('?php', '?>'), array('script type="text/javascript">', '</script>'), $output); // Add nice and friendly <script> tags around highlighted text
return '<pre id="code_highlighted">' . $output . "</pre>";
}
Usage :
echo format_javascript('console.log("Here is some highlighted JS code using a single function !");') ;
Credit :
http://css-tricks.com/highlight-code-with-php/
Demo :
http://css-tricks.com/examples/HighlightJavaScript/
Nice info here. Here is another nice one: http://code.google.com/p/google-code-prettify/

Replacing a specific part of a query string PHP

I use $_SERVER['QUERY_STRING'] to get the query string.
An example would be a=123&b=456&c=789
How could I remove the b value from the query string to obtain a=123&c=789, where b is alphanumeric and can be any value of any length?
Any ideas appreciated, thanks.
A solution using url parsing:
parse_str($_SERVER['QUERY_STRING'], $result_array);
unset($result_array['b']);
$_SERVER['QUERY_STRING'] = http_build_query($result_array);
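The same parse-and-rebuild approach can be wrapped in a function that takes the query string as an argument, which makes it easy to try outside a web request. This is only a sketch; the name remove_query_key is made up for the example.

```php
<?php
// Hypothetical helper: drop one key from a query string by parsing it into
// an array and rebuilding it, rather than editing the string directly.
function remove_query_key(string $query, string $key): string
{
    parse_str($query, $params);        // "a=1&b=2" becomes ['a' => '1', 'b' => '2']
    unset($params[$key]);              // no error if the key is absent
    return http_build_query($params);  // re-encodes the remaining pairs
}

echo remove_query_key('a=123&b=456&c=789', 'b'), "\n"; // a=123&c=789
echo remove_query_key('b=456&a=123', 'b'), "\n";       // a=123
```

Because the string is parsed first, the result is the same wherever the key sits in the query string, and no leftover "&" needs cleaning up.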
The value is going to be $_GET['b'].
How about:
str_replace('&b='.$_GET['b'], '', $_SERVER['QUERY_STRING']);
You can use this function:
function Remove_QS_Key($url, $key) {
    // preg_quote() keeps special characters in $key from being read as regex syntax
    $url = preg_replace('/(?:&|(\?))'.preg_quote($key, '/').'=[^&]*(?(1)&|)?/i', "$1", $url);
    return $url;
}
to remove any key you want, e.g.
echo Remove_QS_Key("http://domain.com/?a=b&ref=dusername&c=d&e=f&g=h", "ref");
Result:
http://domain.com/?a=b&c=d&e=f&g=h
Try this:
$query_new = preg_replace('/(^|&)b=[^&]*/', '', $query);
All the answers look good, but it will be more flexible if you do:
// Make a copy of $_GET to keep the original data
$getCopy = $_GET;
unset($getCopy['b']); // or whatever var you want to take out
// This is your cleaned array
var_dump($getCopy);
// If you need the URL-encoded string, just use http_build_query()
$encodedString = http_build_query($getCopy);
You can simply build the new string from $_GET and leave b out while building it:
$query_string_new = 'a=' . urlencode($_GET['a']) . '&c=' . urlencode($_GET['c']);
The $query_string_new should now contain a=123&c=789
PEAR already has a class (Net_URL2) that handles URL parsing/building:
Install via Composer: https://packagist.org/packages/pear/net_url2
Install as include: https://github.com/pear/Net_URL2/blob/master/Net/URL2.php
Example code:
$url = new Net_URL2('http://www.example.com/?one=1');
$url->setQueryVariable('two', 2);
echo $url; // http://www.example.com/?one=1&two=2
Here is a function to replace a query parameter: (like example.com?a=1&b=2 -> example.com?a=5&b=2)
function replace_qs_key($key, $value) {
$current_url = (isset($_SERVER['HTTPS']) && $_SERVER['HTTPS'] === 'on' ? "https" : "http") .
"://$_SERVER[HTTP_HOST]$_SERVER[REQUEST_URI]";
$current_url_without_qs = strtok($current_url, '?');
parse_str($_SERVER['QUERY_STRING'], $query_params);
$query_params[$key] = $value; // use the $key argument instead of a hard-coded 'page'
$_SERVER['QUERY_STRING'] = http_build_query($query_params);
$new_url = $current_url_without_qs .'?'. $_SERVER['QUERY_STRING'];
return $new_url;
}

PHP RegEx for "Website Name"

Duplicate: PHP validation/regex for URL
My goal is to create a PHP regex for a website name. The regex is for a lead-gathering form and should accept any legitimate website name syntax that someone might enter. After an exhaustive search, I'm surprised that I can't find one out there.
Here are the regex matches that I'm looking for:
somewebsite.com
http://somewebsite.com
http://www.somewebsite.com
AND, it should also match:
any of the above with a trailing slash, such as: somewebsite.com/
subdomains
No RegEx necessary.
$subject = 'example.com';
$part = (stripos($subject, 'http://') === FALSE) ? 'http://' : '' ;
var_dump(filter_var($part.$subject, FILTER_VALIDATE_URL));
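A rough sketch of how that check might be wrapped up for a form field. The function name looks_like_website is hypothetical, and the sketch assumes only http and https sites are of interest. It prepends a scheme only when the input does not already start with one, so https:// input is passed through unchanged.

```php
<?php
// Hypothetical helper; assumes only http/https websites should be accepted.
function looks_like_website(string $input): bool
{
    $input = trim($input);

    // Prepend a scheme only when the input does not already start with one.
    if (!preg_match('#^https?://#i', $input)) {
        $input = 'http://' . $input;
    }

    // filter_var() returns the URL on success and false on failure.
    return filter_var($input, FILTER_VALIDATE_URL) !== false;
}

var_dump(looks_like_website('somewebsite.com'));            // bool(true)
var_dump(looks_like_website('http://www.somewebsite.com')); // bool(true)
var_dump(looks_like_website(''));                           // bool(false)
```

FILTER_VALIDATE_URL checks URL syntax only, so it does not confirm that the domain exists, and an extra rule may be needed if a dot or a known TLD is required.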
You might need to tweak it:
<?php
$pattern = '/^(([\w]+:)?\/\/)?(([\d\w]|%[a-fA-F\d]{2,2})+(:([\d\w]|%[a-fA-F\d]{2,2})+)?@)?([\d\w][-\d\w]{0,253}[\d\w]\.)+[\w]{2,4}(:[\d]+)?(\/([-+_~.\d\w]|%[a-fA-F\d]{2,2})*)*(\?(&?([-+_~.\d\w]|%[a-fA-F\d]{2,2})=?)*)?(#([-+_~.\d\w]|%[a-fA-F\d]{2,2})*)?$/';
$url1 = "http://www.somewebsite.com";
$url2 = "https://www.somewebsite.com";
$url3 = "https://somewebsite.com";
$url4 = "www.somewebsite.com";
$url5 = "somewebsite.com";
function valURL($pattern, $url) {
$return = false;
if(preg_match($pattern, $url)) {
$return = true;
}
if($return == true) {
echo "Match URL: <font color='green'>" . $url . "</font><br /><br />";
} else {
echo "Try Again: <font color='red'>URL: " . $url . "</font><br /><br />";
}
}
valURL($pattern, $url1);
valURL($pattern, $url2);
valURL($pattern, $url3);
valURL($pattern, $url4);
valURL($pattern, $url5);
?>
I decided to benchmark the answers here to prove that regular expressions are not the answer for such simple tasks. Andy Leekman's code is a whole 30% to 60% quicker than the other answers. It did have a bug, but I fixed that with a line of code. You can view my results below.
Here's the code on which the tests ran.
http://pastie.org/476900
Results image: http://img254.imageshack.us/img254/7821/capturevzh.png
PS If anyone else uses a regular expression to validate a URL I might go mad ;)
/^([a-z0-9]([-a-z0-9]*[a-z0-9])?\\.)+((a[cdefgilmnoqrstuwxz]|aero|arpa)|(b[abdefghijmnorstvwyz]|biz)|(c[acdfghiklmnorsuvxyz]|cat|com|coop)|d[ejkmoz]|(e[ceghrstu]|edu)|f[ijkmor]|(g[abdefghilmnpqrstuwy]|gov)|h[kmnrtu]|(i[delmnoqrst]|info|int)|(j[emop]|jobs)|k[eghimnprwyz]|l[abcikrstuvy]|(m[acdghklmnopqrstuvwxyz]|mil|mobi|museum)|(n[acefgilopruz]|name|net)|(om|org)|(p[aefghklmnrstwy]|pro)|qa|r[eouw]|s[abcdeghijklmnortvyz]|(t[cdfghjklmnoprtvwz]|travel)|u[agkmsyz]|v[aceginu]|w[fs]|y[etu]|z[amw])$/i
http://www.shauninman.com/archive/2006/05/08/validating_domain_names
Courtesy of google. It is VERY complex though, so someone else might have a simpler one.
EDIT: Try Andy's answer first. If you can find an alternative to a regex, 9 times out of 10 the alternative is much better.
^(https?://)?(([0-9a-z_!'().&=$%-]: )?[0-9a-z_!'().&=$%-]@)?(([0-9]{1,3}\.){3}[0-9]{1,3}|([0-9a-z_!'()-]\.)([0-9a-z][0-9a-z-]{0,61})?[0-9a-z]\.[a-z]{2,6})(:[0-9]{1,4})?((/?)|(/[0-9a-z_!*'().;?:@&=$,%#-])/?)$
