I'm dealing with this php script, which when executed on the host gives a 500 error, apparently the line where the preg_match is is the one that contains the error...
this file is going to be executed as a cron to validate.
<?php
$encoded = wordwrap($encoded, 80, "\xa", true);
$license_file = $dir . "/modules/addons/Kayako/license.php";
if ($key != $sellKey) {
die("Invalid "license . php" file!");
}
function getWhmcsDomain() {
if (!empty($_SERVER["SERVER_NAME"])) {
return $_SERVER["SERVER_NAME"];
}
}
$license["checkdate"] = date("Ymd");
$keyName = $modleName . "_licensekey";
$dir = __DIR__;
$encoded = strrev($encoded);
$license["status"] = "Active";
$sellKey = "ModulesGarden_Kayako_54M02934WH301844E_HackbyRicRey";
$license["checktoken"] = $checkToken;
$key_data = WHMCS\Database\Capsule::table("tblconfiguration")->where("setting", "kayako_localkey")->first();
$license = array("licensekey" => $key, "validdomain" => getWhmcsDomain(), "validip" => getIp(), "validdirectory" => $dir . "/modules/addons/Kayako," . $dir . "/modules/addons," . $dir . "/modules/addons/Kayako," . $dir . "/modules/addons/Kayako," . $dir . "/modules/addons," . $dir . "," . $dir . "/modules");
$secret = "659c08a59bbb484f3b40591";
include_once "init.php";
function getIp() {
return isset($_SERVER["SERVER_ADDR"]) ? $_SERVER["SERVER_ADDR"] : $_SERVER["LOCAL_ADDR"];
}
if (!$key_data) {
WHMCS\Database\Capsule::table("tblconfiguration")->insert(array("setting" => "kayako_localkey", "value" => ''));
}
$checkToken = time() . md5(rand(1000000000, 0) . $key);
$modleName = "kayako";
$encoded = $encoded . md5($encoded . $secret);
$encoded = serialize($license);
preg_match("/kayako_licensekey\s?=\s?"([A - Za - z0 - 9_] +) "/", $content, $matches);
$encoded = md5($license["checkdate"] . $secret) . $encoded;
$key = $matches[1];
$encoded = base64_encode($encoded);
if (file_exists($license_file)) {
$content = file_get_contents($license_file);
} else {
echo "Please Upload "license . php" File Inside: " . $dir . "/modules/addons/Kayako/";
}
$content = '';
try {
WHMCS\Database\Capsule::table("tblconfiguration")->where("setting", "kayako_localkey")->update(array("value" => $encoded));
echo "Done!";
}
catch(\Throwable $e) {
echo "There is an issue, contact.";
} ?>
You have extra double quotes in the regular expression. You also have extra spaces inside the [] in the regexp. You can replace that character class with \w, which matches alphanumerics and underscore.
preg_match('/kayako_licensekey\s?=\s?(\w+)/', $content, $matches);
Another problem: You use a number of variables before you assign them:
$modleName
$checkToken
$key
$sellKey
$dir
Did you post the code out of order?
How to, using php, transform relative path to absolute URL?
function rel2abs($rel, $base)
{
/* return if already absolute URL */
if (parse_url($rel, PHP_URL_SCHEME) != '') return $rel;
/* queries and anchors */
if ($rel[0]=='#' || $rel[0]=='?') return $base.$rel;
/* parse base URL and convert to local variables:
$scheme, $host, $path */
extract(parse_url($base));
/* remove non-directory element from path */
$path = preg_replace('#/[^/]*$#', '', $path);
/* destroy path if relative url points to root */
if ($rel[0] == '/') $path = '';
/* dirty absolute URL */
$abs = "$host$path/$rel";
/* replace '//' or '/./' or '/foo/../' with '/' */
$re = array('#(/\.?/)#', '#/(?!\.\.)[^/]+/\.\./#');
for($n=1; $n>0; $abs=preg_replace($re, '/', $abs, -1, $n)) {}
/* absolute URL is ready! */
return $scheme.'://'.$abs;
}
I love the code that jordanstephens provided from the link! I voted it up. l0oky inspired me to make sure that the function is port, username, and password URL compatible. I needed it for my project.
function rel2abs( $rel, $base )
{
/* return if already absolute URL */
if( parse_url($rel, PHP_URL_SCHEME) != '' )
return( $rel );
/* queries and anchors */
if( $rel[0]=='#' || $rel[0]=='?' )
return( $base.$rel );
/* parse base URL and convert to local variables:
$scheme, $host, $path */
extract( parse_url($base) );
/* remove non-directory element from path */
$path = preg_replace( '#/[^/]*$#', '', $path );
/* destroy path if relative url points to root */
if( $rel[0] == '/' )
$path = '';
/* dirty absolute URL */
$abs = '';
/* do we have a user in our URL? */
if( isset($user) )
{
$abs.= $user;
/* password too? */
if( isset($pass) )
$abs.= ':'.$pass;
$abs.= '#';
}
$abs.= $host;
/* did somebody sneak in a port? */
if( isset($port) )
$abs.= ':'.$port;
$abs.=$path.'/'.$rel;
/* replace '//' or '/./' or '/foo/../' with '/' */
$re = array('#(/\.?/)#', '#/(?!\.\.)[^/]+/\.\./#');
for( $n=1; $n>0; $abs=preg_replace( $re, '/', $abs, -1, $n ) ) {}
/* absolute URL is ready! */
return( $scheme.'://'.$abs );
}
Added support to keep the current query. Helps a lot for ?page=1 and so on...
function rel2abs($rel, $base)
{
/* return if already absolute URL */
if (parse_url($rel, PHP_URL_SCHEME) != '')
return ($rel);
/* queries and anchors */
if ($rel[0] == '#' || $rel[0] == '?')
return ($base . $rel);
/* parse base URL and convert to local variables: $scheme, $host, $path, $query, $port, $user, $pass */
extract(parse_url($base));
/* remove non-directory element from path */
$path = preg_replace('#/[^/]*$#', '', $path);
/* destroy path if relative url points to root */
if ($rel[0] == '/')
$path = '';
/* dirty absolute URL */
$abs = '';
/* do we have a user in our URL? */
if (isset($user)) {
$abs .= $user;
/* password too? */
if (isset($pass))
$abs .= ':' . $pass;
$abs .= '#';
}
$abs .= $host;
/* did somebody sneak in a port? */
if (isset($port))
$abs .= ':' . $port;
$abs .= $path . '/' . $rel . (isset($query) ? '?' . $query : '');
/* replace '//' or '/./' or '/foo/../' with '/' */
$re = ['#(/\.?/)#', '#/(?!\.\.)[^/]+/\.\./#'];
for ($n = 1; $n > 0; $abs = preg_replace($re, '/', $abs, -1, $n)) {
}
/* absolute URL is ready! */
return ($scheme . '://' . $abs);
}
I updated the function to fix relative URL starting with '//' improving execution speed.
function getAbsoluteUrl($relativeUrl, $baseUrl){
// if already absolute URL
if (parse_url($relativeUrl, PHP_URL_SCHEME) !== null){
return $relativeUrl;
}
// queries and anchors
if ($relativeUrl[0] === '#' || $relativeUrl[0] === '?'){
return $baseUrl.$relativeUrl;
}
// parse base URL and convert to: $scheme, $host, $path, $query, $port, $user, $pass
extract(parse_url($baseUrl));
// if base URL contains a path remove non-directory elements from $path
if (isset($path) === true){
$path = preg_replace('#/[^/]*$#', '', $path);
}
else {
$path = '';
}
// if realtive URL starts with //
if (substr($relativeUrl, 0, 2) === '//'){
return $scheme.':'.$relativeUrl;
}
// if realtive URL starts with /
if ($relativeUrl[0] === '/'){
$path = null;
}
$abs = null;
// if realtive URL contains a user
if (isset($user) === true){
$abs .= $user;
// if realtive URL contains a password
if (isset($pass) === true){
$abs .= ':'.$pass;
}
$abs .= '#';
}
$abs .= $host;
// if realtive URL contains a port
if (isset($port) === true){
$abs .= ':'.$port;
}
$abs .= $path.'/'.$relativeUrl.(isset($query) === true ? '?'.$query : null);
// replace // or /./ or /foo/../ with /
$re = ['#(/\.?/)#', '#/(?!\.\.)[^/]+/\.\./#'];
for ($n = 1; $n > 0; $abs = preg_replace($re, '/', $abs, -1, $n)) {
}
// return absolute URL
return $scheme.'://'.$abs;
}
A web browser uses the page URL or the base tag to resolve relative URLs.
This script can resolve a URL relative to a base URL.
/** Build a URL
*
* #param array $parts An array that follows the parse_url scheme
* #return string
*/
function build_url($parts)
{
if (empty($parts['user'])) {
$url = $parts['scheme'] . '://' . $parts['host'];
} elseif(empty($parts['pass'])) {
$url = $parts['scheme'] . '://' . $parts['user'] . '#' . $parts['host'];
} else {
$url = $parts['scheme'] . '://' . $parts['user'] . ':' . $parts['pass'] . '#' . $parts['host'];
}
if (!empty($parts['port'])) {
$url .= ':' . $parts['port'];
}
if (!empty($parts['path'])) {
$url .= $parts['path'];
}
if (!empty($parts['query'])) {
$url .= '?' . $parts['query'];
}
if (!empty($parts['fragment'])) {
return $url . '#' . $parts['fragment'];
}
return $url;
}
/** Convert a relative path in to an absolute path
*
* #param string $path
* #return string
*/
function abs_path($path)
{
$path_array = explode('/', $path);
// Solve current and parent folder navigation
$translated_path_array = array();
$i = 0;
foreach ($path_array as $name) {
if ($name === '..') {
unset($translated_path_array[--$i]);
} elseif (!empty($name) && $name !== '.') {
$translated_path_array[$i++] = $name;
}
}
return '/' . implode('/', $translated_path_array);
}
/** Convert a relative URL in to an absolute URL
*
* #param string $url URL or URI
* #param string $base Absolute URL
* #return string
*/
function abs_url($url, $base)
{
$url_parts = parse_url($url);
$base_parts = parse_url($base);
// Handle the path if it is specified
if (!empty($url_parts['path'])) {
// Is the path relative
if (substr($url_parts['path'], 0, 1) !== '/') {
if (substr($base_parts['path'], -1) === '/') {
$url_parts['path'] = $base_parts['path'] . $url_parts['path'];
} else {
$url_parts['path'] = dirname($base_parts['path']) . '/' . $url_parts['path'];
}
}
// Make path absolute
$url_parts['path'] = abs_path($url_parts['path']);
}
// Use the base URL to populate the unfilled components until a component is filled
foreach (['scheme', 'host', 'path', 'query', 'fragment'] as $comp) {
if (!empty($url_parts[$comp])) {
break;
}
$url_parts[$comp] = $base_parts[$comp];
}
return build_url($url_parts);
}
Test
// Base URL
$base_url = 'https://example.com/path1/path2/path3/path4/file.ext?field1=value1&field2=value2#fragment';
// URL and URIs (_ is used to see what is coming from relative URL)
$test_urls = array(
"http://_example.com/_path1/_path2/_file.ext?_field1=_value1&_field2=_value2#_fragment", // URL
"//_example.com/_path1/_path2/_file.ext?_field1=_value1&_field2=_value2#_fragment", // URI without scheme
"//_example.com", // URI with host only
"/_path1/_path2/_file.ext?_field1=_value1&_field2=_value2#_fragment", // URI without scheme and host
"_path1/_path2/_file.ext", // URI with path only
"./../../_path1/../_path2/file.ext#_fragment", // URI with path and fragment
"?_field1=_value1&_field2=_value2#_fragment", // URI with query and fragment
"#_fragment" // URI with fragment only
);
// Expected result
$expected_urls = array(
"http://_example.com/_path1/_path2/_file.ext?_field1=_value1&_field2=_value2#_fragment",
"https://_example.com/_path1/_path2/_file.ext?_field1=_value1&_field2=_value2#_fragment",
"https://_example.com",
"https://example.com/_path1/_path2/_file.ext?_field1=_value1&_field2=_value2#_fragment",
"https://example.com/path1/path2/path3/path4/_path1/_path2/_file.ext",
"https://example.com/path1/path2/_path2/file.ext#_fragment",
"https://example.com/path1/path2/path3/path4/file.ext?_field1=_value1&_field2=_value2#_fragment",
"https://example.com/path1/path2/path3/path4/file.ext?field1=value1&field2=value2#_fragment"
);
foreach ($test_urls as $i => $url) {
$abs_url = abs_url($url, $base_url);
if ( $abs_url == $expected_urls[$i] ) {
echo "[OK] " . $abs_url . PHP_EOL;
} else {
echo "[WRONG] " . $abs_url . PHP_EOL;
}
}
Result
[OK] http://_example.com/_path1/_path2/_file.ext?_field1=_value1&_field2=_value2#_fragment
[OK] https://_example.com/_path1/_path2/_file.ext?_field1=_value1&_field2=_value2#_fragment
[OK] https://_example.com
[OK] https://example.com/_path1/_path2/_file.ext?_field1=_value1&_field2=_value2#_fragment
[OK] https://example.com/path1/path2/path3/path4/_path1/_path2/_file.ext
[OK] https://example.com/path1/path2/_path2/file.ext#_fragment
[OK] https://example.com/path1/path2/path3/path4/file.ext?_field1=_value1&_field2=_value2#_fragment
[OK] https://example.com/path1/path2/path3/path4/file.ext?field1=value1&field2=value2#_fragment
Wasn't in fact the question about converting path and not url? PHP actually has a function for this: realpath(). The only thing you should be aware of are symlinks.
Example from PHP manual:
chdir('/var/www/');
echo realpath('./../../etc/passwd') . PHP_EOL;
// Prints: /etc/passwd
echo realpath('/tmp/') . PHP_EOL;
// Prints: /tmp
A easy way to do this is using phpUri a small php library for converting relative urls to absolute.
Usage is simple:
require_once 'phpuri.php';
$absolute = phpUri::parse( $base_path )->join( $relative_path );
You don't even need to check that the path passed to join is actually relative. If it is absolute then parse will return it.
If the relative directory already exists this will do the job:
function rel2abs($relPath, $baseDir = './')
{
if ('' == trim($path))
{
return $baseDir;
}
$currentDir = getcwd();
chdir($baseDir);
$path = realpath($path);
chdir($currentDir);
return $path;
}
I used the same code from: http://nashruddin.com/PHP_Script_for_Converting_Relative_to_Absolute_URL
but I modified It a little bit so If base url contains PORT number it returns the relative URL with port number in it.
function rel2abs($rel, $base)
{
/* return if already absolute URL */
if (parse_url($rel, PHP_URL_SCHEME) != '') return $rel;
/* queries and anchors */
if ($rel[0]=='#' || $rel[0]=='?') return $base.$rel;
/* parse base URL and convert to local variables:
$scheme, $host, $path */
extract(parse_url($base));
/* remove non-directory element from path */
$path = preg_replace('#/[^/]*$#', '', $path);
/* destroy path if relative url points to root */
if ($rel[0] == '/') $path = '';
/* dirty absolute URL // with port number if exists */
if (parse_url($base, PHP_URL_PORT) != ''){
$abs = "$host:".parse_url($base, PHP_URL_PORT)."$path/$rel";
}else{
$abs = "$host$path/$rel";
}
/* replace '//' or '/./' or '/foo/../' with '/' */
$re = array('#(/\.?/)#', '#/(?!\.\.)[^/]+/\.\./#');
for($n=1; $n>0; $abs=preg_replace($re, '/', $abs, -1, $n)) {}
/* absolute URL is ready! */
return $scheme.'://'.$abs;
}
Hope this helps someone!
This function will resolve relative URL's to a given current page url in $pgurl without regex. It successfully resolves:
/home.php?example types,
same-dir nextpage.php types,
../...../.../parentdir types,
full http://example.net urls,
and shorthand //example.net urls
//Current base URL (you can dynamically retrieve from $_SERVER)
$pgurl = 'http://example.com/scripts/php/absurl.php';
function absurl($url) {
global $pgurl;
if(strpos($url,'://')) return $url; //already absolute
if(substr($url,0,2)=='//') return 'http:'.$url; //shorthand scheme
if($url[0]=='/') return parse_url($pgurl,PHP_URL_SCHEME).'://'.parse_url($pgurl,PHP_URL_HOST).$url; //just add domain
if(strpos($pgurl,'/',9)===false) $pgurl .= '/'; //add slash to domain if needed
return substr($pgurl,0,strrpos($pgurl,'/')+1).$url; //for relative links, gets current directory and appends new filename
}
function nodots($path) { //Resolve dot dot slashes, no regex!
$arr1 = explode('/',$path);
$arr2 = array();
foreach($arr1 as $seg) {
switch($seg) {
case '.':
break;
case '..':
array_pop($arr2);
break;
case '...':
array_pop($arr2); array_pop($arr2);
break;
case '....':
array_pop($arr2); array_pop($arr2); array_pop($arr2);
break;
case '.....':
array_pop($arr2); array_pop($arr2); array_pop($arr2); array_pop($arr2);
break;
default:
$arr2[] = $seg;
}
}
return implode('/',$arr2);
}
Usage Example:
echo nodots(absurl('../index.html'));
nodots() must be called after the URL is converted to absolute.
The dots function is kind of redundant, but is readable, fast, doesn't use regex's, and will resolve 99% of typical urls (if you want to be 100% sure, just extend the switch block to support 6+ dots, although I've never seen that many dots in a URL).
Hope this helps,
function url_to_absolute($baseURL, $relativeURL) {
$relativeURL_data = parse_url($relativeURL);
if (isset($relativeURL_data['scheme'])) {
return $relativeURL;
}
$baseURL_data = parse_url($baseURL);
if (!isset($baseURL_data['scheme'])) {
return $relativeURL;
}
$absoluteURL_data = $baseURL_data;
if (isset($relativeURL_data['path']) && $relativeURL_data['path']) {
if (substr($relativeURL_data['path'], 0, 1) == '/') {
$absoluteURL_data['path'] = $relativeURL_data['path'];
} else {
$absoluteURL_data['path'] = (isset($absoluteURL_data['path']) ? preg_replace('#[^/]*$#', '', $absoluteURL_data['path']) : '/') . $relativeURL_data['path'];
}
if (isset($relativeURL_data['query'])) {
$absoluteURL_data['query'] = $relativeURL_data['query'];
} else if (isset($absoluteURL_data['query'])) {
unset($absoluteURL_data['query']);
}
} else {
$absoluteURL_data['path'] = isset($absoluteURL_data['path']) ? $absoluteURL_data['path'] : '/';
if (isset($relativeURL_data['query'])) {
$absoluteURL_data['query'] = $relativeURL_data['query'];
} else if (isset($absoluteURL_data['query'])) {
$absoluteURL_data['query'] = $absoluteURL_data['query'];
}
}
if (isset($relativeURL_data['fragment'])) {
$absoluteURL_data['fragment'] = $relativeURL_data['fragment'];
} else if (isset($absoluteURL_data['fragment'])) {
unset($absoluteURL_data['fragment']);
}
$absoluteURL_path = ltrim($absoluteURL_data['path'], '/');
$absoluteURL_path_parts = array();
for ($i = 0, $i2 = 0; $i < strlen($absoluteURL_path); $i++) {
if (isset($absoluteURL_path_parts[$i2])) {
$absoluteURL_path_parts[$i2] .= $absoluteURL_path[$i];
} else {
$absoluteURL_path_parts[$i2] = $absoluteURL_path[$i];
}
if ($absoluteURL_path[$i] == '/') {
$i2++;
}
}
reset($absoluteURL_path_parts);
while (true) {
if (rtrim(current($absoluteURL_path_parts), '/') == '.') {
unset($absoluteURL_path_parts[key($absoluteURL_path_parts)]);
continue;
} else if (rtrim(current($absoluteURL_path_parts), '/') == '..') {
if (prev($absoluteURL_path_parts) !== false) {
unset($absoluteURL_path_parts[key($absoluteURL_path_parts)]);
} else {
reset($absoluteURL_path_parts);
}
unset($absoluteURL_path_parts[key($absoluteURL_path_parts)]);
continue;
}
if (next($absoluteURL_path_parts) === false) {
break;
}
}
$absoluteURL_data['path'] = '/' . implode('', $absoluteURL_path_parts);
$absoluteURL = isset($absoluteURL_data['scheme']) ? $absoluteURL_data['scheme'] . ':' : '';
$absoluteURL .= (isset($absoluteURL_data['user']) || isset($absoluteURL_data['host'])) ? '//' : '';
$absoluteURL .= isset($absoluteURL_data['user']) ? $absoluteURL_data['user'] : '';
$absoluteURL .= isset($absoluteURL_data['pass']) ? ':' . $absoluteURL_data['pass'] : '';
$absoluteURL .= isset($absoluteURL_data['user']) ? '#' : '';
$absoluteURL .= isset($absoluteURL_data['host']) ? $absoluteURL_data['host'] : '';
$absoluteURL .= isset($absoluteURL_data['port']) ? ':' . $absoluteURL_data['port'] : '';
$absoluteURL .= isset($absoluteURL_data['path']) ? $absoluteURL_data['path'] : '';
$absoluteURL .= isset($absoluteURL_data['query']) ? '?' . $absoluteURL_data['query'] : '';
$absoluteURL .= isset($absoluteURL_data['fragment']) ? '#' . $absoluteURL_data['fragment'] : '';
return $absoluteURL;
}
You can use this composer package to do that.
https://packagist.org/packages/wa72/url
composer require wa72/url
Parse URL strings to objects
add and modify query parameters
set and modify any part of the url
test for equality of URLs with query parameters in a PHP-fashioned
way
supports protocol-relative urls
convert absolute, host-relative and protocol-relative urls to
relative and vice versa
This make suit of #jordansstephens's answer that doesn't support absolute url begins with '//'.
function rel2abs($rel, $base)
{
/* return if already absolute URL */
if (parse_url($rel, PHP_URL_SCHEME) != '') return $rel;
/* Url begins with // */
if($rel[0] == '/' && $rel[1] == '/'){
return 'https:' . $rel;
}
/* queries and anchors */
if ($rel[0]=='#' || $rel[0]=='?') return $base.$rel;
/* parse base URL and convert to local variables:
$scheme, $host, $path */
extract(parse_url($base));
/* remove non-directory element from path */
$path = preg_replace('#/[^/]*$#', '', $path);
/* destroy path if relative url points to root */
if ($rel[0] == '/') $path = '';
/* dirty absolute URL */
$abs = "$host$path/$rel";
/* replace '//' or '/./' or '/foo/../' with '/' */
$re = array('#(/\.?/)#', '#/(?!\.\.)[^/]+/\.\./#');
for($n=1; $n>0; $abs=preg_replace($re, '/', $abs, -1, $n)) {}
/* absolute URL is ready! */
return $scheme.'://'.$abs;
}
I have a script that needs to pull data from different cached URLs.
Right now $url = 'http://example.com/search.php?user=abc&part='.$part;
I need the portion of the script below modified to search multiple MD5 encrypted url's
i.e.
$url = 'http://example.com/search.php?user=abc&part='.$part;
$url = 'http://example.com/search.php?user=xyz&part='.$part;
$url = 'http://example.com/search.php?user=123&part='.$part;
If more than 1 value is returned than return the one with the newest date.
$xid needs to be the current setting for $url
Original code.
function get_cache_file($url) {
$xid = md5($url);
$gendir = CACHE_ROOT . substr($xid, 0, 1) . '/'. substr($xid, 1, 2);
if(!is_dir($gendir)) {
mkdir($gendir, 0777, true);
}
return $gendir . '/' . $xid;
}
Found an answer to the problem.
function get_cache_file($part)
{
$users = array('user1', 'user2', 'user3');
$file = '';
$time = 0;
foreach ($users as $user) {
$url = 'http://example.com/search.php?user=' . $user . '&part=' . $part;
$xid = md5($url);
$gendir = CACHE_ROOT . substr($xid, 0, 1) . '/' . substr($xid, 1, 2);
if (is_dir($gendir) && is_file($gendir . '/' . $xid)) {
if ($time < filemtime($file)) {
$time = filemtime($file);
$file = $gendir . '/' . $xid;
}
}
}
I have the following code that (1) gets the page / section name from the url (2) cleans up the string and then assigns it to a variable.
I was wondering if there are any suggestions to how I can improve this code to be more efficient, possibly less if / else statements.
Also, any suggestion how I can code this so that it accounts for x amount of sub-directories in the url structure. Right now I check up to 3 in a pretty manual way.
I'd like it to handle any url, for example: www.domain.com/level1/level2/level3/level4/levelx/...
Here is my current code:
<?php
$prefixName = 'www : ';
$getPageName = explode("/", $_SERVER['PHP_SELF']);
$cleanUpArray = array("-", ".php");
for($i = 0; $i < sizeof($getPageName); $i++) {
if ($getPageName[1] == 'index.php')
{
$pageName = $prefixName . 'homepage';
}
else
{
if ($getPageName[1] != 'index.php')
{
$pageName = $prefixName . trim(str_replace($cleanUpArray, ' ', $getPageName[1]));
}
if (isset($getPageName[2]))
{
if ( $getPageName[2] == 'index.php' )
{
$pageName = $prefixName . trim(str_replace($cleanUpArray, ' ', $getPageName[1]));
}
else
{
$pageName = $prefixName . trim(str_replace($cleanUpArray, ' ', $getPageName[2]));
}
}
if (isset($getPageName[3]) )
{
if ( $getPageName[3] == 'index.php' )
{
$pageName = $prefixName . trim(str_replace($cleanUpArray, ' ', $getPageName[2]));
}
else
{
$pageName = $prefixName . trim(str_replace($cleanUpArray, ' ', $getPageName[3]));
}
}
}
}
?>
You are currently using a for-loop, but not using the $i iterator for anything - so to me, you could drop the loop entirely. From what I can see, you just want the directory-name prior to the file to be the $pageName and if there is no prior directory set it as homepage.
You can pass $_SERVER['PHP_SELF'] to basename() to get the exact file-name instead of checking the indexes, and also split on the / as you're currently doing to get the "last directory". To get the last directory, you can skip indexes and directly use array_pop().
<?php
$prefixName = 'www : ';
$cleanUpArray = array("-", ".php");
$script = basename($_SERVER['PHP_SELF']);
$exploded = explode('/', substr($_SERVER['PHP_SELF'], 0, strrpos($_SERVER['PHP_SELF'], '/')));
$count = count($exploded);
if (($count == 1) && ($script == 'index.php')) {
// the current page is "/index.php"
$pageName = $prefixName . 'homepage';
} else if ($count > 1) {
// we are in a sub-directory; use the last directory as the current page
$pageName = $prefixName . trim(str_replace($cleanUpArray, ' ', array_pop($exploded)));
} else {
// there is no sub-directory and the script is not index.php?
}
?>
In the event that you want a more breadcumbs-feel, you may want to keep each individual directory. If this is the case, you can update the middle if else condition to be:
} else if ($count > 1) {
// we are in a sub-directory; "breadcrumb" them all together
$pageName = '';
$separator = ' : ';
foreach ($exploded as $page) {
if ($page == '') continue;
$pageName .= (($pageName != '') ? $separator : '') . trim(str_replace($cleanUpArray, ' ', $page));
}
$pageName = $prefixName . $pageName;
} else {
I found this code very helpful
$protocol = strpos(strtolower($_SERVER['SERVER_PROTOCOL']),'https') ===
FALSE ? 'http' : 'https'; // Get protocol HTTP/HTTPS
$host = $_SERVER['HTTP_HOST']; // Get www.domain.com
$script = $_SERVER['SCRIPT_NAME']; // Get folder/file.php
$params = $_SERVER['QUERY_STRING'];// Get Parameters occupation=odesk&name=ashik
$currentUrl = $protocol . '://' . $host . $script . '?' . $params; // Adding all
echo $currentUrl;
I'm grabbing links from a website, but I'm having a problem in which the higher I set the recursion depth for the function the results become stranger
for example
when I set the function to the following
crawl_page("http://www.mangastream.com/", 10);
I will get a results like this for about half the page
http://mangastream.com/read/naruto/51619850/1/read/naruto/51619850/2/read/naruto/51619850/2/read/naruto/51619850/2/read/naruto/51619850/2/read/naruto/51619850/2/read/naruto/51619850/2/read/naruto/51619850/2
EDIT
while I'm expecting results like this instead
http://mangastream.com/manga/read/naruto/51619850/1
here's the function I've been using to get the results
function crawl_page($url, $depth)
{
static $seen = array();
if (isset($seen[$url]) || $depth === 0) {
return;
}
$seen[$url] = true;
$dom = new DOMDocument('1.0');
#$dom->loadHTMLFile($url);
$anchors = $dom->getElementsByTagName('a');
foreach ($anchors as $element) {
$href = $element->getAttribute('href');
if (0 !== strpos($href, 'http')) {
$href = rtrim($url, '/') . '/' . ltrim($href, '/');
}
if(shouldScrape($href)==true)
crawl_page($href, $depth - 1);
}
echo $url,"\r";
//,pageStatus($url)
}
any help with this would be greatly appreciated
the construction of your new url is not correct, replace :
$href = rtrim($url, '/') . '/' . ltrim($href, '/');
with :
if (substr($href, 0, 1)=='/') {
// href relative to root
$info = parse_url($url);
$href = $info['scheme'].'//'.$info['host'].$href;
} else {
// href relative to current path
$href = rtrim(dirname($url), '/') . '/' . $href;
}
I think your problem lies in this line:
$href = rtrim($url, '/') . '/' . ltrim($href, '/');
To all relative urls on any given page this statement will prepend the current page url, which is obviously not what you want. What you need is to prepend only the protocol and host part of the URL.
Something like this should fix your problem (untested):
$url_parts = parse_url($url);
$href = $url_parts['scheme'] . '://' . $url_parts['host '] . $href;