Make a local link global /test.php -> example.com/test.php - php

I've been working on a spider algorithm and have been having some issues with the links.
Here is an example of how it works:
got content from -> example.com/bob/index.php?page=funny+faces
content is :
<html>
link 1
link 2
link 3
</html>
I pass the content through my get-links function.
The links function returned:
[0] = ../jack/index.php
[1] = /bob_more_info
[2] = http://www.youtube.com
Now I need to turn these links into full URLs based on the page I got them from (example.com/bob/index.php?page=funny+faces),
so:
[0] -> ../jack/index.php into example.com/jack/index.php
[1] -> /bob_more_info into example.com/bob/bob_more_info
[2] -> http://www.youtube.com
What I am asking for is a function that can do the conversion. This is mine, but it's not always working and is becoming a pain. If you could edit it or write me a function it would be much appreciated. Thanks in advance.
Here is my function currently:
//example:
//$newURL = URLfix("example.com/bob/index.php?page=funny+faces", "../jack/index.php");
function URLfix ($url, $ext)
{
if(is_valid_url($url."/"))
{
$url .= "/";
}
$ar1 = explode("/", $url);
if(count($ar1) == 1)
{
return $url."/".$ext;
}
$target = $ar1[count($ar1) - 1];
if($target == "")
{
return $url.$ext;
}
if(strpos(" ".$target, "."))
{
$cur = "";
for($i = 0; $i < count($ar1) - 1; $i ++)
{
$cur .= $ar1[$i];
$cur .= "/";
}
return $cur.$ext;
}
return $url."/".$ext;
}

Use explode() to split $url into an array delimited by /; $bits[0], for example, would then contain example.com.

Since
example.com/jack/index.php
is equivalent to:
example.com/bob/../jack/index.php
I wouldn't worry about that part. For the URL, I would remove the query string first, then pop off the last segment to get the base URL:
list($url, $query_string) = array_pad(explode("?", $url, 2), 2, ""); // array_pad avoids a notice when there is no "?"
$segments = explode("/", $url);
array_pop($segments);
$base_url = implode("/", $segments);
Do be sure to add some error checks.
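The error checks mentioned above could look roughly like this. It is only a sketch: base_url() is a hypothetical helper name, and it assumes that anything after the last slash of the path is a file name rather than a directory.

```php
<?php
// Hypothetical helper: return the "directory" part of a URL,
// or null when the input has no usable address at all.
function base_url(string $url): ?string
{
    // Drop the query string and fragment, if any.
    $url = substr($url, 0, strcspn($url, '?#'));
    if ($url === '') {
        return null;
    }
    // Skip past "scheme://" so its slashes are not mistaken for path separators.
    $schemeEnd = strpos($url, '://');
    $start = ($schemeEnd === false) ? 0 : $schemeEnd + 3;
    $lastSlash = strrpos($url, '/');
    if ($lastSlash === false || $lastSlash < $start) {
        // No path at all, e.g. "example.com": the host itself is the base.
        return $url;
    }
    // Everything before the last slash is the base.
    return substr($url, 0, $lastSlash);
}
```

With the URL from the question, base_url("example.com/bob/index.php?page=funny+faces") gives example.com/bob.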

A specification exists that explains step by step how to resolve a relative URI against its base URI: RFC 3986.
What you call a "global link" is just a URI Reference.
What you call a "local link" is called a Relative Reference.
Every relative reference has a base reference it refers to. The base reference is a URI reference. You can resolve a new URI reference from any base URI reference and the relative reference. This process is called Relative Resolution.
PHP code that does this is available in the Net_URL2 PEAR package, which comes with an example of how to use it; look for ->resolve().
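For readers who cannot install the PEAR package, here is a minimal sketch of the resolution steps in plain PHP. It is a simplification of RFC 3986 section 5.2, not a full implementation: resolve_url() and remove_dot_segments() are hypothetical helper names, the base URL is assumed to carry a scheme and a host, and user info is ignored.

```php
<?php
// Hypothetical helper: collapse "." and ".." segments in an absolute path.
function remove_dot_segments(string $path): string
{
    $segments = explode('/', $path);
    $out = [];
    foreach ($segments as $segment) {
        if ($segment === '.') {
            continue;
        }
        if ($segment === '..') {
            // Never pop the leading empty segment that represents the root.
            if (count($out) > 1) {
                array_pop($out);
            }
            continue;
        }
        $out[] = $segment;
    }
    // A trailing "." or ".." refers to a directory, so keep the trailing slash.
    $last = end($segments);
    if ($last === '.' || $last === '..') {
        $out[] = '';
    }
    return implode('/', $out);
}

// Hypothetical helper: resolve $relative against $base.
function resolve_url(string $base, string $relative): string
{
    // A reference with its own scheme is already absolute.
    if (is_string(parse_url($relative, PHP_URL_SCHEME))) {
        return $relative;
    }
    $b = parse_url($base);
    if ($b === false || !isset($b['scheme'], $b['host'])) {
        throw new InvalidArgumentException('Base URL needs a scheme and a host');
    }
    // Network-path reference ("//host/path") borrows only the scheme.
    if (strncmp($relative, '//', 2) === 0) {
        return $b['scheme'] . ':' . $relative;
    }
    $origin = $b['scheme'] . '://' . $b['host'] . (isset($b['port']) ? ':' . $b['port'] : '');
    $basePath = $b['path'] ?? '/';

    // Split the reference into its path and its query/fragment suffix.
    $cut = strcspn($relative, '?#');
    $relPath = substr($relative, 0, $cut);
    $suffix = substr($relative, $cut);

    if ($relPath === '') {
        // Same document: keep the base path, and the base query
        // unless the reference brings its own.
        $path = $basePath;
        if (($suffix === '' || $suffix[0] === '#') && isset($b['query'])) {
            $suffix = '?' . $b['query'] . $suffix;
        }
    } elseif ($relPath[0] === '/') {
        // Root-relative: replaces the whole base path.
        $path = $relPath;
    } else {
        // Merge with the directory of the base path.
        $path = substr($basePath, 0, (int) strrpos($basePath, '/') + 1) . $relPath;
    }
    return $origin . remove_dot_segments($path) . $suffix;
}
```

Note that under these rules a reference starting with a slash, such as /bob_more_info, resolves against the site root (http://example.com/bob_more_info). Only a reference without the leading slash, bob_more_info, ends up under /bob/.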

Related

PHP replace URL segment with str_replace();

I have "/foo/bar/url/" coming straight after my domain name.
What I want is to find the penultimate slash in my string and replace it with a slash plus a hash sign, i.e. turn / into /# (the problem is not how to get the URL, but how to handle it).
How could this be achieved? What is the best practice for doing something like that?
At the moment I'm pretty sure that I should use str_replace();
UPD. I think preg_replace() would be suitable for my case. But then there is another problem: what should the regexp look like to solve my issue?
P.S. Just in case: I'm using the SilverStripe framework (v3.1.12).
$url = '/foo/bar/url/';
if (false !== $last = strrpos($url, '/')) {
if (false !== $penultimate = strrpos($url, '/', $last - strlen($url) - 1)) {
$url = substr_replace($url, '/#', $penultimate, 1);
}
}
echo $url;
This will output
/foo/bar/#url/
If you want to strip the last /:
echo rtrim($url, '/'); // print /foo/bar/#url
Here is a method that would work. There are probably cleaner ways.
// Let's assume you already have $url_string populated
$url_string = "http://whatever.com/foo/bar/url/";
$url_explode = explode("/", $url_string); // split on forward slashes
$portion_count = count($url_explode);
// Minus three: the index starts at 0, the trailing slash leaves an empty
// last element, and we want the portion just before the second-to-last slash
$affected_portion = $portion_count - 3;
$output = "";
foreach ($url_explode as $i => $portion) {
    $output .= $portion;
    if ($i < $portion_count - 1) {
        $output .= "/"; // restore the slash that explode() removed
    }
    if ($i == $affected_portion) {
        $output .= "#";
    }
}
$new_url = $output; // http://whatever.com/foo/bar/#url/
Assuming you now have
$url = $this->Link(); // e.g. /foo/bar/my-urlsegment
You can combine it like
$handledUrl = $this->ParentID
? $this->Parent()->Link() . '#' . $this->URLSegment
: $this->Link();
where $this->Parent()->Link() is e.g. /foo/bar and $this->URLSegment is my-urlsegment
$this->ParentID also checks if we have a parent page or are on the top level of SiteTree
I might be too late to answer this question, but I thought this might help you. You can simply use preg_replace like this:
$url = '/foo/bar/url/';
echo preg_replace('~(\/)(\w+)\/$~',"$1#$2",$url);
Output:
/foo/bar/#url
In my case this solved my problem:
$url = $this->Link();
$url = rtrim($url, '/');
$url = substr_replace($url, '#', strrpos($url, '/') + 1, 0);
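The answers above can be folded into one small helper that also copes with inputs that have no slash at all. This is only a sketch; hash_last_segment() is a hypothetical name, and it assumes the input is a plain path without a query string.

```php
<?php
// Hypothetical helper: insert "#" right after the penultimate slash,
// i.e. in front of the last path segment.
function hash_last_segment(string $url): string
{
    // Ignore a trailing slash while looking for the segment boundary.
    $trimmed = rtrim($url, '/');
    $pos = strrpos($trimmed, '/');
    if ($pos === false) {
        return $url; // no segment boundary found, leave the input untouched
    }
    // Length 0 means "insert" rather than "replace".
    return substr_replace($url, '#', $pos + 1, 0);
}

echo hash_last_segment('/foo/bar/url/'); // /foo/bar/#url/
```

A trailing slash is kept as it was; wrap the call in rtrim($result, '/') if it should be dropped.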

Replace pattern in URL through f3-Routing engine

I'm working on a flat-file-based project and I'm trying to remove a specific pattern from the URL. The content is stored in a "content" directory with Markdown files. I want sortable content folder names like:
contentfolder:
- 01-home
  - 01-subpage
  - 02-subpage2
- 02-page02
  - 01-subpage
  - 02-subpage2
etc...
At the moment, the URL would look something like this:
http://domain.com/01-home/02-subpage
This is really ugly ;)
I would prefer to get the url work as follows:
http://domain.com/home/subpage
I would prefer a solution, which works for every case of url:
http://domain.com/home/subpage/subsubpage/subsubsubpage
etc.
At the moment my script uses the F3 wildcard route (GET /*). The requested URL is rewritten to locate the content.
function find($path = "") {
$dirname = str_replace(globals::root(), "", globals::current());
if ($dirname == "/") {
$this->location = globals::content() . globals::home() . "/" . $path;
} else {
$this->location = globals::content() . $dirname . "/" . $path;
}
return $this;
}
Thank you guys!
Here's a solution, assuming that:
your markdown files are located in a content/ folder
every subfolder is prefixed with two digits and a hyphen (XX-something)
=>
$f3=require('lib/base.php');
$f3->set('CONTENT','content/');//location of markdown files
$f3->route('GET /*',function($f3,$params){
$path=str_replace('..','',$params[1]);//security
$dirs=glob($f3->get('CONTENT').'??-'.str_replace('/','/??-',$path));
if ($dirs) {
$dir=$dirs[0];//pick first match
echo \Markdown::instance()->convert($dir.'/default.md');
} else
$f3->error(404);
});
$f3->run();
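To make the glob() trick easier to follow, here is the pattern building on its own, outside the F3 route. It is a sketch under the same assumptions as the answer (a content/ folder and XX- prefixes); content_glob_pattern() is a hypothetical helper name.

```php
<?php
// Hypothetical helper: turn "home/subpage" into "content/??-home/??-subpage".
function content_glob_pattern(string $contentRoot, string $path): string
{
    // Drop empty segments (leading or trailing slashes) and any "." or ".."
    // segments, so the pattern cannot climb out of the content folder.
    $segments = array_filter(explode('/', $path), function ($segment) {
        return $segment !== '' && $segment !== '.' && $segment !== '..';
    });
    $pattern = rtrim($contentRoot, '/');
    foreach ($segments as $segment) {
        // "??" matches any two characters; "[0-9][0-9]" would be stricter.
        $pattern .= '/??-' . $segment;
    }
    return $pattern;
}

// The first directory matching the pattern is the one to render.
$dirs = glob(content_glob_pattern('content/', 'home/subpage'));
```

Filtering whole ".." segments, rather than deleting the two dots from the string, leaves segment names intact.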

Adding to url with link

My url contains many variables that I want untouched (don't worry they aren't important).
Let's say it contained...
../index.php?id=5
How would I make a url that just adds
&current=1
rather than replacing it entirely?
I'd like...
../index.php?id=5&current=1
rather than..
../index.php?current=1
I know it's a simple question but that's why I can't figure it out.
Thanks.
To append a parameter to a URL you can do this:
function addParam( $url, $param ){
if( strrpos( $url, '?' ) === false){
$url .= '?' . $param;
} else {
$url .= '&' . $param;
}
return $url;
}
$url = "../index.php?id=5";
$url = addParam( $url, "current=1");
You should just create your link so that it 'adds' that parameter:
<a href="../index.php?id=5&amp;current=1">The Link</a>
and then obviously in the index.php somewhere you'll look for the current variable and do what you need to:
<?php
if (isset($_GET['current']) && !empty($_GET['current'])) {
// Do stuff here for the 'current' variable
$current = trim($_GET['current']);
}
?>
On the links that need the $current variable, I suppose you could just put it in the href attribute. For the index.php file, do something like this:
if(isset($_GET['current']))
{
$current = $_GET['current'];
//Do the rest of what you need to do with this variable
}
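Try this one:
$givenVar = "";
foreach ($_GET as $key => $val) {
    // encode each pair and end it with "&" so the link never starts with "?&"
    $givenVar .= urlencode($key) . "=" . urlencode($val) . "&";
}
$var = "num=1";
$link = "?" . $givenVar . $var;
echo $link;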
You can just add the variable to the href.
When you click it while the address is
../index.php?id=5
you will go to
../index.php?id=5&current=1
BUT if you click that link again, then you'll go to
../index.php?id=5&current=1&current=1
Actually, I think it's tricky and bad practice to just append the variable.
I suggest you do it like this:
<?php
// merge so that an existing 'current' is replaced instead of duplicated
$query = http_build_query(array_merge($_GET, array('current' => 1)));
?>
<a href="../index.php?<?php echo htmlspecialchars($query); ?>">A Label</a>
Take a look at http://us.php.net/manual/en/function.http-build-query.php
I don't know why on earth you would need this, but here we are. This should do the trick.
$appendString = "&current=1";
$pageURL = $_SERVER["REQUEST_URI"].$appendString;
$_SERVER["REQUEST_URI"] returns the path of the requested page together with its query string, if any. The other string should be clear enough! Note that appending with & only produces a valid URL when the request already has a query string.

keeping url parameters during pagination

Is there any way to keep my GET parameters when paginating?
My problem is that I have a few different URLs, e.g.
questions.php?sort=votes&author_id=1&page=3
index.php?sort=answers&style=question&page=4
How in my pagination class am I supposed to create a link to the page with a different page number on it but yet still keep the other parts of the url?
If you wanted to write your own function that did something like http_build_query, or if you needed to customize its operation for some reason or another:
<?php
function add_edit_gets($parameter, $value) {
$params = array();
$output = "?";
$firstRun = true;
foreach($_GET as $key=>$val) {
if($key != $parameter) {
if(!$firstRun) {
$output .= "&";
} else {
$firstRun = false;
}
$output .= $key."=".urlencode($val);
}
}
if(!$firstRun)
$output .= "&";
$output .= $parameter."=".urlencode($value);
return htmlentities($output);
}
?>
Then you could just write out your links like:
<a href="<?php echo add_edit_gets("page", "2"); ?>">Click to go to page 2</a>
You could use http_build_query() for this. It's much cleaner than deleting the old parameter by hand.
It should be possible to pass a merged array consisting of $_GET and your new values, and get a clean URL.
$new_data = array("currentpage" => "mypage.html");
$full_data = array_merge($_GET, $new_data); // New data will overwrite old entry
$url = http_build_query($full_data);
In short, you just parse the URL and then you add the parameter at the end or replace it if it already exists.
$parts = parse_url($url) + array('query' => ''); // parse_str() below expects a string
parse_str($parts['query'], $query);
$query['page'] = $page;
$parts['query'] = http_build_str($query);
$newUrl = http_build_url($parts);
This example code requires the PHP HTTP module for http_build_url and http_build_str. The latter can be replaced with http_build_query, and for the former a PHP userspace implementation exists in case you don't have the module installed.
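If the HTTP module is not available, the same parse, modify, rebuild idea can be sketched with core functions only. set_query_param() is a hypothetical helper name, and this version deliberately ignores user and password components.

```php
<?php
// Hypothetical helper: set (add or replace) one query parameter on a URL
// using only parse_url(), parse_str() and http_build_query().
function set_query_param(string $url, string $name, string $value): string
{
    $parts = parse_url($url);
    if ($parts === false) {
        return $url; // seriously malformed URL: leave it untouched
    }
    // Turn the existing query string into an array, then overwrite the key.
    parse_str($parts['query'] ?? '', $query);
    $query[$name] = $value;

    // Reassemble the components that were present in the input.
    $result = '';
    if (isset($parts['scheme'])) {
        $result .= $parts['scheme'] . '://';
    }
    if (isset($parts['host'])) {
        $result .= $parts['host'];
    }
    if (isset($parts['port'])) {
        $result .= ':' . $parts['port'];
    }
    $result .= $parts['path'] ?? '';
    // Pass "&" explicitly so the result does not depend on arg_separator.output.
    $result .= '?' . http_build_query($query, '', '&');
    if (isset($parts['fragment'])) {
        $result .= '#' . $parts['fragment'];
    }
    return $result;
}

echo set_query_param('questions.php?sort=votes&author_id=1&page=3', 'page', '4');
// questions.php?sort=votes&author_id=1&page=4
```

Because the parameter is written into the parsed array, calling the function repeatedly never duplicates it. Escape the result with htmlspecialchars() before placing it in an href.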
Another alternative is to use the Net_URL2 package which offers an interface to diverse URL operations:
$op = new Net_URL2($url);
$op->setQueryVariable('page', $page);
$newUrl = (string) $op;
It's more flexible and expressive.
How about storing your page parameter in a session, so you don't have to modify every single page url?

Scrape FULL image src with PHP

I am trying to scrape img srcs with PHP. I can get the src fine, but if the src does not include the full path then I can't really reuse it. Is there a way to grab the full path of the image using PHP (browsers can get it if you use the right-click menu)?
I.e., how do I get a FULL path including the domain in one of the following two examples?
src="../foo/logo.png"
src="/images/logo.png"
Thanks,
Allan
You don't need a regex... just some patience. I don't really want to write the code for you, but just check if the src starts with http://, and if not, you have about three different cases.
If it begins with a /, then prepend http://domain.com
If it begins with .., you'll have to split the full URL and hack off pieces until the src starts with a /
Else (it begins with a letter), take the full URL of the page, strip it down to the last slash, then append the src.
Or... be lazy and steal this script
$url = "http://www.goat.com/money/dave.html";
$rel = "../images/cheese.jpg";
$com = InternetCombineUrl($url, $rel);
// Returns http://www.goat.com/images/cheese.jpg
function InternetCombineUrl($absolute, $relative) {
    $p = parse_url($relative);
    if (isset($p["scheme"])) return $relative; // already absolute
    // default every component so that missing parts don't raise notices
    $base = parse_url($absolute) + array("scheme" => "", "user" => "", "pass" => "", "host" => "", "path" => "/");
    if ($relative !== "" && $relative[0] == '/') {
        // root-relative: ignore the path of the base document
        $parts = explode("/", $relative);
    } else {
        // directory of the base document, without its file name
        $dir = substr($base["path"], 0, (int) strrpos($base["path"], "/"));
        $parts = array_merge(explode("/", $dir), explode("/", $relative));
    }
    $cparts = array();
    foreach ($parts as $part) {
        if ($part == '' || $part == '.') {
            continue; // skip empty and "current directory" segments
        }
        if ($part == '..') {
            array_pop($cparts); // step up one directory
        } else {
            $cparts[] = $part;
        }
    }
    $path = implode("/", $cparts);
    $url = "";
    if ($base["scheme"]) {
        $url = $base["scheme"] . "://";
    }
    if ($base["user"]) {
        $url .= $base["user"];
        if ($base["pass"]) {
            $url .= ":" . $base["pass"];
        }
        $url .= "@"; // user info is separated from the host by "@"
    }
    if ($base["host"]) {
        $url .= $base["host"] . "/";
    }
    $url .= $path;
    return $url;
}
From http://www.web-max.ca/PHP/misc_24.php
Unless you have the site URL you're starting with (in which case you can prepend it to the value of the src attribute) it seems like all you're left with there is a string.
I'm assuming you don't have access to any additional information of course. If you're parsing HTML, I'd assume you must be able to access an absolute URL to at least the HTML page, but perhaps not.
