How to make all src strings global in PHP? - php

I am writing a web browser in PHP, for devices (such as the Kindle) which do not support multi-tab browsing. Currently I am reading the page source with file_get_contents(), and then echoing it into the page. My problem is that many pages use local references (such as < img src='image.png>'), so they all point to pages that don't exist. What I want to do is locate all src and href tags and prepend the full web address to any that do not start with "http://" or "https://". How would I do this?

add <base href="http://example.com/" />
at the head of the page
this will help you insert it to the <head></head> section

Like elibyy suggested, I too would recommend using the base tag. Here's a way to do it with PHP's native DOMDocument:
// example url
$url = 'http://example.com';
$doc = new DOMDocument();
$doc->loadHTMLFile( $url );
// first let's find out if there a base tag already
$baseElements = $doc->getElementsByTagName( 'base' );
// if so, skip this block
if( $baseElements->length < 1 )
{
// no base tag found? let's create one
$baseElement = $doc->createElement( 'base' );
$baseElement->setAttribute( 'href', $url );
$headElement = $doc->getElementsByTagName( 'head' )->item( 0 );
$headElement->appendChild( $baseElement );
}
echo $doc->saveHTML();
Having said this however; are you sure you are aware of how ambitious your goal is?
For instance, I don't think this is exactly what you really need at all, as your application is basically acting as a proxy. Therefor you will probably want to route, at least, all user-clickable links through your application, and not route them directly to the original links at all, because I presume you want to keep the user in your tabbed application, and not break out of it.
Something like:
http://yourapplication.com/resource.php?resource=http://example.com/some/path/
Now, this could of course be achieved by basically doing what you requested, and in stead of prepending it with either, http:// or https:// prepend with something such that it results in above example url.
However, how are you gonna discern what resources to do this with, and what resources not? If you take this approach for all resources in the page, your application will quickly become a full fletched proxy, thereby becoming very resource intensive.
Hopefully I've given you a brief starter for some things to take into consideration.

Related

How to remove any given $_GET variable from URL with PHP?

I'm puttings filters in links with GET variables like this: http://example.com/list?size=3&color=7 and I'd like to remove any given filter parameter from URL whenever a different value for that particular filter is selected so that it doesn't, for example, repeat the color filter like so:
http://example.com/list?size=3&color=7&color=1
How can I if(isset($_GET['color'])) { removeGet('color'); } ?
You can use parse_url and parse_str to extract parameters like in example below:
$href = 'http://example.com/list?size=3&color=7';
$query = parse_url( $href, PHP_URL_QUERY );
parse_str( $query, $params );
// set custom paramerets
$params['color'] = 1;
// build query string
$query = http_build_query( $params );
// build url
echo explode( '?', $href )[0] . '?' . $query;
In this example explode() is used to extract the part of the url before the query string, and http_build_query to generate query string, you can also use PECL http_build_url() function, if you cannot use PECL use alternative like in this question.
You can't remove variables from GET request, just redirect to address without this var.
if (isset($_GET['color'])) {
header ('Location: http://www.example.com/list?size=' . $_GET['size']);
exit;
}
Note: in URL http://example.com/list?size=3&color=7&color=1 is just one $_GET['color'], not two. Only one of them is taken. You can check, is $_GET['key'] exists, but you don't know how many of them you have in your URL
So, assuming I'm understanding your question correctly.
Your situation is as follows:
- You are building URLs which you put into a webpage as a link ( <a href= )
- You are using the GET syntax/markup (URL?key=value&anotherkey=anothervalue) as a way to assign filters of some sort which the user then receives when they click on a given link
What you want is to be able to modify one of the items in your GET parameter list (http://example.com/list?size=3&color=7&color=1) so you have only one filter key but you can modify the filter value. So instead of the above you would start with: (http://example.com/list?size=3&color=7) but after changing the color 'filter' you would instead have http://example.com/list?size=3&color=1).
Additionally you want to do the above in PHP, (as opposed to JavaScript etc...).
There are a lot of ways to implement the change and the most effective way to do it depends on what you are already doing, most likely.
First, if you are dynamically producing the HTML markup which includes the links with the filter text, (which is what it sounds like), then it makes the most sense to create a PHP array to hold your GET parameters, then write a function that would turn those parameters into the GET string.
New filters would appear when a user refreshed the page, (because, if you are dynamically producing the HTML then a server request is required to rebuild the page).
IF, however, you want to update the link URLs on a live page WITHOUT a reload look into doing it with JavaScript, it will make your life easier.
NOTE: It is likely possible to modify the page, assuming the links are hard coded, & the page is hard coded markup, by opening the page as a file in PHP & making the appropriate change. It's my opinion that this would be a headache and not worth the time & effort AND it would still require a page reload (which you could NOT trigger yourself).
Summary
If you are writing dynamic pages with PHP it shouldn't be a big deal, just create a structure (class or array) and a method/function to write that structure out as a GET string. The structure could then be modified according to your desire before generating the page.
If, however, you are dealing with a static page, I recommend JavaScript (either creating js structures to allow a user to dynamically select filters or utilizing AJAX to build new GET parameter lists with PHP and send that back to the javascript).
(NOTE: I am reminded that I have done something along the lines of modifying links on-the-fly for existing pages by intercepting them before they are displayed to the user [using PHP] but my hands were tied in other areas and I would not recommend it if you have a choice AND it should be noted that this still required a reload...)
Try doing something like this in your back-end script:
$originalValues=array();
foreach($_GET as $filter=>$value)
{
if(empty($originalValues[$filter]))
$originalValues[$filter] = $value;
}
This may do what you want, but it feels hackish. You may want to revise your logic.
Good luck!
just put a link/button send the user to index... like this.
<a class="btn btn-primary m-1" href="http:yoururl/index.php" role="button">Limpar</a>

How can I load a url obtained from $_SERVER['REQUEST_URI'] into DOMDocument?

How to load a URL obtained from $_SERVER['REQUEST_URI'] into domDocument?
I am trying to load a dynamic webpage into DOMDocument to be parsed for certain words. Ultimately I want to create a glossary for my site (Tiki Wiki CMS). I started very simple and right now I am only trying to load a page and parse the text for testing purposes.
I am new to DOMDocument and after reading several articles on this site and on PHP Manual, I know that I have to load a html page with loadHTMLFile, then parse the site by getElementsById or getElementsByTagName in order to do stuff with it. It works fine for static pages, but the main problem I am having is that I cannot enter a static url into loadHTMLFile, because parsing should be performed when the site is uploaded by the user.
Here's the code that DID work:
$url = 'http://mysite.org/bbk/tiki-index.php?page=pagetext';
$dom = new DOMDocument;
$dom->loadHTMLFile($url);
$a = $dom->getElementsByTagName('a');
foreach ($a as $link) {
echo $link->nodeValue;
}
So, I thought I could use $_SERVER['REQUEST_URI'] for the job, but it did not work.
This did NOT work (no error message):
$url = $_SERVER['REQUEST_URI'];
$dom = new DOMDocument;
$dom->loadHTMLFile($url);
$a = $dom->getElementsByTagName('a');
foreach ($a as $link) {
echo $link->nodeValue;
}
After checking what the $url output was, I decided to add http://mysite.org to it to make it identical to the url that worked. However, no luck either and this time I got an internal server error.
This did NOT work either (Internal Server Error):
$url = 'http://mysite.org' . $_SERVER['REQUEST_URI'];
$dom = new DOMDocument;
$dom->loadHTMLFile($url);
$a = $dom->getElementsByTagName('a');
foreach ($a as $link) {
echo $link->nodeValue;
}
I think I am missing something substantial here and I thought it might just not be possible to use DOMDocument in this way, so I was searching the web for help again (if it is possible to use $_SERVER['REQUEST_URI'] in combination with DOMdocument at all), but I didn't find an answer. So I hope anybody here can help. Any suggestions including third party parsers etc. would be helpful, except anything that requires parsing with regex. Tiki Wiki CMS already has a glossary option done with regex, but it is very buggy.
Thanks.
UPDATE
I haven't found an answer to the problem, but I think I have an idea on where my mistake was. I was expecting $_SERVER['REQUEST_URI'] to run on a dynamic page that was not completely built yet. I ran the script on the main setup page, so I guess the html was not rendered yet, when I tried to point $_SERVER['REQUEST_URI'] to it. When I noticed that this might be the problem, I abandoned the idea of parsing the document with DomDocument and used a javascript solution that can be loaded after the document is ready.
I can think of two things that you can do (probably won't solve your problem directly, but will help you greatly with solving it):
$_SERVER['REQUEST_URI'] doesn't contain what you think it does. Try echoing or var_dumping it, and see if it matches your expectations.
Enable error reporting. The reason you are seeing a generic 500 error page, is because error reporting is disabled. enable it using error_reporting().
Also note that DOMDocument only parses HTML, if you have dynamic DOM nodes generated and added to the page using a client-side language, or CSS pseudo elements, they won't be displayed unless you deploy a JS/CSS parser as well (which is not trivial).

untangling directory separator madness using string manipulation?

I'm working on converting a website. It involved standardizing the directory structure of images and media files. I'm parsing path information from various tags, standardizing them, checking to see if the media exists in the new standardized location, and putting it there if it doesn't. I'm using string manipulation to do so.
This is a little open-ended, but is there a class, tool, or concept out there I can use to save myself some headaches? For instance, I'm running into problems where, say, a page in a sudirectory (website.com/subdir/dir/page.php) has relative image paths (../images/image.png), or other kinds of things like this. It's not like there's one overarching problem, but just a lot of little things that add up.
When I think I've got my script covering most cases, then I get errors like Could not find file at export/standardized_folder/proper_image_folderimage.png where it should be export/standardized_folder/proper_image_folder/image.png. It's kind of driving me mad, doing string parsing and checks to make sure that directory separators are in the proper places.
I feel like I'm putting too much work into making a one-off import script very robust. Perhaps someone's already untangled this mess in a re-useable way, one which I can take advantage of?
Post Script: So here's a more in-depth scoop. I write my script that parses one "type" of page and pulls content from the same of its kind. Then I turn my script to parse another type of page, get all knids of errors, and learn that all my assumptions about how paths are referenced must be thrown out the window. Wash, rinse, repeat.
So I'm looking at doing some major re-factoring of my script, throwing out all assumptions, and checking, re-checking, and double-checking path information. Since I'm really trying to build a robust path building script, hopefully I can avoid re-inventing the wheel. Is there a wheel out there?
If your problems have their root in resolving the relative links from a document and resolve to an absolute one (which should be half the job to map the linked images paths onto the file-system), I normally use Net_URL2 from pear. It's a simple class that just does the job.
To install, as root just call
# pear install channel://pear.php.net/Net_URL2-0.3.1
Even if it's a beta package, it's really stable.
A little example, let's say there is an array with all the images srcs in question and there is a base-URL for the document:
require_once('Net/URL2.php');
$baseUrl = 'http://www.example.com/test/images.html';
$docSrcs = array(...);
$baseUrl = new Net_URL2($baseUrl);
foreach($docSrcs as $href)
{
$url = $baseUrl->resolve($href);
echo ' * ', $href, ' -> ', $url->getURL(), "\n";
// or
echo " $href -> $url\n"; # Net_URL2 supports string context
}
This will convert any relative links into absolute ones based on your base URL. The base URL is first of all the documents address. The document can override it by specifying another one with the base elementDocs. So you could look that up with the HTML parser you're already using (as well as the src and href values).
Net_URL2 reflects the current RFC 3986 to do the URL resolving.
Another thing that might be handy for your URL handling is the getNormalizedURL function. It does remove some potential error-cases like needless dot segments etc. which is useful if you need to compare one URL with another one and naturally for mapping the URL to a path then:
foreach($docSrcs as $href)
{
$url = $baseUrl->resolve($href);
$url = $url->getNormalizedURL();
echo " $href -> $url\n";
}
So as you can resolve all URLs to absolute ones and you get them normalized, you can decide whether or not they are in question for your site, as long as the url is still a Net_URL2 instance, you can use one of the many functions to do that:
$host = strtolower($url->getHost());
if (in_array($host, array('example.com', 'www.example.com'))
{
# URL is on my server, process it further
}
Left is the concrete path to the file in the URL:
$path = $url->getPath();
That path, considering you're comparing against a UNIX file-system, should be easy to prefix with a concrete base directory:
$filesystemImagePath = '/var/www/site-new/images';
$newPath = $filesystemImagePath . $path;
if (is_file($newPath))
{
# new image already exists.
}
If you've got problems to combine the base path with the image path, the image path will always have a slash at the beginning.
Hope this helps.
Truepath() to the rescue!
No, you shouldn't use realpath() (see why).

a good way to implent l.php like facebook?

what is the l.php
im guessing l.php is link.php on
facebook. its a pre-redirecting
script. every link on facebook is
filtered to link.php?to=link
i have some things in mind. like
make a render class or filter the input with regex, finds an www.link.com then change it to something like link , the problem is when we have some changes, if we change something then all our archives links isn't going to work ( assume that we put the input to the database. ).
use a JavaScript to do this, i mean yes, but im not familiar with js, but i know we can search and change the links live like above. but the problem is when the user doesn't have a js on, ( mobile, or no JavaScript plugin, etc )
or maybe you have a better answer what is the best way we implant l.php ?
This would be one of the pros of using a function to generate links in your views:
link text
Your build_link() would be like so:
function build_link($path)
{
$prefix = '';
if (preg_match('|^(https?://|www\.)', $path))
{
$prefix = 'http://www.oursite.php?to=';
}
return $prefix . $path;
}
This means you could even use a CDN easily amongst your links.
The easiest option is to use Javascript, but like you say is reliant on the user to have Javascript enabled browser, which is highly likey due to the advance in browser technology, even in the mobile platform. The jQuery would look like so:
$(document).ready(function() {
$('a').live('click', function() {
var href = $(this).attr('href');
if (href.match(/^(https?:\/\/|www\\.)/i))
{
this.href = 'http://www.oursite.php?to=' + href;
}
return this.href;
});
});
If changing your links is not viable, i.e. you need to rewrite links that is inside a database or similar, I would look into using DOMDocument -- it's generally good practice to avoid messy regexes when dealing with complex structures such as HTML
$doc = new DOMDocument('1.0', 'UTF-8');
$doc->loadHTML($text);
$anchors = $doc->getElementsByTagName('a');
foreach($anchors as $a) {
$href = $a->getAttribute('href');
if (preg_match('|^(https?://|www\.)', $href))
{
$a->setAttribute('href', 'http://www.oursite.php?to=' . $href);
}
}
$newHTML = $doc->saveHTML();
So there are three options here:
Go through every part on your site that outputs links to external sites, and re-write them through a build_link function
Rewrite using the help of DOMDocument
Use Javascript
I would have gone with using PHP to let the content of the page run through a filter that uses regex to replace the links.
Using javascript would mean that paople can turn it off. PHP ensures that all links get replaced correctly.
Just remember to make sure that any internal links (like navigation) is left untouched.
There are many types of regex that finds urls and you have to choose one that suits your needs
Edit: I would use this regex (https?://([-\w\.]+)+(:\d+)?(/([\w/_\.]*(\?\S+)?)?)?)

jQuery/Drupal Append string to all HREF on site (Any page)

So I have a site with a dozen pages on it using Drupal as a CMS. Wanted to know how to set/append all href links on any page with a dynamic attribute?
so here is an example URL:
http://www.site.com/?q=node/4
and I wanted to append to the URL like so:
http://www.site.com/?q=node/4&attr=1234
Now I have a nav bar on the site and when I hover over the link I see the url but I need to append the &attr=1234 string to the end of it. The string is dynamic so it might change from time to time.
I was thinking jQuery would be a good choice to do this but does Drupal have any functionality as well?
Now I've seen a couple of posts on Stack:
Post 1
Post 2
Problem is I'm learning my way around Drupal and have minimal experience with jQuery but getting better with both. I see the jQuery can replace a HREF but looks like they hard coded the HREF, could jQuery find all HREF's on a page and append the string to it? Also does Drupal have this functionality and what would be the best approach?
Also need this to work for clean or standard URL format, so I think Apache would handle this I just wanted to make sure.
Thanks for any help
EDIT:
Looks like the general consensus is the Drupal should handle this type of request. Just looking for the best implementation. Simple function call would be best but I would like it to dynamically add it to all existing href's as I want this to be dynamic instead of hard coding any url/href calls. So I could add/remove pages on the fly without the need to reconfigure/recode anything.
Thanks for the great tips though
EDIT #2:
Okay maybe I'm asking the wring question. Here is what I need and why it's not working for me yet.
I need to pass a value in the url that changes some of the look and feel of the site. I need it to be passed on just about every href tag on the page but not on User logout or Admin pages.
I see in my template code where the nav links get generated, so I though I could pass my code in the attributes array as the second parm to the function, but that is setting the tag attributes and not the URL attributes.
Now I see the bottom nav links use this Drupal function: menu_navigation_links() in menu.inc but the top nav uses a custom function.
This function in the template.php script looks to be the one creating the links
function lplus($text, $path, $options = array()) {
global $language;
// Merge in defaults.
$options += array(
'attributes' => array(),
'html' => FALSE,
);
// Append active class.
if (($path == $_GET['q'] || ($path == '<front>' && drupal_is_front_page())) &&
(empty($options['language']) || $options['language']->language == $language->language)) {
if (isset($options['attributes']['class'])) {
$options['attributes']['class'] .= ' active';
}
else {
$options['attributes']['class'] = 'active';
}
}
// Remove all HTML and PHP tags from a tooltip. For best performance, we act only
// if a quick strpos() pre-check gave a suspicion (because strip_tags() is expensive).
if (isset($options['attributes']['title']) && strpos($options['attributes']['title'], '<') !== FALSE) {
$options['attributes']['title'] = strip_tags($options['attributes']['title']);
}
return '<a href="'. check_url(url($path, $options)) .'"'. drupal_attributes($options['attributes']) .'><b>'. ($options['html'] ? $text : check_plain($text)) .'</b></a>';
}
not sure how to incorporate what I need into this function.
So on the home page the ?q=node/ is missing and if I append the ampersand it throws an error.
http://www.site.com/&attr=1234 // throws error
But if I mod it to the correct format it works fine
http://www.site.com/?attr=1234
Assuming that when you mean pages, you mean the content type of pages (will work for any other content type as long as it's really content and not something in a block or view).
You can easily replace the contents of any node that is about to be viewed by running a str_replace or with a regular expression. For instance, using str_replace:
function module_nodeapi(&$node, $op, $a3 = NULL, $a4 = NULL) {
switch($op) {
case "view":
$node->body = str_replace($expected_link, $desired_link);
break;
}
}
where you define desired link somewhere else. Not an optimal solution, but non-Javascript browsers (yes, they still exist!) can't get around the forced, desired URLs if you try to change it with Javascript.
I think doing it in Drupal/PHP would be cleaner. Check out Pathauto module: http://drupal.org/node/17345
There is a good discussion on a related topic here:
http://drupal.org/node/249864
This wouldn't use jquery (instead you would overwrite a function using PHP) but you could get the same result. This assumes, however, that you are working with menu links.
I think you should consider exploring PURL ( http://drupal.org/project/purl ) and some of the modules it works with e.g. spaces and context.
I don't suggest you use jQuery to do this. It's a better practice to do this server side in PHP (Drupal).
You can overwrite the links dynamically into your preprocess page function.
On your template.php file:
function *yourtheme*_preprocess_page(&$vars, $hook){
//You can do it for the region that you need
$vars['content'] = preg_replace('/<a href="(.*?)"/i', '<a href="$1&attr=1234"', $vars['content']);
}
Note:
I did not try it, its only a hint.
It will add your parameters to the outside links too.

Categories