I'm working on converting a website. It involved standardizing the directory structure of images and media files. I'm parsing path information from various tags, standardizing them, checking to see if the media exists in the new standardized location, and putting it there if it doesn't. I'm using string manipulation to do so.
This is a little open-ended, but is there a class, tool, or concept out there I can use to save myself some headaches? For instance, I'm running into problems where, say, a page in a subdirectory (website.com/subdir/dir/page.php) has relative image paths (../images/image.png), or other things like this. It's not like there's one overarching problem, but just a lot of little things that add up.
When I think I've got my script covering most cases, then I get errors like Could not find file at export/standardized_folder/proper_image_folderimage.png where it should be export/standardized_folder/proper_image_folder/image.png. It's kind of driving me mad, doing string parsing and checks to make sure that directory separators are in the proper places.
I feel like I'm putting too much work into making a one-off import script very robust. Perhaps someone's already untangled this mess in a re-useable way, one which I can take advantage of?
Post Script: So here's a more in-depth scoop. I write my script so that it parses one "type" of page and pulls content from other pages of the same kind. Then I turn my script to parse another type of page, get all kinds of errors, and learn that all my assumptions about how paths are referenced must be thrown out the window. Wash, rinse, repeat.
So I'm looking at doing some major re-factoring of my script, throwing out all assumptions, and checking, re-checking, and double-checking path information. Since I'm really trying to build a robust path building script, hopefully I can avoid re-inventing the wheel. Is there a wheel out there?
If your problems are rooted in resolving a document's relative links to absolute ones (which should be half the job of mapping the linked image paths onto the file system), I normally use Net_URL2 from PEAR. It's a simple class that just does the job.
To install, as root just call
# pear install channel://pear.php.net/Net_URL2-0.3.1
Even if it's a beta package, it's really stable.
A little example, let's say there is an array with all the images srcs in question and there is a base-URL for the document:
require_once('Net/URL2.php');
$baseUrl = 'http://www.example.com/test/images.html';
$docSrcs = array(...);
$baseUrl = new Net_URL2($baseUrl);
foreach($docSrcs as $href)
{
$url = $baseUrl->resolve($href);
echo ' * ', $href, ' -> ', $url->getURL(), "\n";
// or
echo " $href -> $url\n"; # Net_URL2 supports string context
}
This will convert any relative links into absolute ones based on your base URL. The base URL is, first of all, the document's address. The document can override it by specifying another one with the base element, so you could look that up with the HTML parser you're already using (as well as the src and href values).
Net_URL2 reflects the current RFC 3986 to do the URL resolving.
Another thing that might be handy for your URL handling is the getNormalizedURL function. It does remove some potential error-cases like needless dot segments etc. which is useful if you need to compare one URL with another one and naturally for mapping the URL to a path then:
foreach($docSrcs as $href)
{
$url = $baseUrl->resolve($href);
$url = $url->getNormalizedURL();
echo " $href -> $url\n";
}
Now that you can resolve all URLs to absolute ones and normalize them, you can decide whether they belong to your site. As long as the URL is still a Net_URL2 instance, you can use one of its many methods to do that:
$host = strtolower($url->getHost());
if (in_array($host, array('example.com', 'www.example.com')))
{
# URL is on my server, process it further
}
What remains is the concrete path to the file in the URL:
$path = $url->getPath();
That path, considering you're comparing against a UNIX file-system, should be easy to prefix with a concrete base directory:
$filesystemImagePath = '/var/www/site-new/images';
$newPath = $filesystemImagePath . $path;
if (is_file($newPath))
{
# new image already exists.
}
If you have trouble combining the base path with the image path, keep in mind that the image path will always have a slash at the beginning.
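For the missing-separator errors mentioned in the question (proper_image_folderimage.png), a small joining helper can take the guesswork out of concatenation. This is only a sketch: joinPath() is a hypothetical name, not part of Net_URL2 or PHP, and it assumes forward slashes as on a UNIX file system.

```php
<?php
// Hypothetical helper (not part of Net_URL2): join two path pieces with
// exactly one "/" between them, whatever slashes the inputs carry.
function joinPath($base, $relative)
{
    return rtrim($base, '/') . '/' . ltrim($relative, '/');
}

// Prints: export/standardized_folder/proper_image_folder/image.png
echo joinPath('export/standardized_folder/proper_image_folder', 'image.png'), "\n";

// Prints: /var/www/site-new/images/test/logo.png
echo joinPath('/var/www/site-new/images/', '/test/logo.png'), "\n";
```

Routing every concatenation through one function like this means the separator logic lives in a single place instead of being re-checked at each call site.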
Hope this helps.
Truepath() to the rescue!
No, you shouldn't use realpath() (see why).
I'm designing a web application that can be customized based on which retail location the end user is coming from. For example, if a user is coming from a store called Farmer's Market, there may be customized content or extra links available to that user, specific to that particular store. file_exists() is used to determine if there are any customized portions of the page that need to be imported.
Up until now, we've been using a relatively insecure method, in which the item ID# and the store are simply passed in as GET parameters, and the system knows to apply them to each of the links within the page. However, we're switching to a reversible hash method, in which the store and item number are encrypted (to look something like "gd651hd8h41dg0h81"), and the pages simply decode them and assign the store and ID variables.
Since then, however, we've been running into an error that Googling extensively hasn't found me an answer for. There are several similar blocks of code, but they all look something like this:
$buttons_first = "../stores/" . $store . "/buttons_first.php";
if(file_exists($buttons_first))
{
include($buttons_first);
}
(The /stores/ directory is actually in the directory above the working one, hence the ../)
Fairly straightforward. But despite working fine when a regular ID and store is passed in, using the encrypted ID throws this error for each one of those similar statements:
Warning: file_exists() expects parameter 1 to be a valid path, string given in [url removed] on line 11
I've had the script spit back the full URL, and it appears to be assigning $store correctly. I'm running PHP 5.4.11 on 1&1 hosting (because I know they have some abnormalities in the way their servers work), if that helps any.
I got the same error before. I don't know whether my solution applies to your problem, but you need to remove the "\0" bytes. Try replacing them:
$cleaned = strval(str_replace("\0", "", $buttons_first));
It worked in my case.
Run a var_dump(strpos($buttons_first,"\0")), this warning could come up when a path has a null byte, for security reasons. If that doesn't work, check the length of the string and make sure it is what you'd expect, just in case there are other invisible bytes.
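A rough way to run that check is sketched below. findSuspectBytes() is a hypothetical debugging helper, and the sample store name with a trailing null byte is made up for illustration; a decoding routine that pads its output is one plausible source of such bytes, but that is an assumption about your setup.

```php
<?php
// Hypothetical debugging helper: list the positions of bytes that should
// not appear in a path (control characters, including the null byte).
function findSuspectBytes($path)
{
    $suspects = array();
    $length = strlen($path);
    for ($i = 0; $i < $length; $i++) {
        $code = ord($path[$i]);
        if ($code < 32 || $code === 127) {
            // Record the byte offset and the byte value in hex.
            $suspects[$i] = sprintf('0x%02X', $code);
        }
    }
    return $suspects;
}

// Made-up example: a decoded store name that carries a trailing null byte.
$store = "farmers\0";
$buttons_first = "../stores/" . $store . "/buttons_first.php";

var_dump(strlen($buttons_first));            // compare with the length you expect
print_r(findSuspectBytes($buttons_first));   // offsets and values of hidden bytes
```

If the list is non-empty, cleaning $store right after decoding is usually better than cleaning each path built from it.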
It may be a problem with the path as it depends where you are running the script from. It's safer to use absolute paths. To get the path to the directory in which the current script is executing, you can use dirname(__FILE__).
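As a sketch of that suggestion, the path can be anchored to the script's own directory. storeFilePath() and the store name "farmers_market" are hypothetical; the layout (a stores directory one level above the script's directory) is taken from the question.

```php
<?php
// Hypothetical helper: build the path to a store-specific file from the
// script's directory instead of the current working directory.
function storeFilePath($scriptDir, $store, $file)
{
    // dirname() moves one level up; basename() drops any directory
    // components that might be hidden in the store name.
    return dirname($scriptDir) . '/stores/' . basename($store) . '/' . $file;
}

$buttons_first = storeFilePath(dirname(__FILE__), "farmers_market", "buttons_first.php");
if (file_exists($buttons_first)) {
    include($buttons_first);
}
```

Passing the store name through basename() is an extra precaution added here, since the value now comes from a decoded request parameter.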
Add / before stores/, you are better off using absolute paths.
I know this post was created in 2013, but I didn't see this common cause mentioned.
This error also occurs after adding the multiple attribute to a file upload form.
For example, suppose you read the upload in PHP like this: $_FILES['file']['tmp_name']
After adding the multiple option to the form, the input name changes from file to file[],
so even if you post just one file, $_FILES['file']['tmp_name'] must be changed to $_FILES['file']['tmp_name'][0]
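One way to avoid tripping over this is to read the field through a helper that accepts both shapes. This is a minimal sketch; uploadedTmpNames() is a hypothetical name and 'file' is just the field name used in the example above.

```php
<?php
// Hypothetical helper: return the temporary file names for an upload field,
// whether the input was declared as name="file" or name="file[]".
function uploadedTmpNames($files, $field)
{
    if (!isset($files[$field]['tmp_name'])) {
        return array();
    }
    // The (array) cast wraps a single string and leaves an array as it is.
    return array_values((array) $files[$field]['tmp_name']);
}

foreach (uploadedTmpNames($_FILES, 'file') as $tmpName) {
    // Each $tmpName is now a plain string, which is what file functions expect.
    if (is_uploaded_file($tmpName)) {
        echo "Received upload: $tmpName\n";
    }
}
```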
phpThumb is a PHP library that converts large images to image thumbnails and caches the result. It takes such a syntax: http://domain.com/phpThumb.php?src=/images/image.jpg
However in my web application I'm following a strict MVC architecture, so I changed the syntax to this: http://domain.com/thumb/images%2Fimage.jpg/width/height
However, the output is now complaining:
Usage: /workspace/urs/index.php?src=/path/and/filename.jpg
Even though I've checked the $_GET dump and it reads:
array(1) {
["src"]=>
string(42) "/workspace/urs/images/portfolio/shoopm.jpg"
}
This is the code that runs up to the error (in my web application):
// If getting a thumbnail
if($qa[0] == "thumb")
{
if(!isset($qa[1]) || !isset($qa[2]) || !isset($qa[3]))
die("Bad thumb request. Needs 3 parameters!");
unset($_GET["q"]);
$_GET["src"] = $qa[1];
$_GET["w"] = $qa[2];
$_GET["h"] = $qa[3];
include("phpThumb/phpThumb.php");
exit();
}
Now, what I'm fearing is that phpThumb checks the actual URL, and not just the $_GET parameters... It's hard to confirm since the source contains thousands and thousands of lines of code and I haven't a clue where to start.
Thanks for any helpful replies
Judging from reading some of the source, it looks like it tries to do its own PATH_INFO parsing. You can prevent this by either changing the disable_pathinfo_parsing config variable, or setting $_SERVER['PATH_INFO'] to null.
This may be important because the "Usage: ..." error happens only when the src attribute of the $phpThumb object is empty. It populates this attribute by looking for it in $_GET, and it does some pretty serious $_GET['src'] manipulation when it tries to process PATH_INFO.
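The second option could look roughly like this. prepareThumbRequest() is a hypothetical wrapper around the assignments from the question, and the sample values are invented; whether clearing PATH_INFO is enough depends on the phpThumb version in use.

```php
<?php
// Hypothetical helper: copy the routed segments into the parameters that
// phpThumb reads from $_GET, and clear PATH_INFO so it is not parsed again.
function prepareThumbRequest($qa, &$get, &$server)
{
    unset($get['q']);
    $get['src'] = $qa[1];
    $get['w']   = $qa[2];
    $get['h']   = $qa[3];
    $server['PATH_INFO'] = null;
}

// Stand-in arrays for illustration; the real script would pass $_GET and
// $_SERVER and then include phpThumb/phpThumb.php as before.
$qa     = array('thumb', '/images/image.jpg', '200', '100');
$get    = array('q' => 'thumb/images%2Fimage.jpg/200/100');
$server = array('PATH_INFO' => '/thumb/images%2Fimage.jpg/200/100');

prepareThumbRequest($qa, $get, $server);
print_r($get);
```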
Alternatively, you might want to try just using its own native PATH_INFO-based URLs instead of your own, just to avoid the futzing.
I had a similar issue when working on a windows system.
My src paths had a mix of forward (/) and back (\) slashes in them, so I was converting them all to PHP's DIRECTORY_SEPARATOR constant (a backslash on Windows).
When I converted them all to forward slashes instead, it just worked.
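The conversion itself is a one-liner. A minimal sketch, where normalizeSlashes() is a hypothetical name and the sample path is made up:

```php
<?php
// Hypothetical helper: use forward slashes throughout a path.
function normalizeSlashes($path)
{
    return str_replace('\\', '/', $path);
}

// Prints: C:/workspace/urs/images/portfolio/shoopm.jpg
echo normalizeSlashes('C:\\workspace\\urs/images\\portfolio/shoopm.jpg'), "\n";
```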
I really have read the other articles that cover this subject. But I seem to be in a slightly different position. I'm not using modrewrite (other articles).
I would like to 'include' a web page, a Joomla-generated PHP page, inside a PHP script. I'd hoped to make additions on the fly without altering the original script, so I was going to 'precomplete' elements of the page by parsing it once it was included; I hadn't wanted to hack the original script. To the point: I can't include the file, and it's not because the path is wrong.
so
include ("/home/public_html/index.php"); this would work
include ("/home/public_html/index.php?option=com_k2&view=item&task=add"); this would not!
I've tried a variety of alternative phrasings. I can't use the direct route "http:etc..." since it's a current PHP version, so it must be a reference to the same server. I tried relative paths; these work without the ?option=com_k2&view=item&task=add
It may be the simple answer that 'options' or variables can be passed.
Or that the include can't be used to 'wait' for a page to be generated - i.e. it will only return the html.
I'm not the biggest of coders, but I've done a lot more than this and I thought this was so basic.
include ("/home/public_html/index.php?option=com_k2&view=item&task=add"); this would not!
And it never will: You are mixing a filesystem path with GET parameters, which can be passed only through the web server (utilizing a http:// call... But that, in turn, won't run the PHP code the way you want.)
You could set the variables beforehand:
$option = "com_k2";
$view = "item";
$task = "add";
include the file the normal way:
include ("/home/public_html/index.php");
This assumes that you have access to the file and can change the script to expect variables instead of GET parameters.
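If the included script cannot be changed, a variation on the same idea is to fill $_GET yourself before the include and capture the output with output buffering so it can be post-processed. This is a sketch under assumptions: includeWithQuery() is a hypothetical helper, the demo uses a throwaway file rather than Joomla, and a large application may also read $_REQUEST or depend on being run in the global scope, which this does not cover.

```php
<?php
// Hypothetical helper: run a local PHP file with the given values in $_GET
// and return whatever it printed, instead of sending it to the browser.
function includeWithQuery($file, $query)
{
    $previous = $_GET;
    $_GET = $query;          // the included script sees these as GET parameters
    ob_start();
    include $file;           // note: the file runs in this function's scope
    $html = ob_get_clean();
    $_GET = $previous;       // restore the original parameters
    return $html;
}

// Demo with a temporary script standing in for /home/public_html/index.php.
$demo = tempnam(sys_get_temp_dir(), 'inc');
file_put_contents($demo, '<?php echo "task=" . $_GET["task"];');
echo includeWithQuery($demo, array('option' => 'com_k2', 'view' => 'item', 'task' => 'add')), "\n";
unlink($demo);
```

The returned string is the generated HTML, which can then be parsed or modified before it is echoed.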
I have been working on a content management system (nakid) and one of my toughest challenges is the file navigation. I want to make sure the file paths and settings work on local and remote servers. Right now my setup is pretty much something like this:
first.php (used by all pages):
//Set paths to nakid root
$core['dir_cur'] = dirname(__FILE__);
$core['dir_root'] = $_SERVER['DOCUMENT_ROOT'];
//Detect current nakid directory
$get_dirnakid_1 = str_replace("\\","/",dirname(__FILE__));//If on local
$get_dirnakid_2 = str_replace("/includes/php","",$get_dirnakid_1);
$get_dirnakid_3 = str_replace($_SERVER['DOCUMENT_ROOT'],"",$get_dirnakid_2);
//remove first "/"
if(substr($get_dirnakid_3, 0,1) == "/"){
$get_dirnakid_3 = substr($get_dirnakid_3, 1);
}
//Set some default vars
$core['dir_nakid_path'] = $get_dirnakid_3;
$core['dir_nakid'] = $core['dir_root']."/".$core['dir_nakid_path'];//We need to get system() for this real value - below
The reason I also did it this way is that I want the directory this program sits in to be anywhere on the server, i.e. /nakid, /cms, or /admin/cms.
I'm positive I am doing something the wrong way or that there is a simpler way to take care of all this.
If it helps to get a closer look at the code and how everything is being used I have it all up at nakid.org
EDIT: Just realized what I have at nakid.org is a little different than my newly posted code, but the same idea still applies to what I am attempting to do.
By and large, it looks okay to me.
You might want to give the variables more descriptive names (e.g. nakid_root_dir, nakid_relative_webroot and so on).
Remember when converting \ to / in path names: Whenever you match another directory name to one of those settings, you need to str_replace("\\","/"...) in those too.
I don't understand what you aim at with $get_dirnakid_2, though. Why will you screw up my path if I install your application in a directory that happens to be named /etc/includes/php/nakid?
Anyway, you should make those settings user-overridable as well. Sometimes the user may want settings different from what you get from DOCUMENT_ROOT and the like.
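To illustrate the concern about str_replace() matching anywhere in the path, here is a sketch that strips the document root only when it is a real leading prefix. relativeToDocRoot() is a hypothetical helper, not part of the posted code, and the sample paths are invented.

```php
<?php
// Hypothetical helper: return $dir relative to $docRoot, or null when $dir
// is not inside $docRoot. Only a leading match is stripped.
function relativeToDocRoot($dir, $docRoot)
{
    // Normalize separators first so Windows and UNIX paths compare equally.
    $dir     = rtrim(str_replace('\\', '/', $dir), '/');
    $docRoot = rtrim(str_replace('\\', '/', $docRoot), '/');

    if ($dir === $docRoot) {
        return '';
    }
    // Compare with a trailing slash so /var/www does not match /var/www2.
    if (strpos($dir . '/', $docRoot . '/') !== 0) {
        return null;
    }
    return substr($dir, strlen($docRoot) + 1);
}

// Prints: admin/cms
echo relativeToDocRoot('/var/www/admin/cms', '/var/www'), "\n";
```

A null result is a natural point to fall back to a user-supplied setting rather than guessing.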
I don't fully understand what you're trying to achieve, but maybe getcwd() is what you're looking for:
http://www.php.net/manual/en/function.getcwd.php
I'm not a PHP developer, but I'm currently hacking on an internal tool so my team can take advantage of its goodness. There's an index file that looks like so:
require( ($loader_path = "../../loaderapi/") . "loader.php" );
Used like this, $loader_path will retain its value within the loader.php file.
However, we want to access this API from our team's server like so:
require( ($loader_path = "http://remoteservername/loaderapi/") . "loader.php" );
In this case the $loader_path variable doesn't retain its value. I'm guessing it has something to do with it being a full blown URL, but I might be wrong. Any idea on how I can make this work, or why I can't do it this way?
If you're accessing a PHP script over HTTP, only the output of that script is returned. So your script will try to interpret the output of that remote PHP script and not its source.
If there is a connection over the file system, you may want to try file://remoteservername/loaderapi/loader.php instead.
NO NO NO NO NO NO NO NO NO NO NO NO NO NO NO NO NO!
Remote file inclusion is a BAD idea, probably one of the biggest security flaws you can open up. Even for an internal tool this is not acceptable, if only because it encourages bad habits.
PHP by default disables this behavior, and there is a broad push to have the ability to perform an include on a URL completely stripped from PHP (as there is no compelling reason to have this ability).
If you want to load shared resources, go through a shared file system drive (as in, don't use http, ftp, anything but file://) or better yet distribute copies of loader.php through a version control system. Loading from a single file resource opens you up to problems in the future of say a new dev overwriting loader.php and breaking everyone else's code.
There shouldn't be any real difference between the two; what you're doing is defining $loader_path, concatenating the loader.php, and passing that to require.
HOWEVER: you're defining the variable within the scope of a require, which will halt processing of the script if require fails.
Try replacing 'require' with 'include' and see if it retains the variable.
Also, note that if you are running your PHP server on a windows machine, and the php version is less than 4.3.0, neither 'require' nor 'include' can handle remote files : http://us.php.net/manual/en/function.include.php
Also, as noted before, if the .php lives on a remote server that parses php, you will not get code, but the result of the remote server processing the code. You'll either have to serve it up as a .txt file, or write php that, when processed, outputs valid php.
Have you tried splitting it into two lines:
$loader_path = "http://remoteservername/loaderapi/";
require( $loader_path . "loader.php" );
It's easier to read this way as well.
Simplify the code reading by simply putting everything on 3 lines:
$loader_path = "http://remoteservername/loaderapi/";
$page = "loader.php";
require($loader_path . $page );
Much clearer and it works.
Why not just put it above the require statement? It would make it easier to read too.
<?php
$loader_path = "../../folderName/";
require($loader_path . "filename");
?>