Stripping parts of a directory path using php

I am looking to strip away part of the following path, but I have no experience with regex and don't know whether that's even what I should use.
I have this path:
/var/www/wordpress/wp-content/themes/Aisis-Framework/CoreTheme/AdminPanel/Template/Form/Update.php
I would like to strip away everything to form:
CoreTheme/AdminPanel/Template/Form/Update.php
Is there an easy way to do this that still works when the content before "CoreTheme" is x characters long, where x is any number?
It should not match on the word CoreTheme, as that might be any name, and it should not match on Aisis-Framework either, as that could also be any name.
However, it is safe to assume that anything after CoreTheme is static. The resulting string will then be turned into the following, using string replace:
CoreTheme_AdminPanel_Template_Form_Update.php
As I have done in this piece of code:
$class_name = str_replace('/', '_', $path . $name);
where path is, in my solution, CoreTheme/AdminPanel/Template/Form/Update.php and $name is Update

If "/var/www/wordpress/wp-content/themes/Aisis-Framework/" is constant I would just do:
str_replace("/var/www/wordpress/wp-content/themes/Aisis-Framework/","",$path);
Otherwise you would need something constant to build a pattern from, such as the number of directories deep or a specific string to match.
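As a rough sketch of the "number of directories deep" idea: since the question says everything from CoreTheme onward is static, one option is to keep a fixed number of trailing path segments. The helper name lastPathSegments() and the segment count of 5 are assumptions made for illustration, not something from the original post.

```php
<?php
// Hypothetical helper: keep only the last $keep segments of a path.
// Assumes the number of segments after the theme directory never changes.
function lastPathSegments(string $path, int $keep): string
{
    // Drop leading/trailing slashes so explode() yields no empty segments at the ends.
    $segments = explode('/', trim($path, '/'));

    // A negative offset makes array_slice() count from the end of the array.
    return implode('/', array_slice($segments, -$keep));
}

$path = '/var/www/wordpress/wp-content/themes/Aisis-Framework/CoreTheme/AdminPanel/Template/Form/Update.php';

$relative = lastPathSegments($path, 5);
echo $relative, "\n";                        // CoreTheme/AdminPanel/Template/Form/Update.php
echo str_replace('/', '_', $relative), "\n"; // CoreTheme_AdminPanel_Template_Form_Update.php
```

This never looks at the names CoreTheme or Aisis-Framework, so either can change freely as long as the depth below the theme directory stays the same.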

Related

Finding the correct permutation of spaces and underscores in a string, in PHP

I have to parse a XFERLOG log file of all the files being written to disk, and process the said files with an external script. The issue with XFERLOG is that it replaces all spaces with underscores, while the filename on disk remains unchanged (as it should be).
If the original filename has a mix of spaces and underscores, it becomes difficult to determine the actual filename on disk. One would have to loop through all the combinations of spaces and underscores and check each one against the filesystem to see if it exists.
So let's say the log file reads this:
/path/to/file/OCD_Nightmare_-_[stuff_here_2].txt
The actual file on disk looks like this:
/path/to/file/OCD Nightmare - [stuff_here 2].txt
There are 2^5 combinations here. What would be the best course of action to find the "right" string?

Possibly use str_replace for this:
if (str_replace('_', ' ', $filename) == str_replace('_', ' ', $logfilename))
{
    // Yay, a match!
}
Note: As mentioned in a comment below, if your filesystem has /path/to/file/OCD_Nightmare_-_[stuff_here_2].txt and /path/to/file/OCD_Nightmare -_[stuff here_2].txt, they will both match the log entry of /path/to/file/OCD Nightmare - [stuff_here 2].txt, possibly resulting in unwanted behavior. I believe this may be a very unlikely situation, but still worth noting.
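A minimal sketch of how that comparison could be applied to a directory listing rather than to 2^5 guesses. The function name findLoggedFile() is hypothetical, and the candidate list is assumed to come from something like scandir() on the directory named in the log entry. Because XFERLOG only ever turns spaces into underscores, replacing spaces in each real filename should reproduce the logged name exactly.

```php
<?php
// Hypothetical helper: return every on-disk name that XFERLOG would have
// logged as $loggedName. More than one result means the log entry is ambiguous.
function findLoggedFile(string $loggedName, array $candidates): array
{
    $matches = [];
    foreach ($candidates as $candidate) {
        // Apply the same transformation the log applies: spaces become underscores.
        if (str_replace(' ', '_', $candidate) === $loggedName) {
            $matches[] = $candidate;
        }
    }
    return $matches;
}

// In real use the candidates might come from scandir(dirname($loggedPath)),
// compared against basename($loggedPath). A fixed list keeps the sketch self-contained.
$onDisk = ['OCD Nightmare - [stuff_here 2].txt', 'unrelated_file.txt'];
print_r(findLoggedFile('OCD_Nightmare_-_[stuff_here_2].txt', $onDisk));
```

Returning an array instead of the first hit makes the collision case from the note above visible to the caller.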

Regex to match base name of files with multiple extensions

I'm trying to match files of the following structure in PHP.
Input:
filename.ext1
filename.ext1.ext2
filename.ext3.ext2.ext1
filename.ext4.ext2.ext1.ext4
file name with spaces and no way of knowing how long.ext1
file name with spaces and no way of knowing how long.ext1.ext2
file name with spaces and no way of knowing how long.ext2.ext1.ext3
file name with spaces and no way of knowing how long.ext3.ext1.ext4.ext3
Output:
filename
filename
filename
filename
file name with spaces and no way of knowing how long
file name with spaces and no way of knowing how long
file name with spaces and no way of knowing how long
file name with spaces and no way of knowing how long
What I've already attempted (doesn't work of course and I already understand why):
^(?P<basename>.*)(\.ext4)|(\.ext3)|(\.ext2)|(\.ext1).*$
I'd like to extract the base name of the file and basically strip all extensions, because there's no way of knowing in which order they may appear. I've tried several solutions presented here but they did not work for me. The extensions could be anything alphanumeric of any length.
I'm fairly new to regular expressions and am confused that apparently you cannot simply search forward to the first dot and remove it including everything that comes after.
To learn, I'd also love to see how to do the reverse and just match all the extensions including the first dot.
Update:
I didn't think about file names that contain dots. So obviously my thinking regarding "searching forward" is flawed. Does anyone have a solution for the case
file name with spaces and no. way of knowing how long.ext3.ext1.ext4.ext3
or even
file name with spaces and no way of knowing.how.long.ext3.ext1.ext4.ext3
The latter one would quite possibly only work when certain extensions are given. So please assume ext1-4 are given but are in an unpredictable sequence.

Quick and dirty:
preg_replace("/\.(ext1|ext2|ext3|ext4)/i", "", $filename)

There's no need to use regular expressions for this; PHP has the built-in function basename() for that, though it only strips a single known suffix passed as its second argument.

Does something simple like this work for you?
^[^.]*
Basically it just matches the string before the first dot.

This regex should work for you:
^.+?(?=\.[^.]*$)
Online Demo: http://regex101.com/r/uT2oK5
This matches the file name up to the very last dot only. See all the examples included in the link.

am confused that apparently you cannot simply search forward to the first dot and remove it including everything that comes after.
Since regexes are read from left to right, looking for a single dot will lead you straight to the first dot. That said, you would thus be able to use:
preg_replace("/\..*/", "", $filename);
.* matches any characters except newlines.
If the filename has dots, this obviously won't work, since part of the filename will then be removed.
As per update, if you have the specific extensions, you can use something like this:
preg_replace("/(?:\.ext[1-4])+$/m", "", $filename);
regex101 demo
In a broader perspective, you could use something like this if you have an array of extensions at your disposition:
$exts = array(".ext1", ".ext2", ".ext3", ".ext4");
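// Quote each extension separately; quoting the joined string would also escape the "|" separators.
$result = preg_replace("/(?:". join("|", array_map('preg_quote', $exts)) .")+$/m", "", $filename);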
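A hedged sketch of the known-extensions approach applied to the dotted filenames from the update, plus the reverse operation the question asked about (matching only the run of extensions). The function names are made up for illustration; the extension list is the ext1 to ext4 set assumed in the question.

```php
<?php
// Hypothetical helper: build a pattern matching one or more known extensions at the end of a name.
function knownExtensionsPattern(array $exts): string
{
    // Quote each extension on its own so the "|" separators stay regex alternation.
    $quoted = array_map(function ($ext) {
        return preg_quote($ext, '/');
    }, $exts);

    return '/(?:' . implode('|', $quoted) . ')+$/i';
}

// Strip the trailing run of known extensions, leaving dots inside the base name alone.
function stripKnownExtensions(string $filename, array $exts): string
{
    return preg_replace(knownExtensionsPattern($exts), '', $filename);
}

// The reverse: return only the trailing run of known extensions (empty string if none).
function matchKnownExtensions(string $filename, array $exts): string
{
    return preg_match(knownExtensionsPattern($exts), $filename, $m) ? $m[0] : '';
}

$exts = ['.ext1', '.ext2', '.ext3', '.ext4'];
$name = 'file name with spaces and no way of knowing.how.long.ext3.ext1.ext4.ext3';

echo stripKnownExtensions($name, $exts), "\n"; // file name with spaces and no way of knowing.how.long
echo matchKnownExtensions($name, $exts), "\n"; // .ext3.ext1.ext4.ext3
```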

.*(?=\.)
Try this? It will match everything before the last dot, even if there's a dot in the file name.

This is easy with just plain old PHP functions. No need for fancy regex.
$name = substr($filename, 0, strpos($filename, '.'));
This won't work for filenames which contain a . like your updated example; to handle those you would likely need to know in advance which extensions you are going to encounter. It also returns an empty string when the name contains no dot at all, because strpos() then returns false.

Slugs for SEO using PHP - Appending name to end of URL

Something I have noticed on the StackOverflow website:
If you visit the URL of a question on StackOverflow.com:
"https://stackoverflow.com/questions/10721603"
The website adds the name of the question to the end of the URL, so it turns into:
"https://stackoverflow.com/questions/10721603/grid-background-image-using-imagebrush"
This is great, I understand that this makes the URL more meaningful and is probably good as a technique for SEO.
What I wanted to Achieve after seeing this Implementation on StackOverflow
I wish to implement the same thing with my website. I am happy using a header() 301 redirect in order to achieve this, but I am attempting to come up with a tight script that will do the trick.
My Code so Far
Please see it working by clicking here
// Set the title of the page article (This could be from the database). Trimming any spaces either side
$original_name = trim(' How to get file creation & modification date/times in Python with-dash?');
// Replace any characters that are not A-Za-z0-9 or a dash with a space
$replace_strange_characters = preg_replace('/[^\da-z-]/i', " ", $original_name);
// Replace any spaces (or multiple spaces) with a single dash to make it URL friendly
$replace_spaces = preg_replace("/([ ]{1,})/", "-", $replace_strange_characters);
// Remove any leading or trailing dashes, and any runs of two or more dashes
$removed_dashes = preg_replace("/^([\-]{0,})|([\-]{2,})|([\-]{0,})$/", "", $replace_spaces);
// Show the finished name on the screen
print_r($removed_dashes);
The Problem
I have created this code and it works fine by the looks of things: it makes the string URL friendly and readable to the human eye. However, I would like to see if it is possible to simplify it or "tighten it up" a bit, as I feel my code is probably overcomplicated.
It is not so much that I want it put onto one line, because I could do that by nesting the functions into one another, but I feel that there might be an overall simpler way of achieving it - I am looking for ideas.
In summary, the code achieves the following:
Removes any "strange" characters and replaces them with a space
Replaces any spaces with a dash to make it URL friendly
Returns a string without any spaces, with words separated with dashes and has no trailing spaces or dashes
String is readable (doesn't contain percentage signs and + symbols, as it would when simply using urlencode())
Thanks for your help!
Potential Solutions
I found out whilst writing this article that I am looking for what is known as a URL 'slug', and they are indeed useful for SEO.
I found this library on Google code which appears to work well in the first instance.
There is also a notable question on this on SO which can be found here, which has other examples.

I tried to play with preg like you did. However it gets more and more complicated when you start looking at foreign languages.
What I ended up doing was simply trimming the title, and using urlencode
$url_slug = urlencode($title);
Also I had to add those:
$title = str_replace('/','',$title); //Apache doesn't like this character even encoded
$title = str_replace('\\','',$title); //Apache doesn't like this character even encoded
There are also 3rd party libraries such as: http://cubiq.org/the-perfect-php-clean-url-generator

Indeed, you can do that:
$original_name = ' How to get file creation & modification date/times in Python with-dash?';
$result = preg_replace('~[^a-z0-9]++~i', '-', $original_name);
$result = trim($result, '-');
To deal with other alphabets you can use this pattern instead:
~\P{Xan}++~u
or
~[^\pL\pN]++~u
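Put together, the two steps above might look like the following sketch. The function name slugify() is an assumption for illustration; it uses the Unicode-aware pattern so that letters from other alphabets survive, and it keeps the original letter case because the answer does not lowercase.

```php
<?php
// Hypothetical helper: turn a title into a dash-separated slug.
function slugify(string $title): string
{
    // Replace every run of characters that are not letters or digits with one dash.
    // With the "u" modifier preg_replace() returns null on invalid UTF-8 input.
    $slug = preg_replace('~[^\pL\pN]++~u', '-', $title);
    if ($slug === null) {
        return '';
    }

    // Remove the dashes left over from leading/trailing punctuation or spaces.
    return trim($slug, '-');
}

echo slugify(' How to get file creation & modification date/times in Python with-dash?'), "\n";
// How-to-get-file-creation-modification-date-times-in-Python-with-dash
```

Because each run of unwanted characters collapses into a single dash, no separate pass is needed to remove doubled dashes.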

decoding Timestamps explicitly encoded in a filename

Quite a while ago I decided to change the way filenames are formatted when users upload a file to my website, so that they include the Unix timestamp (which was a great idea!). This proved very useful after the timestamps associated with files in my database were unfortunately wiped. However, the filename structure has changed considerably since then, and I am currently trying to recover the valid timestamp from the filename. So far I have only been able to recover half.
Beforehand, I structured the filename to be like this:
dcd1322318879.png - letters followed by the Unix timestamp.
Then I allowed people to start logging in and made the filename much more human friendly. They now read as follows:
username-09092012-183422.jpeg, where username is the logged-in user's name. The problem I am having now is that I am awful with regex. For the previous filenames (without the user data and human-friendly date) my code was:
$newDate = preg_replace("/[^0-9]/","", $data['name']);
However, that obviously will not work now, as usernames can contain numbers, and those digits would also end up in the final number for the date.
I am wondering if there is any way I can resolve this issue. I guess the hyphens (-) I added will be useful as a separator, but I have no idea where to start. I just need the last two parts of the filename, which contain the date and time, so I can convert them into Unix timestamps.
Any help is greatly appreciated

There you go, yet another way, including translation to unix timestamp:
preg_match_all('/[0-9]+/', $string, $matches);
$d = DateTime::createFromFormat('dmY His', $matches[0][0].' '.$matches[0][1]);
$timestamp = $d->getTimestamp();

Try this:
preg_match_all("/(\d{8})-(\d+)/", $str, $matches);

The easiest way is using
$array = explode ("-", $filename);
print_r ($array);
unless you really want to go with regex, but I don't think it's needed since the system is the one doing all the creating, meaning there will be no user input. It can also depend on how you have your script set up.

From your question, I assume that the hyphens can be used as the structure for the regex. That is, usernames and dates won't contain them.
So then, this pretty simple pattern would work:
/(.*?)-(.*?)-(.*)/
Group 1 will contain the username, group 2 the date, and group 3 the rest of the filename, here the time plus the extension (note that everything after the second hyphen can be anything at all, including more hyphens).
(the .*? is the dot, quantified at 'zero or more times' by the *, and the ? makes the quantifier lazy, so that the . doesn't just greedily race to the end of the string)

To parse all three components from your file name, try this:
<?php
preg_match(
"/^([\w]+)-(\d{8})-(\d{6})/",
"username-09092012-183422.jpeg",
$matches);
echo "user=[{$matches[1]}]<br/>date=[{$matches[2]}]<br/>time=[{$matches[3]}]<br/>";
?>
Which produces:
user=[username]
date=[09092012]
time=[183422]
You might also consider incorporating the date in your file names as YYYYMMDD. For example:
'09092012' => '20120909'
This will make it easier to sort your data by date should you ever wish to do so.
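For the original goal of recovering a Unix timestamp, a sketch along these lines anchors on the end of the filename, so digits or hyphens inside the username are ignored. The function name is hypothetical. The ddmmyyyy reading of the date follows the 'dmY His' format used in an earlier answer, and UTC is an assumed timezone; both would need checking against how the files were actually named.

```php
<?php
// Hypothetical helper: read the trailing "-DDMMYYYY-HHMMSS.ext" part of a
// filename and convert it to a Unix timestamp, or return null if absent.
function timestampFromFilename(string $filename): ?int
{
    // Anchor at the end so only the last two hyphen-separated parts are used.
    if (!preg_match('/-(\d{8})-(\d{6})\.\w+$/', $filename, $m)) {
        return null;
    }

    // Assumption: day-month-year order, interpreted as UTC.
    $date = DateTime::createFromFormat(
        'dmY His',
        $m[1] . ' ' . $m[2],
        new DateTimeZone('UTC')
    );

    return $date === false ? null : $date->getTimestamp();
}

var_dump(timestampFromFilename('user123-09092012-183422.jpeg'));
var_dump(timestampFromFilename('dcd1322318879.png')); // old format: no match, so null
```

The older filenames would still need the digit-stripping approach from the question, since their timestamp is stored directly.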

extracting one or more urls from a string in php

I'm trying to extract one or more URLs from a plain text string in PHP. Here are some examples:
"mydomain.com has hit the headlines again"
extract "http://www.mydomain.com"
"this is 1 domain.com and this is anotherdomain.co.uk but sometimes http://thirddomain.net"
extract "http://www.domain.com" , "http://www.anotherdomain.co.uk" , "http://www.thirddomain.net"
There are two special cases I need to handle. I'm thinking regex, but I don't fully understand it.
1) all symbols like '(' or ')' and spaces (excluding hyphens) need to be removed
2) the word dot needs to be replaced with the symbol . , so dot com would be .com
P.S. I'm aware of PHP validation/regex for URL but can't work out how I would use this to achieve the end goal.
Thanks

In this case it will be hard to get 100% correct results.
Depending on the input, you may try to force matching only the most popular top-level domains (add more to the list):
(?:https?://)?[a-zA-Z0-9\-\.]+\.(?:com|org|net|biz|edu|uk|ly|gov)\b
You may need to remove the word boundary (\b) to get different results.
You can test it here:
http://bit.ly/dlrgzQ
EDIT: about your cases
1) remove from what?
2) this could be done in php like:
$result = preg_replace('/\s+dot\s+(?=(com|org|net|biz|edu|and_ect))/', '.', $input);
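Combining the two patterns above into one pass might look like this sketch. The function name extractDomains() is hypothetical, the TLD list is the short sample from this answer, and the output is normalised to the http://www. form shown in the question. As with the patterns themselves, this is guidance rather than production code.

```php
<?php
// Hypothetical helper: find domain-like tokens in free text and return them
// in "http://www.example.com" form. Only a short sample list of TLDs is matched.
function extractDomains(string $text): array
{
    // Turn " dot com" style phrases into ".com" before searching.
    $text = preg_replace('/\s+dot\s+(?=(?:com|org|net|biz|edu|uk)\b)/i', '.', $text);

    // Optional scheme, then host characters, then one of the known TLDs.
    preg_match_all(
        '~(?:https?://)?[a-zA-Z0-9\-.]+\.(?:com|org|net|biz|edu|uk|ly|gov)\b~',
        $text,
        $found
    );

    $urls = [];
    foreach ($found[0] as $hit) {
        // Strip any existing scheme and "www." so every result gets the same prefix.
        $host = preg_replace('~^https?://~', '', $hit);
        $host = preg_replace('~^www\.~', '', $host);
        $urls[] = 'http://www.' . $host;
    }
    return $urls;
}

print_r(extractDomains(
    'this is 1 domain.com and this is anotherdomain.co.uk but sometimes http://thirddomain.net'
));
```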
But I have a few important notes:
These regexes are meant as guidance, not actual production code.
Working with this kind of loose rules on text is wacky, to say the least, and adding more special cases will make it even loonier. Consider this - even Stack Overflow doesn't do that:
http://example.org
but not!
example.org
It would be easier if you said what you are trying to achieve. If you want to process some kind of text that later goes somewhere on the WWW, then this is a very bad idea! You should not do this on your own (as you said, you don't understand regex), as it would just be a can of XSS worms. Better to think about some kind of markup language such as Markdown or BBCode.
Also get interested in: http://htmlpurifier.org/
