Here's my code:
$url = "https://de.wikipedia.org/wiki/…_und_wenn_der_letzte_Reifen_platzt";
$base = basename($url);
echo $base . "<br>";
$url2 = urlencode($base);
echo $url2 . "<br>";
$url = dirname($url) . "/" . $url2;
echo $url;
$aHeader = @get_headers($url);
echo "<pre>" . print_r($aHeader,true) . "</pre>";
It works fine on my local machine (XAMPP with PHP 7.3.12): urlencode($base) gives %E2%80%A6_und_wenn_der_letzte_Reifen_platzt
But on my server (PHP 7.2.24), it gives _und_wenn_der_letzte_Reifen_platzt, which is wrong and results in a 404 error.
Any ideas what is causing this behaviour? Both scripts are encoded in UTF-8.
It could be a bug related to the basename() function: if the … character is surrounded by letters (i.e. it is not at the start of the _und_wenn_der_letzte_Reifen_platzt part), it works as expected. If possible, try upgrading PHP on your server to match your local version.
If you can't do that, you can extract the last path segment with a regular expression instead:
$re = '/.+\/(.*)/m';
$str = 'https://de.wikipedia.org/wiki/…_und_wenn_der_letzte_Reifen_platzt';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
$base = $matches[0][1];
echo $base . "<br>";
$url2 = rawurlencode($base);
echo $url2 . "<br>";
I just ran into the same problem while processing some MP3 files of French songs I listen to. I set up a webpage where I can download an M3U playlist filtered according to what I want to listen to on my phone. I simply download the playlist, and it finds the songs on my phone in an MP3 folder. The problem was that basename() truncated the base file names. Frustrated, I tracked it down to PHP's basename() function. Once I realized that paths as well as URLs use "/" as a separator, and that it is the final "/" that defines the base name, I found a simple solution: writing my own basename function.
function basename_x($url, $ext = NULL) {
// Split on "/"; the base name is the last segment.
$parts = explode("/", $url);
if (!is_array($parts)) {
return $url; // defensive: explode() only fails for an empty delimiter
}
$base_name = end($parts);
// Optionally strip $ext, but only when it is the literal ending of the name;
// preg_quote() escapes regex characters such as the "." in ".mp3".
if ($ext !== NULL && $ext !== '') {
$base_name = preg_replace('/' . preg_quote($ext, '/') . '$/', '', $base_name);
}
return $base_name;
}
$sample = "./MP3s/À_ton_nom_-_Collectif_Cieux_Ouverts.mp3";
$this_doesnt_work = basename($sample);
$will_this_work = basename_x($sample);
var_dump($this_doesnt_work,$will_this_work);
From the command line, this is the output ...
string(40) "À_ton_nom_-_Collectif_Cieux_Ouverts.mp3"
string(40) "À_ton_nom_-_Collectif_Cieux_Ouverts.mp3"
But, when I ran this on my Apache Server, I got this instead ...
string(38) "_ton_nom_-_Collectif_Cieux_Ouverts.mp3"
string(40) "À_ton_nom_-_Collectif_Cieux_Ouverts.mp3"
I find it interesting that "À" in the file name accounts for two bytes, not one: UTF-8 encodes it as two bytes, which is why the string length drops from 40 to 38 when it is stripped. Anyway, this approach solved my problem without having to play with the locale settings in PHP. I also added the option of removing the extension, as well as ensuring the URL is exploded into a true array. It was a quick workaround with a simple solution.
Hope this helps someone with the same problem.
How can I name a txt file after a site URL without the preceding https:// or http:// as in: www.google.com.txt?
I think I might be getting it wrong here: fopen($sitenameWithoutHTTP.".txt, "w");
Below is the way I'm trying to address that:
<?php
//
#mkdir("result", 0755);
#chdir("result");
$link = $sitename;
$sitename = preg_replace('#^https?://#', '', $sitenameWithoutHTTP);
$resultfile = fopen($sitenameWithoutHTTP.".txt", "w");
//
?>
Thanks for helping find a fix.
Hope this helps you achieve what you intended!
<?php
$siteName = 'https://www.google.com';
$siteNameWithoutHttps = preg_replace('#^https?://#', '', $siteName);
// print_r($siteNameWithoutHttps);
$resultFile = fopen($siteNameWithoutHttps.".txt", "w");
// run a check
if ($resultFile !== false) {
echo "success";
fclose($resultFile);
} else {
echo "failed";
}
The expected result for the commented print_r above should be:
www.google.com
$arr = ['http','https',':','/','?','&','#','.'];
$sitename = str_replace($arr, '', $sitenameWithoutHTTP);
Alternatively, you can use base64_encode() or parse_url():
$HOST = parse_url($sitenameWithoutHTTP, PHP_URL_HOST);
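A slightly fuller version of the parse_url() route, as a sketch: it assumes the input always includes a scheme (without one, parse_url() may not report a host), the example URL is made up, and the file goes into sys_get_temp_dir() only to keep the example self-contained.

```php
<?php
// Hypothetical input; in the question this would come from $sitename.
$siteUrl = 'https://www.google.com/some/page?x=1';

// PHP_URL_HOST returns just the host ("www.google.com"), null if there is
// none, or false if the URL is seriously malformed.
$host = parse_url($siteUrl, PHP_URL_HOST);
if ($host === null || $host === false) {
    throw new InvalidArgumentException("Could not extract a host from $siteUrl");
}

// Temp directory used only for illustration.
$path = sys_get_temp_dir() . DIRECTORY_SEPARATOR . $host . '.txt';
$handle = fopen($path, 'w');
if ($handle === false) {
    echo "failed\n";
} else {
    fwrite($handle, $siteUrl . "\n");
    fclose($handle);
    echo "created $path\n";
}
```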
If you need to store the real URL and find its file again later, the best way I see is to use an MD5 hash as the file name:
$fileName = md5($sitenameWithoutHTTP).'.txt';
You can then get it again via file.php/?getFile2url=[httpLink]:
header('Location: ' . md5($_GET['getFile2url']) . '.txt');
exit;
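Because md5() is one-way, this works as a lookup rather than a reversible encoding: you recompute the hash from the URL whenever you need the file, and store the URL inside the file if you also want to read it back. A sketch with a hypothetical helper name (urlToFileName()) and the temp directory as an illustrative location:

```php
<?php
// Hypothetical helper: map a URL to a stable file name.
// md5() always yields the same 32-character hex string for the same input,
// but the URL cannot be recovered from the name itself.
function urlToFileName(string $url): string
{
    return md5($url) . '.txt';
}

$dir = sys_get_temp_dir();
$url = 'https://www.google.com/';

// Store the real URL inside the file so it can be read back later.
file_put_contents($dir . '/' . urlToFileName($url), $url);

// Later: recompute the same name from the URL to find the file again.
$contents = file_get_contents($dir . '/' . urlToFileName($url));
echo $contents, "\n";
```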
<span class="itemopener">82 top</span> <span class="allopener">all</span>
How can I change the above to:
<span class="itemopener">top</span> <span class="allopener">82</span>
using PHP, in an HTML file that contains around 30 of these HTML snippets?
Note: 82 can be any integer above 1.
Also, I want to run this script from a new file that I place in a directory, which will run the search and replace once for each of the 8000 HTML files in that directory. The script mustn't time out before it's done, and ideally it should give some progress feedback.
I wrote a function that performs the replacement on a single line:
function replace($row){
$replaced = preg_replace_callback("~(\<span class=\"itemopener\"\>)(\d{1,5})\s(top\</span\>.*\<span class=\"allopener\"\>).{3}(\</span\>)~iU", function($matches){
$str = $matches[1] . $matches[3] . $matches[2] . $matches[4];
return $str;
}, $row);
return $replaced;
}
$s = '<span class="itemopener">82 top</span> <span class="allopener">all</span>';
$replaced = replace($s);
echo "<pre>" . print_r($replaced, 1) . "</pre>";
exit();
If you read each file line by line and do a simple check for whether the line contains the spans you want to replace, you can then pass those lines to this function.
With the number of files you specified, though, it will take some time.
To scan all the files in a directory, you can use my answer to another question (scandir).
With a little editing, you can make it read only .htm files and return whatever structure you need.
Then take all the scanned .htm files and process them with something like this:
$allScannedFiles = array("......");
foreach($allScannedFiles as $key => $path){
$file = file_get_contents($path);
$lines = explode(PHP_EOL, $file);
$modifiedFile = "";
foreach($lines as $line){
if(strpos($line, "span") !== false && strpos($line, "itemopener") !== false){ // strpos() can return 0, so compare with false
$line = replace($line);
}
$modifiedFile .= $line . PHP_EOL;
}
file_put_contents($path, $modifiedFile);
}
I wrote this snippet from memory, so it needs some testing.
Then run it, go make yourself a coffee and wait :)
If it times out, you can increase PHP's time limit; how to do that is answered in the question "how to increase timeout in php".
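For the 8000-file case, set_time_limit(0) removes PHP's execution-time limit for the current script (where the host permits it), and echoing a counter gives the progress feedback the question asks for. A sketch in which processAll() and the $process callback are hypothetical names; the callback could wrap the line-by-line loop above. Web server or proxy timeouts may still apply when this runs over HTTP rather than from the command line.

```php
<?php
// Process many files without hitting max_execution_time, printing
// progress every $every files. $process handles one file (hypothetical).
function processAll(array $paths, callable $process, int $every = 100): int
{
    set_time_limit(0); // 0 = no execution-time limit for this script
    $done = 0;
    $total = count($paths);
    foreach ($paths as $path) {
        $process($path);
        $done++;
        if ($done % $every === 0 || $done === $total) {
            echo "Processed $done of $total files\n";
            // Ask PHP to send buffered output now; server-side buffering
            // may still delay it in a browser.
            flush();
        }
    }
    return $done;
}
```

Usage would look like `processAll(glob('*.htm*'), function ($path) { /* rewrite one file */ });`.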
Alternatively, you can load each file into a DOMDocument and do the replacements there (see the DOMDocument documentation).
But if any of the files contain invalid HTML, this may cause you problems.
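A sketch of the DOMDocument route, assuming full HTML documents, exact class values itemopener / allopener, and that the allopener span is a later sibling of the itemopener span; swapWithDom() and replaceText() are hypothetical names. libxml warnings for imperfect markup are collected rather than printed, but badly broken HTML can still be parsed differently than a browser would.

```php
<?php
// Swap "82 top" / "all" into "top" / "82" using DOMDocument + DOMXPath.
function swapWithDom(string $html): string
{
    $doc = new DOMDocument();
    // Collect libxml warnings about imperfect HTML instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//span[@class="itemopener"]') as $item) {
        // Only touch spans whose text looks like "<number> top".
        if (!preg_match('/^\s*(\d+)\s+top\s*$/', $item->textContent, $m)) {
            continue;
        }
        // The first following sibling span with class "allopener".
        $all = $xpath->query('following-sibling::span[@class="allopener"][1]', $item)->item(0);
        if ($all === null) {
            continue;
        }
        replaceText($item, 'top');
        replaceText($all, $m[1]);
    }
    return $doc->saveHTML();
}

// Replace all children of a node with a single text node.
function replaceText(DOMNode $node, string $text): void
{
    while ($node->firstChild !== null) {
        $node->removeChild($node->firstChild);
    }
    $node->appendChild($node->ownerDocument->createTextNode($text));
}
```

For files on disk, DOMDocument::loadHTMLFile() and DOMDocument::saveHTMLFile() can replace the string-based calls. Saving may normalize the markup (for example by adding a doctype), and loadHTML() assumes ISO-8859-1 unless the document declares its charset.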
I'm using the function written by @Jimmmy (with the range \d{2} replaced by \d{1,5}, because "Note: 82 can be any integer above 1") and added the file search (I tested it, and it works great):
<?php
function replace($row){
$replaced = preg_replace_callback("~(\<span class=\"itemopener\"\>)(\d{1,5})\s(top\</span\>.*\<span class=\"allopener\"\>).{3}(\</span\>)~iU", function($matches){
$str = $matches[1] . $matches[3] . $matches[2] . $matches[4];
return $str;
}, $row);
return $replaced;
}
foreach ( glob( "*.html" ) as $file ) // GET ALL HTML FILES IN DIRECTORY.
{ $lines = file( $file ); // GET WHOLE FILE AS ARRAY OF STRINGS.
for ( $i = 0; $i < count( $lines ); $i++ ) // CHECK ALL LINES IN ARRAY.
$lines[ $i ] = replace( $lines[ $i ] ); // REPLACE PATTERN IF FOUND.
file_put_contents( $file,$lines ); // SAVE ALL ARRAY IN FILE.
}
?>
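Because preg_replace() works on a whole string, the per-line loop is not strictly needed: each file can be read, rewritten, and saved in one go. A sketch with hypothetical names (swapOpeners(), swapInDirectory()); unlike the pattern above, it accepts any text in the allopener span rather than exactly three characters, assuming that text contains no "<".

```php
<?php
// Swap the number from the itemopener span into the allopener span,
// for every occurrence in $html at once.
function swapOpeners(string $html): string
{
    return preg_replace(
        '~(<span class="itemopener">)(\d+)\s+(top</span>\s*<span class="allopener">)[^<]*(</span>)~i',
        '$1$3$2$4',
        $html
    );
}

// Rewrite every .html file in $dir, skipping files that did not change.
function swapInDirectory(string $dir): void
{
    foreach (glob($dir . '/*.html') ?: [] as $file) {
        $html = file_get_contents($file);
        $new = swapOpeners($html);
        if ($new !== null && $new !== $html) { // null signals a regex error
            file_put_contents($file, $new);
        }
    }
}
```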
I have been struggling with this for 2 hours now and it's driving me nuts, and I don't think it should be hard. I am using WordPress and need to replace the image URLs from an old path with a new path. The problem is that everything about the URL is static except one particular directory, which is random.
Example:
https://cdn2.content.mysite.com/uploads/user/76eb326b-62ff-4d37-bf4b-01a428e2f9f6/0ffd6c15-8a13-437c-9661-36edfe11cb41/Image/b1493cd89a29c0a2d1d8e0939f05d8ee/booth_w640.jpeg
should become
/wp-content/uploads/imports/booth_w640.jpeg
The directory after Image/ (here b1493cd89a29c0a2d1d8e0939f05d8ee) is random. So I have this in my WordPress functions.php:
function replace_content($content) {
$reg = '#/https://cdn2.content.mysite.com/uploads/user/76eb326b-62ff-4d37-bf4b-01a428e2f9f6/0ffd6c15-8a13-437c-9661-36edfe11cb41/Image/([^/]+)#i';
$rep = '/wp-content/uploads/imports';
$content = preg_replace($reg, $rep ,$content);
return $content;
}
add_filter('the_content','replace_content');
but that isn't working. I can't figure it out. Any help?
I think what you need is:
function replace_content($content) {
$reg = '#/static-part-of-url/([^/]+)#i';
$rep = '/wp-content/uploads/imports';
$content = preg_replace($reg, $rep ,$content);
return $content;
}
add_filter('the_content','replace_content');
Using a delimiter other than / is better when matching URLs, because the slashes in the URL then don't need to be escaped.
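A sketch of a corrected filter to replace the question's replace_content(): the static prefix is taken verbatim from the question, and preg_quote() escapes its dots and other special characters. In the question's pattern, the / right after the # delimiter is matched literally, so that pattern only matches when a slash directly precedes https://. Here the / after the random directory is consumed as well, so the file name follows the new path directly.

```php
<?php
function replace_content($content)
{
    // Static part of the old URL, copied from the question.
    $prefix = 'https://cdn2.content.mysite.com/uploads/user/76eb326b-62ff-4d37-bf4b-01a428e2f9f6/0ffd6c15-8a13-437c-9661-36edfe11cb41/Image/';
    // Prefix + random directory + "/"; the file name after it is kept.
    $reg = '#' . preg_quote($prefix, '#') . '[^/]+/#i';
    $rep = '/wp-content/uploads/imports/';
    return preg_replace($reg, $rep, $content);
}

// Only register the filter when running inside WordPress.
if (function_exists('add_filter')) {
    add_filter('the_content', 'replace_content');
}
```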
I solved my problem with the code below:
// Only the first src="..." attribute in $content is rewritten.
if (preg_match('/src="([^"]*)"/i', $content, $match)) {
$getURL = $match[1];
$urlArr = explode("/", $getURL);
$fileName = end($urlArr);
$newURL = "/blog/wp-content/uploads/imports/" . $fileName;
$content = str_replace($getURL, $newURL, $content);
}
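The code above only handles the first src attribute in $content, so posts with several different images keep the old URLs for the rest. A sketch using preg_replace_callback() to rewrite every matching src; rewrite_image_sources() is a hypothetical name, and limiting it to URLs on cdn2.content.mysite.com (the old host from the question) is an assumption.

```php
<?php
// Rewrite every src="..." that points at the old CDN, not just the first.
function rewrite_image_sources(string $content): string
{
    $oldHost = 'https://cdn2.content.mysite.com/';
    return preg_replace_callback(
        '/src="([^"]*)"/i',
        function (array $m) use ($oldHost): string {
            $src = $m[1];
            if (strpos($src, $oldHost) !== 0) {
                return $m[0]; // leave other sources untouched
            }
            // File name = everything after the last "/" (no basename(),
            // so no locale issues with non-ASCII names).
            $fileName = substr($src, strrpos($src, '/') + 1);
            return 'src="/blog/wp-content/uploads/imports/' . $fileName . '"';
        },
        $content
    );
}
```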
I'm trying to parse two numbers within a URL. The URL is here:
http://movies.actionpaxed.com/5600_5949/5943/5/pics/none/500k/3min/003.jpg?nvb=20130811232301&nva=20130812012301&hash=090a687f7e27b2f5ef735
I'm trying to only get the "5943/5" portion of the URL. I would just parse the URL and then use str_replace, but the folders around the two I need vary in name.
So far I have:
$homepage = file_get_contents($url);
$link = parse_to_string('"video_url":"', '"};', $homepage);
$link = str_replace(array( '"low":"', '"};'), '', $link);
$link = utf8_decode(urldecode($link));
At the end of this code, $link = http://movies.actionpaxed.com/5600_5949/5943/5/pics/none/500k/3min/003.jpg?nvb=20130811232301&nva=20130812012301&hash=090a687f7e27b2f5ef735
Any help with a regular expression that can take care of this for me would be greatly appreciated!
How about:
$res = explode('/', parse_url($url, PHP_URL_PATH));
$res = $res[2].'/'.$res[3];
echo $res;
$exploded = explode("/", $link);
$res = $exploded[4] . "/" . $exploded[5];
echo $res;
preg_match('%https?://.*?/\d*_\d*/(\d*)/(\d*)%',$link,$matches);
print_r($matches);
Here is a function that extracts what you are looking for.
function getTheStuff($url) {
// Only get the part of the URL that
// actually matters; this makes the
// problem smaller and easier to solve
$path = parse_url($url, PHP_URL_PATH);
// The path will be false if the URL is
// malformed, or null if it was not found
if ($path !== false && $path !== null) {
// Assuming that the stuff you need is
// always after the first forward slash,
// and that the format never changes,
// it should be easy to match
preg_match('/^\/[\d_]+\/(\d+\/\d+)/', $path, $result);
// We only capture one thing so what we
// are looking for can only be the second
// thing in the array
if (isset($result[1])) {
return $result[1];
}
}
// If it is not in the array then it
// means that it was not found
return false;
}
$url = 'http://movies.actionpaxed.com/5600_5949/5943/5/pics/none/500k/3min/003.jpg?nvb=20130811232301&nva=20130812012301&hash=090a687f7e27b2f5ef735';
var_dump(getTheStuff($url));
If I were writing this for myself, I would have avoided the regular expression; it is the easiest option in this case, so I used it here. I would probably have generalized the solution by tokenizing the $path (using / as a delimiter) and then letting another function/method/mechanism extract the parts that are needed. That way it would be easier to adapt it to other URLs that are formatted differently.
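A sketch of that tokenizing idea, with hypothetical function names: pathSegments() splits the path once, and extractIds() knows only about this particular layout (a range folder, then the two numbers), so another URL format would just need another small extractor.

```php
<?php
// Split the URL path into non-empty segments, numbered from 0.
function pathSegments(string $url): array
{
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path)) {
        return []; // malformed URL (false) or no path (null)
    }
    // Drop the empty strings produced by the leading "/" and renumber.
    return array_values(array_filter(explode('/', $path), 'strlen'));
}

// Extractor for this format: /<range>/<id>/<part>/...
function extractIds(array $segments): ?string
{
    if (count($segments) < 3) {
        return null;
    }
    // Skip segment 0 (e.g. "5600_5949") and take the next two.
    [, $id, $part] = $segments;
    if (!preg_match('/^\d+$/', $id) || !preg_match('/^\d+$/', $part)) {
        return null;
    }
    return $id . '/' . $part;
}

$url = 'http://movies.actionpaxed.com/5600_5949/5943/5/pics/none/500k/3min/003.jpg?nvb=20130811232301&nva=20130812012301&hash=090a687f7e27b2f5ef735';
var_dump(extractIds(pathSegments($url)));
```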
I am trying to download files using file_get_contents() function.
However if the location of the file is http://www.example.com/some name.jpg, the function fails to download this.
But if the URL is given as http://www.example.com/some%20name.jpg, the same gets downloaded.
I tried rawurlencode(), but it converts all the special characters in the URL (including the : and / characters), so the download fails again.
Can someone please suggest a solution for this?
I think this will work for you:
function file_url($url){
$parts = parse_url($url);
$path_parts = array_map('rawurldecode', explode('/', $parts['path']));
return
$parts['scheme'] . '://' .
$parts['host'] .
implode('/', array_map('rawurlencode', $path_parts))
;
}
echo file_url("http://example.com/foo/bar bof/some file.jpg") . "\n";
echo file_url("http://example.com/foo/bar+bof/some+file.jpg") . "\n";
echo file_url("http://example.com/foo/bar%20bof/some%20file.jpg") . "\n";
Output
http://example.com/foo/bar%20bof/some%20file.jpg
http://example.com/foo/bar%2Bbof/some%2Bfile.jpg
http://example.com/foo/bar%20bof/some%20file.jpg
Note:
I'd probably use urldecode and urlencode for this, as the output would then be identical for each URL. rawurlencode preserves the + (as %2B), even when %20 is probably what suits whatever URL you're using.
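The difference the note relies on is how each pair of functions treats spaces and "+"; a small comparison, where the string 'bar bof+x' is only an illustration. Keep in mind that in a URL path a literal "+" does not mean a space (that convention belongs to form-encoded query strings), so a server will generally look for a file name containing "+" if urlencode() output is used in a path.

```php
<?php
$name = 'bar bof+x';

// urlencode(): space becomes "+", a literal "+" becomes "%2B".
$form = urlencode($name);
// rawurlencode(): space becomes "%20", a literal "+" becomes "%2B".
$raw = rawurlencode($name);

// Decoding differs too: urldecode() turns "+" back into a space,
// while rawurldecode() leaves "+" alone.
$fromForm = urldecode('bar+bof');
$fromRaw = rawurldecode('bar+bof');

echo $form, "\n", $raw, "\n";
var_dump($fromForm, $fromRaw);
```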
As you have probably already figured out, urlencode() should only be used on each portion of a URL that requires escaping.
Per the docs for urlencode(), just apply it to the image file name that is giving you the problem and leave the rest of the URL alone. In your example, you can safely encode everything following the last "/" character.
Here is perhaps a better solution, if for any reason you are using a protocol-relative URL like:
//www.example.com/path
Such a URL has no scheme, so parse_url() never returns a [scheme] element for it (and before PHP 5.4.7 it did not recognize the host either), which would throw off maček's function. This method may be faster as well.
$url = '//www.example.com/path';
preg_match('/(https?:\/\/|\/\/)([^\/]+)(.*)/ism', $url, $result);
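// Encode each path segment separately so the "/" separators survive;
// rawurlencode() turns spaces into %20 rather than "+".
$segments = array_map('rawurldecode', explode('/', $result[3]));
$url = $result[1] . $result[2] . implode('/', array_map('rawurlencode', $segments));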
Assuming only the file name has the problem, a better approach is to encode only the last section, i.e. the file name:
private function update_url($url)
{
$parts = explode('/', $url);
$new_file = rawurlencode(end($parts)); // %20 for spaces; urlencode() would give "+", which most servers read as a literal + in a path
$parts[key($parts)] = $new_file;
return implode("/", $parts);
}
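This should work, as long as only the file name is encoded and the result is actually used:
$file = 'some file name';
// The encoding functions return a new string; they do not modify $file in place.
// rawurlencode() turns the spaces into %20, which is what the server expects.
$url = 'http://www.example.com/' . rawurlencode($file);
file_get_contents($url);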