Parse between slashes within URL - php

I'm trying to parse two numbers within a URL. The URL is here:
http://movies.actionpaxed.com/5600_5949/5943/5/pics/none/500k/3min/003.jpg?nvb=20130811232301&nva=20130812012301&hash=090a687f7e27b2f5ef735
I'm trying to get only the "5943/5" portion of the URL. I would just parse the URL and then use str_replace, but the folders around the two I need vary in name.
So far I have:
$homepage = file_get_contents($url);
$link = parse_to_string('"video_url":"', '"};', $homepage);
$link = str_replace(array( '"low":"', '"};'), '', $link);
$link = utf8_decode(urldecode($link));
At the end of this code, $link = http://movies.actionpaxed.com/5600_5949/5943/5/pics/none/500k/3min/003.jpg?nvb=20130811232301&nva=20130812012301&hash=090a687f7e27b2f5ef735
Any help with a regular expression that can take care of this for me would be greatly appreciated!

How about:
$res = explode('/', parse_url($url, PHP_URL_PATH));
$res = $res[2].'/'.$res[3];
echo $res;

$exploded = explode("/", $link);
$res = $exploded[4] . "/" . $exploded[5];
echo $res;

preg_match('%https?://.*?/\d*_\d*/(\d*)/(\d*)%',$link,$matches);
print_r($matches);

Here is a function that extracts what you are looking for.
function getTheStuff($url) {
    // Only get the part of the URL that
    // actually matters; this makes the
    // problem smaller and easier to solve
    $path = parse_url($url, PHP_URL_PATH);
    // The path will be false if the URL is
    // malformed, or null if it was not found
    if ($path !== false && $path !== null) {
        // Assuming that the stuff you need is
        // always after the first forward slash,
        // and that the format never changes,
        // it should be easy to match
        preg_match('/^\/[\d_]+\/(\d+\/\d+)/', $path, $result);
        // We only capture one thing so what we
        // are looking for can only be the second
        // thing in the array
        if (isset($result[1])) {
            return $result[1];
        }
    }
    // If it is not in the array then it
    // means that it was not found
    return false;
}
$url = 'http://movies.actionpaxed.com/5600_5949/5943/5/pics/none/500k/3min/003.jpg?nvb=20130811232301&nva=20130812012301&hash=090a687f7e27b2f5ef735';
var_dump(getTheStuff($url));
If I were writing this for myself then I would have avoided the regular expression. It is the easiest in this case, so I used it. I would probably have generalized the solution by tokenizing the $path (using / as a delimiter), and then let another function/method/mechanism handle extracting the parts that are needed. That way it would be easier to adapt it for other URLs that are formatted differently.
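A rough sketch of that tokenizing idea, in case it helps. The helper names pathSegments and pickSegments are made up for illustration, and the offset/length values assume the path layout from the question never changes:

```php
<?php
// Hypothetical helper: split a URL's path into its non-empty segments.
function pathSegments(string $url): array
{
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path)) {
        return []; // malformed URL, or no path component at all
    }
    // array_filter() drops the empty strings, array_values() reindexes from 0
    return array_values(array_filter(explode('/', $path), 'strlen'));
}

// Hypothetical helper: join $length segments starting at $offset,
// or return null when the path is too short.
function pickSegments(string $url, int $offset, int $length): ?string
{
    $picked = array_slice(pathSegments($url), $offset, $length);
    return count($picked) === $length ? implode('/', $picked) : null;
}

$url = 'http://movies.actionpaxed.com/5600_5949/5943/5/pics/none/500k/3min/003.jpg?nvb=20130811232301';
echo pickSegments($url, 1, 2), "\n"; // 5943/5
```

For a differently formatted URL only the offset and length passed to pickSegments would need to change.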

Related

Using preg_replace need to replace random directory in an image src

I have been struggling with this for 2 hours now and it's driving me nuts, and I don't think it is even that hard. I am using WordPress and need to replace the IMG URLs from an old path to a new path. The problem is that everything about the URL is static except one particular directory, which is random.
Example:
https://cdn2.content.mysite.com/uploads/user/76eb326b-62ff-4d37-bf4b-01a428e2f9f6/0ffd6c15-8a13-437c-9661-36edfe11cb41/Image/b1493cd89a29c0a2d1d8e0939f05d8ee/booth_w640.jpeg
should become
/wp-content/uploads/imports/booth_w640.jpeg
The directory just before the file name (b1493cd89a29c0a2d1d8e0939f05d8ee here) is random. So I have this in my WordPress functions.php:
function replace_content($content) {
$reg = '#/https://cdn2.content.mysite.com/uploads/user/76eb326b-62ff-4d37-bf4b-01a428e2f9f6/0ffd6c15-8a13-437c-9661-36edfe11cb41/Image/([^/]+)#i';
$rep = '/wp-content/uploads/imports';
$content = preg_replace($reg, $rep ,$content);
return $content;
}
add_filter('the_content','replace_content');
but that isn't working. I can't figure it out. Any help?
I think what you need is:
function replace_content($content) {
$reg = '#/static-part-of-url/([^/]+)#i';
$rep = '/wp-content/uploads/imports';
$content = preg_replace($reg, $rep ,$content);
return $content;
}
add_filter('the_content','replace_content');
Using a different delimiter than / is better when trying to match URLs.
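As a sketch of the delimiter point, here is one way the pattern could be built with # as the delimiter. It assumes everything up to and including /Image/ really is fixed (taken from the example URL in the question), and it also consumes the slash after the random directory so the replacement does not produce a double slash:

```php
<?php
// Assumption: this prefix is identical for every image.
$prefix = 'https://cdn2.content.mysite.com/uploads/user/76eb326b-62ff-4d37-bf4b-01a428e2f9f6/0ffd6c15-8a13-437c-9661-36edfe11cb41/Image/';

// preg_quote() escapes the dots and other special characters in the literal
// prefix; with # as the delimiter the slashes need no escaping at all.
$pattern = '#' . preg_quote($prefix, '#') . '[^/]+/#i';

$html = '<img src="' . $prefix . 'b1493cd89a29c0a2d1d8e0939f05d8ee/booth_w640.jpeg">';
echo preg_replace($pattern, '/wp-content/uploads/imports/', $html), "\n";
// <img src="/wp-content/uploads/imports/booth_w640.jpeg">
```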
I determined the answer to my problem using the below code
preg_match( '/src="([^"]*)"/i', $content, $match ) ;
$getURL = $match[1];
$urlArr = explode("/",$getURL);
$fileName = end($urlArr);
$newURL = "/blog/wp-content/uploads/imports/" . $fileName;
$content = str_replace($getURL, $newURL, $content);

Php parse string error

I am extracting files from a string which can be entered by a user or taken from reading a page source.
I want to extract all .jpg image URLs
So, I am using the following (example text shown) but a) it only returns the first one and b) it misses off '.jpg'
$word1='http://';
$word2='.jpg';
$contents = 'uuuuyyyyyhttp://image.jpgandagainhereitishttp://image2.jpgxxxxcccffff';
$between=substr($contents, strpos($contents, $word1), strpos($contents, $word2) - strpos($contents, $word1));
echo $between;
Is there maybe a better way to do this?
In the case of parsing a web page I cannot use a simple DOM e.g. $images = $dom->getElementsByTagName('img'); as sometimes the image references are not in standard tags
You can do something like this :
<?php
$contents = 'uuuuyyyyyhttp://image.jpgandagainhereitishttp://image2.jpgxxxxcccffff';
$matches = array();
preg_match_all('#(http://[^\s]*?\.jpg)#i', $contents, $matches);
print_r($matches);
You can either do this using preg_match_all (as previously answered) or alternatively use the following function.
It simply explodes the original string, checks all parts for a valid link and adds it to the array, that's getting returned.
function getJpgLinks($string) {
$return = array();
foreach (explode('.jpg', $string) as $value) {
$position = strrpos($value, 'http://');
if ($position !== false) {
$return[] = substr($value, $position) . '.jpg';
}
}
return $return;
}

php - file_get_contents - Downloading files with spaces in the filename not working

I am trying to download files using file_get_contents() function.
However if the location of the file is http://www.example.com/some name.jpg, the function fails to download this.
But if the URL is given as http://www.example.com/some%20name.jpg, the same gets downloaded.
I tried rawurlencode(), but this converts all the characters in the URL and the download fails again.
Can someone please suggest a solution for this?
I think this will work for you:
function file_url($url){
$parts = parse_url($url);
$path_parts = array_map('rawurldecode', explode('/', $parts['path']));
return
$parts['scheme'] . '://' .
$parts['host'] .
implode('/', array_map('rawurlencode', $path_parts))
;
}
echo file_url("http://example.com/foo/bar bof/some file.jpg") . "\n";
echo file_url("http://example.com/foo/bar+bof/some+file.jpg") . "\n";
echo file_url("http://example.com/foo/bar%20bof/some%20file.jpg") . "\n";
Output
http://example.com/foo/bar%20bof/some%20file.jpg
http://example.com/foo/bar%2Bbof/some%2Bfile.jpg
http://example.com/foo/bar%20bof/some%20file.jpg
Note:
I'd probably use urldecode and urlencode for this, as the output would then be identical for each URL. rawurldecode leaves a literal + untouched (so it is re-encoded as %2B), even when %20 is probably what the URL you're using needs.
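To make the difference concrete, here is a small comparison on a single path segment. The third combination (urldecode followed by rawurlencode) is not from the answer above; it is only shown as one way to end up with %20:

```php
<?php
$segment = 'bar+bof';

// rawurldecode() keeps the literal +, rawurlencode() then escapes it
$raw = rawurlencode(rawurldecode($segment));
// urldecode() turns + into a space, urlencode() turns the space back into +
$plain = urlencode(urldecode($segment));
// urldecode() turns + into a space, rawurlencode() escapes it as %20
$mixed = rawurlencode(urldecode($segment));

echo $raw, "\n";   // bar%2Bbof
echo $plain, "\n"; // bar+bof
echo $mixed, "\n"; // bar%20bof
```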
As you have probably already figured out urlencode() should only be used on each portion of a URL that requires escaping.
From the docs for urlencode() just apply it to the image file name giving you the problem and leave the rest of the URL alone. From your example you can safely encode everything following the last "/" character
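A minimal sketch of that, assuming the URL has no query string and only the file name needs escaping. The function name encodeFileName is made up, and rawurlencode is swapped in for urlencode here because urlencode would turn the space into + rather than the %20 that worked in the question:

```php
<?php
// Hypothetical helper: percent-encode only the part after the last "/".
function encodeFileName(string $url): string
{
    $pos = strrpos($url, '/');
    if ($pos === false) {
        return rawurlencode($url); // no slash at all, treat it as a bare name
    }
    $dir  = substr($url, 0, $pos + 1);
    $file = substr($url, $pos + 1);
    // Decode first so an already-encoded name is not encoded twice
    return $dir . rawurlencode(rawurldecode($file));
}

echo encodeFileName('http://www.example.com/some name.jpg'), "\n";
// http://www.example.com/some%20name.jpg
```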
Here is maybe a better solution. If for any reason you are using a protocol-relative URL like:
//www.example.com/path
Prior to PHP 5.4.7 this would not create the [scheme] array element, which would throw off maček's function. This method may be faster as well.
$url = '//www.example.com/path';
preg_match('/(https?:\/\/|\/\/)([^\/]+)(.*)/ism', $url, $result);
// Encode each path segment separately so the slashes between them survive
$segments = array_map('rawurldecode', explode('/', $result[3]));
$url = $result[1] . $result[2] . implode('/', array_map('rawurlencode', $segments));
Assuming only the file name has the problem, this is a better approach: urlencode only the last section, i.e. the file name.
private function update_url($url)
{
$parts = explode('/', $url);
$new_file = urlencode(end($parts));
$parts[key($parts)] = $new_file;
return implode("/", $parts);
}
This should work:
$file = 'some file name';
// urlencode() returns the encoded string; it does not modify $file in place
$file = urlencode($file);
file_get_contents($file);

Get last word from URL after a slash in PHP

I need to get the very last word from an URL. So for example I have the following URL:
http://www.mydomainname.com/m/groups/view/test
I need to get with PHP only "test", nothing else. I tried to use something like this:
$words = explode(' ', $_SERVER['REQUEST_URI']);
$showword = trim($words[count($words) - 1], '/');
echo $showword;
It does not work for me. Can you help me please?
Thank you so much!!
Use basename with parse_url:
echo basename(parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH));
by using regex:
preg_match("/[^\/]+$/", "http://www.mydomainname.com/m/groups/view/test", $matches);
$last_word = $matches[0]; // test
I used this:
$lastWord = substr($url, strrpos($url, '/') + 1);
Thnx to: https://stackoverflow.com/a/1361752/4189000
You can use explode but you need to use / as delimiter:
$segments = explode('/', $_SERVER['REQUEST_URI']);
Note that $_SERVER['REQUEST_URI'] can contain the query string if the current URI has one. In that case you should use parse_url before to only get the path:
$_SERVER['REQUEST_URI_PATH'] = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);
And to take trailing slashes into account, you can use rtrim to remove them before splitting it into its segments using explode. So:
$_SERVER['REQUEST_URI_PATH'] = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);
$segments = explode('/', rtrim($_SERVER['REQUEST_URI_PATH'], '/'));
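Putting those steps together, the last segment can then be read with end(). This is only a sketch: lastSegment is a made-up name, and it takes the URI as a parameter instead of reading $_SERVER so it can be tried outside a web request:

```php
<?php
// Hypothetical helper: last path segment of a request URI, ignoring the
// query string and any trailing slashes.
function lastSegment(string $requestUri): string
{
    $path = parse_url($requestUri, PHP_URL_PATH);
    if (!is_string($path)) {
        return ''; // malformed URI or no path
    }
    $segments = explode('/', rtrim($path, '/'));
    return end($segments);
}

echo lastSegment('/m/groups/view/test/?page=2'), "\n"; // test
```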
To do that you can use explode on your REQUEST_URI. I've made a simple function:
function getLast()
{
$requestUri = $_SERVER['REQUEST_URI'];
# Remove query string (parse_url also copes with URIs that have no "?")
$requestUri = trim((string) parse_url($requestUri, PHP_URL_PATH), '/');
# Note that the delimiter is '/'
$arr = explode('/', $requestUri);
$count = count($arr);
return $arr[$count - 1];
}
echo getLast();
If you don't mind a query string being included when present, then just use basename. You don't need to use parse_url as well.
$url = 'http://www.mydomainname.com/m/groups/view/test';
$showword = basename($url);
echo htmlspecialchars($showword);
When the $url variable is generated from user input or from $_SERVER['REQUEST_URI'], pass it through htmlspecialchars or htmlentities before echoing it; otherwise users could inject HTML tags or run JavaScript on the page.
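To illustrate what the escaping does, here is a sketch with a made-up URL whose last segment contains markup. The ENT_QUOTES flag and the explicit UTF-8 charset are my own choices, not something the answer specifies:

```php
<?php
// Made-up hostile URL: the last segment is an HTML tag
$url = 'http://www.mydomainname.com/m/groups/view/<img src=x onerror=alert(1)>';

$showword = basename($url);
// Escape before output so the browser shows the text instead of parsing a tag
$safe = htmlspecialchars($showword, ENT_QUOTES, 'UTF-8');

echo $safe, "\n"; // &lt;img src=x onerror=alert(1)&gt;
```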
Use preg_match:
// [^/]+ cannot cross a slash, so only the last segment can reach the end anchor
if (preg_match('~/([^/]+)/?$~', $_SERVER['REQUEST_URI'], $vv)) {
    echo $vv[1];
} else {
    echo "Nothing here";
}
This is just the idea of the code. It can be rewritten as a function.
PS. Generally I use mod_rewrite to handle this and process the $_GET variables in PHP.
And this is good practice, IMHO.
// e.g. $url = 'http://www.youtube.com/embed/ADU0QnQ4eDs';
$url = "http://$_SERVER[HTTP_HOST]$_SERVER[REQUEST_URI]";
$url_path = parse_url($url, PHP_URL_PATH);
$basename = pathinfo($url_path, PATHINFO_BASENAME);
// **output**: $basename is "ADU0QnQ4eDs"
The complete solution is in the linked post, which I found while looking for how to get the last word from a URL after a slash in PHP:
Get last parameter of url in php

Using PHP to find part of a URL

Take this domain:
http://www.?.co.uk/elderly-care-advocacy/mental-capacity-act-advance-medical-directive.html
How could I use PHP to find everything between the first and second slash, regardless of whether it changes or not?
i.e. elderly-care-advocacy
Any help would be greatly appreciated.
//strip the "http://" part. Note: Doesn't work for HTTPS!
$url = substr("http://www.example.com/elderly-care-advocacy/mental-capacity-act-advance-medical-directive.html", 7);
// split the URL in parts
$parts = explode("/", $url);
// The second part (offset 1) is the part we look for
if (count($parts) > 1) {
$segment = $parts[1];
} else {
throw new Exception("Full URLs please!");
}
$url = "http://www.example.co.uk/elderly-care-advocacy/mental-capacity-act-advance-medical-directive.html";
$parts = parse_url($url);
$host = $parts['host'];
$path = $parts['path'];
$items = preg_split('/\//', $path, -1, PREG_SPLIT_NO_EMPTY);
$firstPart = $items[0];
Off the top of my head:
$url = 'http://www.example.co.uk/elderly-care-advocacy/mental-capacity-act-advance-medical-directive.html';
$urlParts = parse_url($url); // An associative array
$pathParts = explode('/', trim($urlParts['path'], '/'));
$target_string = $pathParts[0]; // 'elderly-care-advocacy'
Cheers
explode('/', $a);
All you need to do is parse the URL first, then explode the path and take the first part. With some sanity checks, that would look like the following:
$url = 'http://www.example.co.uk/elderly-care-advocacy/mental-capacity-act-advance-medical-directive.html';
$url_parts = parse_url($url);
if (isset($url_parts['path'])) {
    // The path starts with "/", so index 0 is an empty string
    $path_components = explode('/', $url_parts['path']);
    if (count($path_components) > 1) {
        // All is OK. The path's first component is in $path_components[1]
    } else {
        // Throw an error, since there is no directory specified in the path
        // Or you could assume that $path_components[0] is the actual path
    }
} else {
    // Throw an error, since no path component was found
}
I was surprised too, but this works.
$url = 'http://www.?.co.uk/elderly-care-advocacy/...';
$result = explode('/', $url)[3];
I think a regular expression should be fine for that.
Try using e.g. #/([^/]+)(?=/)# with preg_match_all. The lookahead leaves the trailing slash unconsumed, so the captured group should give you elderly-care-advocacy as the second match in your example.
(The first match is www.?.co.uk)
parse_url is your best option. It breaks the URL string down into components, which you can selectively query.
This function could be used:
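function extract_domain($url, $prefix = 'www.', $suffix = '.co.uk') {
    $url_parts = parse_url($url);
    if (is_array($url_parts) && isset($url_parts['host'])) {
        $host = $url_parts['host'];
        $host = str_replace($prefix, '', $host);
        $host = str_replace($suffix, '', $host);
        return $host;
    }
    return false;
}
// REQUEST_URI carries no host, so build the full URL first
$host_component = extract_domain('http://' . $_SERVER['HTTP_HOST'] . $_SERVER['REQUEST_URI']);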
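For the part the question actually asks about (the first path segment), querying components selectively with parse_url could look roughly like this. It is only a sketch, and example.co.uk stands in for the real domain:

```php
<?php
$url = 'http://www.example.co.uk/elderly-care-advocacy/mental-capacity-act-advance-medical-directive.html';

// Ask parse_url() for one component at a time
$host = parse_url($url, PHP_URL_HOST);
$path = parse_url($url, PHP_URL_PATH);

// strtok() skips leading delimiters, so this yields the first non-empty
// path segment, or false when the path is just "/"
$first = is_string($path) ? strtok($path, '/') : false;

echo $host, "\n";  // www.example.co.uk
echo $first, "\n"; // elderly-care-advocacy
```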
