Extracting text from URL using PHP

I'm curious as to how I would get a certain value after a delimiter in a URL.
If I have a URL of http://www.testing.site.com/site/biz/i-want-this, how would I extract only the part that says "i-want-this", or essentially whatever comes after the last /?
Thank you!

You want basename($path); it should give you what you need:
http://www.ideone.com/8hFSN
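A minimal sketch of basename() on the question's URL. The second URL, with a ?ref=home query string, is a made-up example showing why running parse_url() on it first can help:

```php
<?php
$url = "http://www.testing.site.com/site/biz/i-want-this";

// basename() returns the trailing component after the last "/"
$last = basename($url);
echo $last, "\n"; // i-want-this

// Hypothetical URL with a query string: isolate the path first,
// otherwise basename() would return "i-want-this?ref=home"
$withQuery = "http://www.testing.site.com/site/biz/i-want-this?ref=home";
$lastFromPath = basename(parse_url($withQuery, PHP_URL_PATH));
echo $lastFromPath, "\n"; // i-want-this
```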

$url = "http://www.testing.site.com/site/biz/i-want-this";
preg_match( "/[^\/]*$/", $url, $match);
echo $match[0]; // i-want-this
You can use basename(), but on Windows it splits on backslashes as well as forward slashes. This is unlikely to come up, since backslashes are unusual in a URL, but I suspect you could find them in the query string of a valid URL.
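If you want to avoid the platform-dependent behaviour altogether, strrpos() and substr() do the same job without ever treating a backslash as a separator. A sketch (after_last_slash is a hypothetical helper name):

```php
<?php
// Hypothetical helper: everything after the last "/", identical on every platform
function after_last_slash(string $url): string
{
    $pos = strrpos($url, '/');
    // No slash at all: return the input unchanged
    return $pos === false ? $url : substr($url, $pos + 1);
}

echo after_last_slash("http://www.testing.site.com/site/biz/i-want-this"), "\n"; // i-want-this
```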

Related

PHP Regex Remove Everything After Last Character In String

I have the following string of image URLs, which I'm trying to sanitize:
$img_string = 'http://image.s5a.com/is/image/saks/0401694719016_647x329.jpg," "="">/-/http://image.s5a.com/is/image/saks/0401694719016_A1_647x329.jpg," "="">';
I'm exploding the string first, like this:
$img_array = explode('/-/', $img_string);
But I can't find a regex to remove everything after the end of each image URL (i.e. after the file extension).
It needs to work regardless of whether the image URL ends in .png, .jpg or .jpeg.
My expected output is
http://image.s5a.com/is/image/saks/0401694719016_647x329.jpg
instead of
http://image.s5a.com/is/image/saks/0401694719016_647x329.jpg," "="">
So my question is, can someone help me with the required regex to achieve this?
Thanks
(?<=jpg|png|jpeg).*
Try this, and replace the match with an empty string. See the demo:
http://regex101.com/r/rQ6mK9/44
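A minimal sketch of that lookbehind applied in PHP, reusing the string and the explode() call from the question. It assumes no URL contains "jpg", "png" or "jpeg" earlier in its path, because the first such occurrence is where the cut happens:

```php
<?php
$img_string = 'http://image.s5a.com/is/image/saks/0401694719016_647x329.jpg," "="">/-/http://image.s5a.com/is/image/saks/0401694719016_A1_647x329.jpg," "="">';
$img_array = explode('/-/', $img_string);

// Delete everything after the first "jpg", "png" or "jpeg" in each element.
// preg_replace() accepts an array subject and returns an array of results.
$clean = preg_replace('/(?<=jpg|png|jpeg).*/', '', $img_array);
print_r($clean);
```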
You can use preg_match using this regex:
[^,]+
Alternatively you can use this regex as well for preg_match:
^.+?\.(png|jpe?g)
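Both preg_match() patterns can be run over the exploded array; on the question's data they pick out the same URLs. A sketch, again reusing the question's string:

```php
<?php
$img_string = 'http://image.s5a.com/is/image/saks/0401694719016_647x329.jpg," "="">/-/http://image.s5a.com/is/image/saks/0401694719016_A1_647x329.jpg," "="">';
$img_array = explode('/-/', $img_string);

$byComma = [];
$byExt = [];
foreach ($img_array as $img) {
    // [^,]+ : the leading run of characters up to (not including) the first comma
    if (preg_match('/[^,]+/', $img, $m)) {
        $byComma[] = $m[0];
    }
    // ^.+?\.(png|jpe?g) : the shortest prefix ending in .png, .jpg or .jpeg
    if (preg_match('/^.+?\.(png|jpe?g)/', $img, $m)) {
        $byExt[] = $m[0];
    }
}
print_r($byExt);
```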
This should work; it keeps everything before the first comma and drops the comma itself:
if (preg_match_all("/(.*?),/is", $img_string, $matches)) {
$url = $matches[1][0];
echo $url;
}
Edit: tested here: http://ideone.com/pq1WzA
Or remove everything from the first comma onward, like this:
$result = preg_replace('#[,].*$#ui', '', $img_string);

How to get the last digits before '.html' in a string

There is a string, for example: http://address.com/sef-title-of-topic-1111.html
I could not get 1111 in any way with a regexp in PHP. Is it possible? How?
My code:
$address = 'http://address.com/sef-title-of-topic-1111.html';
preg_match('#-(.*?)\.html#sim',$address,$result);
If the URLs will always look like the example (i.e. ending in a hyphen, digits and .html), then this should work:
$str = "http://address.com/sef-title-of-topic-1111.html";
preg_match('#.*-(\d+)\.html#', $str, $matches);
print_r($matches);
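The pattern in the question, #-(.*?)\.html#, fails because matching starts at the first hyphen, so the lazy group captures title-of-topic-1111. Requiring digits only and anchoring to the end of the string avoids that. A sketch using the question's $address:

```php
<?php
$address = 'http://address.com/sef-title-of-topic-1111.html';

// \d+ cannot cross a hyphen, and $ pins the match to the end of the string,
// so only the trailing number is captured.
if (preg_match('/-(\d+)\.html$/', $address, $m)) {
    $number = $m[1];
    echo $number, "\n"; // 1111
}
```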
If they won't always match the pattern you gave in your question, then clarify by showing alternative values for your $address value.
If you know that the extension is definitely .html (and not .htm, for example), then you could use
$lastNos = substr($address, -9, -5);
It's clearly a simple solution, and it assumes the number is exactly four digits long, but you have not specified why a regex is required.
If the URL will always be in this format, I would use str_replace to strip the .html, then explode by "-" and take the last piece.
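A sketch of that str_replace/explode approach on the question's URL, under the same fixed-format assumption:

```php
<?php
$address = 'http://address.com/sef-title-of-topic-1111.html';

$withoutExt = str_replace('.html', '', $address); // strip the extension
$pieces = explode('-', $withoutExt);               // split on hyphens
$number = end($pieces);                            // the last piece is the number
echo $number, "\n"; // 1111
```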
If the format is always the same, you don't need a regex.
$url = "http://address.com/sef-title-of-topic-1111.html";
$reversed = explode("-", strrev($url));   // first element is "lmth.1111"
$pieces = explode(".", $reversed[0]);     // ["lmth", "1111"]
echo strrev(end($pieces));                // 1111
edit: sorry my php is a bit rusty

Convert absolute to relative url with preg_replace

(I searched, and found lots of questions about converting relative to absolute urls, but nothing for absolute to relative.)
I'd like to take input from a form field and end up with a relative url. Ideally, this would be able to handle any of the following inputs and end up with /page-slug.
http://example.com/page-slug
http://www.example.com/page-slug
https://example.com/page-slug
https://www.example.com/page-slug
example.com/page-slug
/page-slug
And maybe more I'm not thinking of...?
Edit: I'd also like this to work for something where the relative url is e.g. /page/post (i.e. something with more than one slash).
Take a look at parse_url if you are always working with URLs. Specifically:
parse_url($url, PHP_URL_PATH)
FYI, I tested it against all your input, and it worked on all except: example.com/page-slug
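One way to handle that last case is to add a scheme when the input has none, so parse_url() sees the host. A sketch, assuming any input that lacks "://" and does not start with "/" begins with a hostname (to_relative is a hypothetical helper name):

```php
<?php
// Hypothetical helper: reduce a user-entered URL to its path.
// Assumption: input with no "://" that does not start with "/" begins with a hostname.
function to_relative(string $url): string
{
    if (strpos($url, '://') === false && ($url === '' || $url[0] !== '/')) {
        $url = 'http://' . $url; // give parse_url() a scheme so the host is recognised
    }
    $path = parse_url($url, PHP_URL_PATH);
    // null: no path (e.g. "http://example.com"); false: seriously malformed URL
    return ($path === null || $path === false) ? '/' : $path;
}

foreach (['https://www.example.com/page-slug', 'example.com/page-slug', '/page/post'] as $input) {
    echo to_relative($input), "\n";
}
```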
Try this regexp.
#^ The start of the string
(
:// Match either ://
| Or
[^/] Not a /
)+ One or more times
#
And replace it with an empty string.
$pattern = '#^(://|[^/])+#';
$replacement = '';
echo preg_replace($pattern, $replacement, $string);
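Running that pattern over every input listed in the question, plus a multi-slash path, gives the expected relative URLs. A sketch:

```php
<?php
$inputs = [
    'http://example.com/page-slug',
    'http://www.example.com/page-slug',
    'https://example.com/page-slug',
    'https://www.example.com/page-slug',
    'example.com/page-slug',
    '/page-slug',
    'https://example.com/page/post',
];

// Consume "://" or any non-slash character from the start; what remains is the path.
// An input that already starts with "/" does not match, so it is returned unchanged.
$pattern = '#^(://|[^/])+#';
$results = [];
foreach ($inputs as $string) {
    $results[] = preg_replace($pattern, '', $string);
}
print_r($results);
```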
I think you want the part of the URL after the hostname; you can use parse_url:
$path = parse_url($url, PHP_URL_PATH);
Note that this gets the whole path after the hostname (without any query string or fragment), so http://example.com/page/slug will give /page/slug.
If you know your application, I would just do this a slightly hacky way: use a regex to find the end of the domain, such as
[a-z]\.(com|org|net)
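A rough sketch of that idea, assuming every host ends in .com, .org or .net and that everything up to and including the domain should be removed (strip_domain is a hypothetical name):

```php
<?php
// Hypothetical helper. Assumption: every host ends in .com, .org or .net.
function strip_domain(string $url): string
{
    // Lazily consume up to the first "<letter>.com/.org/.net" and drop it;
    // input with no such domain (e.g. "/page-slug") is returned unchanged.
    return preg_replace('#^.*?[a-z]\.(?:com|org|net)#i', '', $url);
}

echo strip_domain('https://www.example.com/page-slug'), "\n";
```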

Extracting URLs from a JSON-like string

I need to extract the first URL from some content. The content may be like this:
({items:[{url:"http://cincinnati.ebayclassifieds.com/",name:"Cincinnati"},{url:"http://dayton.ebayclassifieds.com/",name:"Dayton"}],error:null});
or may contain only a link
({items:[{url:"http://portlandor.ebayclassifieds.com/",name:"Portland (OR)"}],error:null});
Currently I have:
$pattern = "/\:\[\{url\:\"(.*)\"\,name/";
preg_match_all($pattern, $htmlContent, $matches);
$URL = $matches[1][0];
However, it only works if there is a single link, so I need a regex that works for both cases.
You can use this REGEX:
$pattern = "/url\:\"([^\"]+)\"/";
Worked for me :)
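Here is that pattern in context, written in single quotes where the backslash escapes are not needed. [^"]+ stops at the closing quote, so the greedy (.*) problem from the question goes away, and preg_match() returns only the first match:

```php
<?php
$htmlContent = '({items:[{url:"http://cincinnati.ebayclassifieds.com/",name:"Cincinnati"},{url:"http://dayton.ebayclassifieds.com/",name:"Dayton"}],error:null});';

// [^"]+ cannot run past the closing quote of the url value
$pattern = '/url:"([^"]+)"/';
if (preg_match($pattern, $htmlContent, $matches)) {
    $URL = $matches[1]; // first URL only
    echo $URL, "\n";
}
```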
Hopefully this should work for you
<?php
$str = '({items:[{url:"http://cincinnati.ebayclassifieds.com/",name:"Cincinnati"},{url:"http://dayton.ebayclassifieds.com/",name:"Dayton"}],error:null});'; //The string you want to extract the 1st URL from
$match = ""; //Define the match variable
preg_match("%(((ht|f)tp(s?))\://)?(www.|[a-zA-Z].)[a-zA-Z0-9\-\.]+\.(com|edu|gov|mil|net|org|biz|info|name|museum|us|ca|uk)(\:[0-9]+)*(/($|[a-zA-Z0-9\.\,\;\?\'\\\+&\%\$#\=~_\-]+))*%",$str,$match); //I Googled for the best Regular expression for URLs and found the one included in the preg_match
echo $match[0]; //Return the first item in the array (the first URL returned)
?>
This is the website that I found the regular expression on: http://regexlib.com/Search.aspx?k=URL
Like the others have said, json_decode should work for you as well.
That smells like JSON to me. Try using http://php.net/json_decode
Looks like JSON to me; visit http://php.net/manual/en/book.json.php and use json_decode().
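Strictly speaking, the string is not valid JSON: it is wrapped in ( ... ); and its keys are unquoted, so json_decode() on the raw string returns null. A sketch that patches those two things first, assuming the data keeps exactly this shape:

```php
<?php
$str = '({items:[{url:"http://cincinnati.ebayclassifieds.com/",name:"Cincinnati"},{url:"http://dayton.ebayclassifieds.com/",name:"Dayton"}],error:null});';

// 1. Drop the "(" ... ");" wrapper.
$body = trim($str, '();');
// 2. Quote bare keys such as items, url, name and error.
//    Assumption: no string value contains "," or "{" followed by word:
$json = preg_replace('/([{,])\s*([A-Za-z_]\w*)\s*:/', '$1"$2":', $body);
// 3. Decode into nested associative arrays.
$data = json_decode($json, true);

$firstUrl = $data['items'][0]['url'] ?? null;
echo $firstUrl, "\n";
```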

PHP if string contains URL isolate it

In PHP, I need to be able to figure out if a string contains a URL. If there is a URL, I need to isolate it as another separate string.
For example: "SESAC showin the Love! http://twitpic.com/1uk7fi"
I need to be able to isolate the URL in that string into a new string. At the same time the URL needs to be kept intact in the original string. Follow?
I know this is probably really simple but it's killing me.
Something like
preg_match('/[a-zA-Z]+:\/\/[0-9a-zA-Z;.\/?:#=_#&%~,+$]+/', $string, $matches);
$matches[0] will hold the result.
(Note: this regex is certainly not RFC compliant; it may fetch malformed (per the spec) URLs. See http://www.faqs.org/rfcs/rfc1738.html).
This doesn't account for dashes (-). I needed to add - to the character class:
preg_match('/[a-zA-Z]+:\/\/[0-9a-zA-Z;.\/\-?:#=_#&%~,+$]+/', $_POST['string'], $matches);
URLs can't contain spaces, so...
\b(?:https?|ftp)://\S+
should match any URL-like thing in a string.
The above is the pure regex; PHP's preg_* delimiter and string escaping rules apply before you can use it.
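A minimal sketch of that regex in PHP, using ~ as the delimiter so the slashes need no escaping, on the example string from the question:

```php
<?php
$string = "SESAC showin the Love! http://twitpic.com/1uk7fi";

if (preg_match('~\b(?:https?|ftp)://\S+~', $string, $matches)) {
    $url = $matches[0]; // the isolated URL; $string itself is left unchanged
    echo $url, "\n";
}
```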
$test = "SESAC showin the Love! http://twitpic.com/1uk7fi";
$myURL= strstr ($test, "http");
echo $myURL; // prints http://twitpic.com/1uk7fi
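strstr() returns everything from the first "http" to the end of the string, so any text after the URL comes along too. A sketch that cuts the result at the first whitespace, using a hypothetical trailing " #music" to show the difference:

```php
<?php
$test = "SESAC showin the Love! http://twitpic.com/1uk7fi #music";

$fromUrl = strstr($test, "http"); // "http://twitpic.com/1uk7fi #music"
// strtok() returns the text before the first whitespace character
$myURL = $fromUrl === false ? '' : strtok($fromUrl, " \t\r\n");
echo $myURL, "\n";
```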
