return specific part of url preg_match - php

Imagine you have these cases:
$d='http://www.example.com/';
$d1='http://example.com/';
$d2='http://www.example.com';
$d3='www.example.com/';
$d4='http://www.example.com/';
$d5='http://www.example.com/blabla/blabla.php';
I need to get only example.com and nothing else.
I've tried using parse_url to no avail.
For example, parse_url($d3, PHP_URL_HOST); returns nothing.
Can any of you provide a regex to match this?
Thank you very much in advance!

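There is no path_url function, but you can use the parse_url function to get the host (domain name) out of a URL string. parse_url only recognises the host when the string starts with a scheme, so add one first if it is missing:
if (!preg_match('#^https?://#', $str))
{
    $str = 'http://' . $str;
}
$domain = parse_url($str, PHP_URL_HOST); // e.g. www.example.com, including any www. prefix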
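To get from the host down to the bare example.com the question asks for, a leading www. still has to be removed. A minimal sketch of the whole approach follows; the function name extractDomain is hypothetical, and it assumes that only http and https schemes and a single www. prefix need handling.

```php
<?php
// Hypothetical helper (not from the original answer): returns the host
// of a URL-like string without a leading "www.", or null if none is found.
function extractDomain(string $url): ?string
{
    // parse_url() only recognises the host when a scheme is present,
    // so prepend one for inputs such as "www.example.com/".
    if (!preg_match('#^https?://#i', $url)) {
        $url = 'http://' . $url;
    }

    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return null;
    }

    // Assumption: only one leading "www." label should be dropped.
    return preg_replace('/^www\./i', '', $host);
}

echo extractDomain('http://www.example.com/blabla/blabla.php'), "\n"; // example.com
```

Note that this keeps any other subdomain (for instance blog.example.com stays as it is), which may or may not be what you want.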

Related

PHP regex: How to remove ?file in url?

My URLs look like this:
http://mywebsite.com/movies/937-lan-kwai-fong-2?file=Rae-Ingram&q=
http://mywebsite.com/movies/937-big-daddy?file=something&q=
I want to get "lan-kwai-fong-2" and "big-daddy", so I use this code, but it doesn't work. Please help me fix it! If you can shorten it, that would be great!
$url= $_SERVER['REQUEST_URI'];
preg_replace('/\?file.*/','',$url);
preg_match('/[a-z][\w\-]+$/',$url,$matches);
$matches= str_replace("-"," ",$matches[0]);
First, there are some issues with your code, which I'm going to go over because they are general points:
preg_replace does not work by reference, so you are never actually modifying the URL. You need to assign the result of the replacement to a variable:
// this overwrites the current value of $url with the replaced value
$url = preg_replace('/\?file.*/','',$url);
It is possible that preg_match will not find anything, so you need to test the result:
// note that sometimes you may need a stricter test here,
// because preg_match can return false (if there's an error) or 0 (if there is no match)
if (preg_match('/[a-z][\w\-]+$/',$url,$matches)) {
// do stuff
}
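If the difference between an error and a plain non-match matters, the return value can be compared strictly. The sketch below only illustrates that distinction; the function name matchStatus is hypothetical.

```php
<?php
// Hypothetical helper: reports which of the three outcomes preg_match() had.
function matchStatus(string $pattern, string $subject): string
{
    $result = preg_match($pattern, $subject);

    if ($result === false) {
        // The regex engine failed (for example, a backtracking limit was hit).
        return 'error';
    }

    // preg_match() stops at the first match, so a success is always 1.
    return $result === 1 ? 'match' : 'no match';
}

echo matchStatus('/[a-z][\w\-]+$/', '/movies/937-big-daddy'), "\n"; // match
echo matchStatus('/^\d+$/', 'abc'), "\n";                           // no match
```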
Now, with that out of the way: you are making this more difficult than it needs to be. There are dedicated functions for working with URLs, parse_url and parse_str.
You can use these to easily work with the information:
$urlInfo = parse_url($_SERVER['REQUEST_URI']);
$movie = basename($urlInfo['path']); // yields 937-the-movie-title
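The basename call above still leaves the numeric prefix (937-) in place. A small sketch that also strips it, assuming the slug always follows a run of digits and a dash, might look like this (the name movieSlug is made up for the example):

```php
<?php
// Hypothetical helper: extracts the title slug from a movie URL or request URI.
function movieSlug(string $uri): ?string
{
    // PHP_URL_PATH drops the query string (?file=...&q=) for us.
    $path = parse_url($uri, PHP_URL_PATH);
    if (!is_string($path) || $path === '') {
        return null;
    }

    // Last path segment, e.g. "937-lan-kwai-fong-2".
    $segment = basename($path);

    // Assumption: the slug is preceded by a numeric ID and a dash.
    return preg_replace('/^\d+-/', '', $segment);
}

echo movieSlug('http://mywebsite.com/movies/937-lan-kwai-fong-2?file=Rae-Ingram&q='), "\n"; // lan-kwai-fong-2
```

The same function accepts $_SERVER['REQUEST_URI'], since parse_url also handles a path with a query string and no host.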
Just replace
preg_replace('/\?file.*/','',$url);
with
$url= preg_replace('/\?file.*/','',$url);
Regex works, and parse_url is the right way to do it. But for something quick and dirty I would usually use explode. I think it's clearer.
$path = explode("?", $url, 2)[0]; // separate path from query
$segments = explode("/", $path);
$match = array_pop($segments); // get last part of path (array_pop needs a variable, not a function result)
How about this:
$url = $_SERVER['REQUEST_URI'];
preg_match('/\/[^-]+-([^?]+)\?/', $url, $matches);
$str = isset($matches[1]) ? $matches[1] : false;
The pattern does the following:
match a '/'
match anything besides '-' until '-'
capture anything besides '?' until (not including) '?', which must be present

Get vine video id using php

I need to get the Vine video ID from the URL,
so that the output for a link like this
https://vine.co/v/bXidIgMnIPJ
is this:
bXidIgMnIPJ
I tried to use code from another question here, which was for Vimeo (not Vine):
Get img thumbnails from Vimeo?
This is what I tried, without success:
$url = 'https://vine.co/v/bXidIgMnIPJ';
preg_replace('~^https://(?:www\.)?vine\.co/(?:clip:)?(\d+)~','$1',$url)
basename maybe?
<?php
$url = 'https://vine.co/v/bXidIgMnIPJ';
var_dump(basename($url));
http://codepad.org/vZiFP27y
Assuming it will always be in that format, you can just split the URL on the / delimiter. A regex is not needed for a URL as simple as this.
$parts = explode('/', $url);
$id = end($parts); // end() takes its argument by reference, so it needs a variable
Since the question asks about preg_replace, here is a solution that uses it:
$s = 'https://vine.co/v/bXidIgMnIPJ';
$new_s = preg_replace('/^.*\//','',$s);
echo $new_s;
// => bXidIgMnIPJ
Or, if the prefix should only be stripped when the input string is indeed a link to vine.co:
$new_s = preg_replace('/^(https?:\/\/)?(www\.)?vine\.co.*\//','',$s);
I don't know whether the /v/ part is always present, or whether it is always v. If it is, it can also be added to the regex for stricter validation:
$new_s = preg_replace('/^(https?:\/\/)?(www\.)?vine\.co\/v\//','',$s);
Here's what I am using:
function getVineId($url) {
preg_match('#(?<=vine\.co/v/)[0-9A-Za-z]+#', $url, $matches);
if (isset($matches[0])) {
return $matches[0];
}
return false;
}
I used a look-behind to ensure "vine.co/v/" always precedes the ID, while ignoring if the url is HTTP or HTTPS (or if it lacks a protocol altogether). It assumes the ID is alphanumeric, of any length. It will ignore any characters or parameters after the id (like Google campaign tracking parameters, etc).
I used the "#" delimiter so I wouldn't have to escape the forward slashes (/), for a cleaner look.
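If you would rather check the host explicitly than rely on a look-behind, parse_url can split the URL first. This is only a sketch under a few assumptions: the function name vineIdFromUrl is hypothetical, the URL must include a scheme (unlike the look-behind version), and IDs are taken to be alphanumeric.

```php
<?php
// Hypothetical alternative: validate the host, then read the ID from the path.
function vineIdFromUrl(string $url): ?string
{
    $host = parse_url($url, PHP_URL_HOST);
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($host) || !is_string($path)) {
        return null; // no scheme/host or no path
    }

    // Assumption: only vine.co and www.vine.co are acceptable hosts.
    if (!in_array(strtolower($host), ['vine.co', 'www.vine.co'], true)) {
        return null;
    }

    // The path excludes the query string, so tracking parameters are ignored.
    if (preg_match('#^/v/([0-9A-Za-z]+)#', $path, $m) !== 1) {
        return null;
    }

    return $m[1];
}

echo vineIdFromUrl('https://vine.co/v/bXidIgMnIPJ'), "\n"; // bXidIgMnIPJ
```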
Explode the string on '/' and the last element is what you are looking for :) Code:
$vars = explode("/",$url);
echo $vars[count($vars)-1];
$url = 'https://vine.co/v/b2PFre2auF5';
$regex = '/^http(?:s?):\/\/(?:www\.)?vine\.co\/v\/([a-zA-Z0-9]{1,13})$/';
preg_match($regex,$url,$m);
print_r($m);
// $m[1] => b2PFre2auF5

PHP : Cropping a selected text from a string

I want to get part of a URL with the http:// cropped off. For example, from http://google.com I need to crop http:// and get google.com.
I used the following code and it gives me /google.com
strrchr("http://google.com" , "//");
How can I do this? How can I get only google.com?
Update: google.com is just an example; the URL can be a longer one like http://article.blogspot.com/article.htm, in which case I need article.blogspot.com/article.htm
The function parse_url() is what you're looking for.
As Lawrence says, the exact code will be:
$host = parse_url('http://google.com',PHP_URL_HOST);
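An alternative would be str_replace():
$host = str_replace("http://", "", "http://google.com");
Note that str_replace() replaces every occurrence of http://. Its fourth parameter (count) is not a limit: it is a by-reference variable that receives the number of replacements made, so passing a literal 1 there is an error.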
Why not just look to see if the string starts with http:// and then use a function to get the remaining sub-string?
$url = 'http://google.com';
if (strpos($url, 'http://') === 0) {
$url = substr($url, 7);
}
If there are other prefixes that you would like to remove, then perhaps it might be time to start looking into a quick regex to get the job done. For example:
$url = 'https://google.com';
$url = preg_replace('#^(?:https?|ftps?|news|feed|gopher)://#', '', $url);
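Rather than listing schemes one by one, the pattern can match any scheme-like prefix. The sketch below is one way to do that; the function name stripScheme is hypothetical, and it assumes a scheme is a letter followed by letters, digits, "+", "." or "-".

```php
<?php
// Hypothetical helper: removes a leading "scheme://" and keeps host and path.
function stripScheme(string $url): string
{
    // The ^ anchor ensures only a prefix is removed, never a "://"
    // that happens to appear later in the string.
    return (string) preg_replace('#^[a-z][a-z0-9+.\-]*://#i', '', $url);
}

echo stripScheme('http://article.blogspot.com/article.htm'), "\n"; // article.blogspot.com/article.htm
```

Unlike parse_url with PHP_URL_HOST, this keeps the path, which is what the update to the question asks for.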
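You could also try [^http://]+$, but be aware that this is a negated character class (any character other than h, t, p, : or /), not a negated prefix. It happens to work for google.com, but it fails for any host that contains one of those letters.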

Remove urls that have certain domain in them

I have a set of urls for example
http://t3.gstatic.com/images?q=tbn:ANd9GcRfLZhH0jpyUJxGtsiHcldUPiNQsosLdR9xgcYqVWyRWGYS4qtt
http://feeds.feedburner.com/~r/DrudgeReportFeed/~4/zSLWG4ybmjw
I want to remove any url that has feeds.feedburner.com in it. What regular expression would I use? (php)
Why use regex? Use parse_url.
$urlData = parse_url($url);
if ($urlData['host'] != 'feeds.feedburner.com'){
// Not a feedburner url
}
Shorthand, by the way, is as follows:
if (parse_url($url, PHP_URL_HOST) != 'feeds.feedburner.com'){
// same outcome
}
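To actually drop the matching URLs from a list, the same host check can be used as an array_filter callback. A minimal sketch, using the two URLs from the question and a made-up variable name $kept:

```php
<?php
$urls = [
    'http://t3.gstatic.com/images?q=tbn:ANd9GcRfLZhH0jpyUJxGtsiHcldUPiNQsosLdR9xgcYqVWyRWGYS4qtt',
    'http://feeds.feedburner.com/~r/DrudgeReportFeed/~4/zSLWG4ybmjw',
];

// Keep only the URLs whose host is not feeds.feedburner.com.
// The (string) cast covers URLs where parse_url() finds no host.
$kept = array_values(array_filter($urls, function ($url) {
    $host = strtolower((string) parse_url($url, PHP_URL_HOST));
    return $host !== 'feeds.feedburner.com';
}));

print_r($kept); // only the gstatic.com URL remains
```

Comparing the host exactly avoids false positives that a plain substring or regex search could produce when the domain appears in a path or query string.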
Use this regexp:
/feeds\.feedburner\.com/

Getting domain name without TLD

I have this code right here:
// get host name from URL
preg_match('#^(?:http://)?([^/]+)#i',
"http://www.joomla.subdomain.php.net/index.html", $matches);
$host = $matches[1];
// get last two segments of host name
preg_match('/[^.]+\.[^.]+$/', $host, $matches);
echo "domain name is: {$matches[0]}\n";
The output will be php.net.
I need just php, without the .net.
Although regexes are fine here, I'd recommend parse_url
$host = parse_url('http://www.joomla.subdomain.php.net/index.html', PHP_URL_HOST);
$domains = explode('.', $host);
echo $domains[count($domains)-2];
This will work for TLDs like .com, .org, .net, etc., but not for multi-part suffixes like .co.uk or .com.mx. You'd need some more logic (most likely an array of such suffixes) to parse those out.
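The "array of suffixes" idea could be sketched roughly as follows. The function name domainLabel and the two-entry suffix list are assumptions made for illustration; a real list of multi-part suffixes would be far longer.

```php
<?php
// Hypothetical helper: returns the label just before the TLD or before
// a known multi-part suffix. The default list is deliberately tiny.
function domainLabel(string $url, array $multiPartSuffixes = ['co.uk', 'com.mx']): ?string
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return null;
    }
    $host = strtolower($host);
    $labels = explode('.', $host);

    // By default the suffix is a single label such as "net".
    $suffixLabels = 1;
    foreach ($multiPartSuffixes as $suffix) {
        if (substr($host, -strlen($suffix) - 1) === '.' . $suffix) {
            $suffixLabels = substr_count($suffix, '.') + 1;
            break;
        }
    }

    $index = count($labels) - $suffixLabels - 1;
    return $index >= 0 ? $labels[$index] : null;
}

echo domainLabel('http://www.joomla.subdomain.php.net/index.html'), "\n"; // php
echo domainLabel('http://www.example.co.uk/'), "\n";                      // example
```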
Group the first part of your 2nd regex into /([^.]+)\.[^.]+$/ and $matches[1] will be php
Late answer and it doesn't work with subdomains, but it does work with any tld (co.uk, com.de, etc):
$domain = "somesite.co.uk";
$domain_solo = explode(".", $domain)[0];
print($domain_solo);
It's really easy:
function get_tld($domain) {
    $domain = preg_replace('#^https?://#', '', $domain); // remove the scheme
    $domain = preg_replace('#^www\.#', '', $domain); // remove a leading www.
    $nd = explode(".", $domain);
    $domain_name = $nd[0];
    $tld = substr($domain, strlen($domain_name) + 1); // everything after the first dot
    return $tld;
}
To get the domain name instead, simply return $domain_name. This only works when the domain sits directly under the top-level domain; for a subdomain you will get the subdomain name.
