regex for extracting anchor in a url

I'm trying to remove the anchor (the fragment) and everything after it from a URL using preg_replace. I found a pattern that removes everything after the #, but I want one that removes the # itself as well as everything after it.
http://blah.com#removethis
Thanks,
Steve

You can try the parse_url function:
$url = "http://blah.com#removethis";
print_r(parse_url($url));
The fragment (everything after the hash mark #) is returned under the [fragment] key.
Output:
Array
(
[scheme] => http
[host] => blah.com
[fragment] => removethis
)
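Note that parse_url() only reports the fragment; it does not remove it. A minimal sketch that uses the reported fragment to cut the # and everything after it off the end of the URL. The function name stripFragment is made up for illustration, and it assumes PHP 8, where parse_url() returns null for a missing fragment:

```php
<?php
// Hypothetical helper: removes "#fragment" from the end of $url using parse_url().
function stripFragment(string $url): string
{
    $fragment = parse_url($url, PHP_URL_FRAGMENT);
    // null = no fragment, false = URL too malformed to parse; leave the URL as-is in both cases.
    if (!is_string($fragment)) {
        return $url;
    }
    // The fragment is everything after the first "#", so drop it plus the "#" itself.
    return substr($url, 0, -(strlen($fragment) + 1));
}

echo stripFragment('http://blah.com#removethis'), "\n"; // http://blah.com
```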

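Another way without regex (check for a missing # first, otherwise substr() returns an empty string):
$pos = strpos($url, "#");
$newurl = ($pos === false) ? $url : substr($url, 0, $pos);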

$url = preg_replace('/#.*$/', '', $url);

$url = preg_replace('#\#.*$#', '', $url); // # is also the delimiter here, so the literal # must be escaped

Don't use regexes when there are proper library functions to do the job.

Related

How do I remove http, https from string in php

Example user input
http://domain.com/
hTTp://domain.com/Cars/
hTtp://www.domain.com/pAge/
I want a PHP function that produces output like this:
domain.com
domain.com/Cars/
www.domain.com/pAge/
Let me know :)

You don't need regular expressions here, just use parse_url and str_replace:
$url = 'hTtp://www.domain.com/pAge/';
$url = str_replace( parse_url( $url, PHP_URL_SCHEME ) . '://', '', $url );
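str_replace() replaces every occurrence, so a URL that repeats its scheme later (for example in a redirect parameter) would lose that one too. A hedged alternative sketch that still uses parse_url() to find the scheme but only cuts it from the start of the string; stripScheme is a hypothetical name:

```php
<?php
// Hypothetical helper: removes a leading "scheme://" (any letter case) and nothing else.
function stripScheme(string $url): string
{
    $scheme = parse_url($url, PHP_URL_SCHEME);
    if (!is_string($scheme)) {
        return $url; // no scheme found (or unparseable URL)
    }
    $prefix = $scheme . '://';
    // Only strip when the string really starts with "scheme://" (e.g. not "mailto:").
    if (strncasecmp($url, $prefix, strlen($prefix)) !== 0) {
        return $url;
    }
    return substr($url, strlen($prefix));
}

foreach (['http://domain.com/', 'hTTp://domain.com/Cars/', 'hTtp://www.domain.com/pAge/'] as $in) {
    echo stripScheme($in), "\n";
}
```

The trailing slash is kept; wrap the call in rtrim(..., '/') if you want exactly domain.com for the first example.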

Consider using parse_url() to get an array with the different parts of the url and rebuild it as a string any way you want.

Consider using a regex with preg_replace (the i modifier makes it case-insensitive, so hTTp:// matches too):
$converted = preg_replace('#^https?://#i', '', $stringtoprocess);

Maybe the easiest way is to keep everything after the first //:
$pos = strpos($url, '//');
echo ($pos === false) ? $url : substr($url, $pos + 2);

Get part of URL with PHP

I have a site and I need to get part of one page's URL with PHP. The URL might be something like www.mydomain.com/thestringineed/, or www.mydomain.com/thestringineed?data=1, or www.mydomain.com/ss/thestringineed.
So it's always the last string in the path, but I don't want anything after the ?.

parse_url should help you out.
<?php
$url = "http://www.mydomain.com/thestringineed/";
$parts = parse_url($url);
print_r($parts);
?>

You can use the parse_url function and then look at the path portion of the result,
like this:
$url = 'www.mydomain.com/thestringineed?data=1';
$components = parse_url($url);
// My original line was $mystring = end(explode('/', $components['path']));
// About three years later I realized it only gives the last directory, so it fails when the
// path has extra directories (end() also needs a variable, not a function result).
// Here's the fix, which removes the domain from the beginning of the path:
$segments = explode('/', $components['path']);
$mystring = str_replace(reset($segments), '', $components['path']);
// In my testing, when the scheme (http://, https://, ...) is present, the path does not include
// the domain (it's available on its own as ['host']). In that case it's just:
// $mystring = $components['path'];

parse_url() is the function you are looking for. The exact part you want can be retrieved by passing PHP_URL_PATH as the second argument:
$url = 'http://php.net/manual/en/function.parse-url.php';
echo parse_url($url, PHP_URL_PATH);
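Combined with basename(), the path gives the last segment directly, which is what the question asks for. A minimal sketch (lastPathSegment is a made-up name), assuming plain ASCII paths since basename() is locale-aware:

```php
<?php
// Hypothetical helper: last segment of the URL path, ignoring ?query, #fragment and a trailing "/".
function lastPathSegment(string $url): string
{
    // PHP_URL_PATH returns only the path; without a scheme, the domain is part of the path.
    $path = parse_url($url, PHP_URL_PATH);
    return is_string($path) ? basename($path) : '';
}

foreach ([
    'www.mydomain.com/thestringineed/',
    'www.mydomain.com/thestringineed?data=1',
    'www.mydomain.com/ss/thestringineed',
] as $url) {
    echo lastPathSegment($url), "\n"; // thestringineed (all three)
}
```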

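Use $_SERVER['REQUEST_URI']. It returns the path and query string of the current page (for example /ss/thestringineed?data=1), not the full URL. Remove the query string, split the rest on '/' and take the last array element; that is the last string.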
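A minimal sketch of those steps. The function name is hypothetical, and the request URI is passed in as a parameter (rather than read from $_SERVER inside) so it can be tried outside a web request:

```php
<?php
// Hypothetical helper: last path segment of a request URI such as "/ss/thestringineed?data=1".
function lastSegmentOfRequestUri(string $requestUri): string
{
    // Drop the query string ("?data=1") first.
    $path = explode('?', $requestUri, 2)[0];
    // Trim outer slashes so a trailing "/" doesn't leave an empty last element.
    $segments = explode('/', trim($path, '/'));
    return end($segments);
}

echo lastSegmentOfRequestUri('/ss/thestringineed?data=1'), "\n"; // thestringineed
// Inside a request: echo lastSegmentOfRequestUri($_SERVER['REQUEST_URI']);
```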

You can use:
$strings = explode("/", $urlstring);
This splits the URL at every '/' and returns an array of the pieces.
$strings[count($strings)-1]
now holds the string you need, but it may still contain '?data=1', so we need to remove that:
$strings2 = explode("?", $strings[count($strings)-1]);
$strings2[0]
holds the string you want from the URL. (If the URL ends with '/', the last element is empty, so trim the trailing slash first with rtrim($urlstring, '/').)
Hope this helps!

<?php
$url = 'http://username:password@hostname/path?arg=value#anchor';
print_r(parse_url($url));
echo parse_url($url, PHP_URL_PATH);
?>
and the output is:
Array
(
[scheme] => http
[host] => hostname
[user] => username
[pass] => password
[path] => /path
[query] => arg=value
[fragment] => anchor
)
/path

Strip the last part off a url [duplicate]

Possible Duplicate:
extract last part of a url
I have a url like this:
http://www.domain.co.uk/product/SportingGoods/Cookware/1/B000YEU9NA/Coleman-Family-Cookset
I want extract just the product name off the end "Coleman-Family-Cookset"
When I use parse_url and print_r I end up with the following:
Array (
[scheme] => http
[host] => www.domain.co.uk
[path] => /product/SportingGoods/Cookware/1/B000YEU9NA/Coleman-Family-Cookset
)
How do I then trim "Coleman-Family-Cookset" off the end?
Thanks in advance

All the answers above work, but they all use unnecessary arrays or regular expressions. You only need the position of the last /, which you can get with strrpos(), and then you can extract the string with substr():
substr($url, strrpos($url, '/') + 1); // "Coleman-Family-Cookset" (everything after the last /)
substr($url, 0, strrpos($url, '/')); // everything before the last /
This avoids building an array or running a regex, so it is cheaper than the preg_* or explode approaches, although they all work.
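Wrapped in a small function that also handles a string with no / at all (strrpos() returns false there); afterLastSlash is a hypothetical name:

```php
<?php
// Hypothetical helper: everything after the last "/", or the whole string if there is none.
function afterLastSlash(string $url): string
{
    $pos = strrpos($url, '/');
    return $pos === false ? $url : substr($url, $pos + 1);
}

$url = 'http://www.domain.co.uk/product/SportingGoods/Cookware/1/B000YEU9NA/Coleman-Family-Cookset';
echo afterLastSlash($url), "\n"; // Coleman-Family-Cookset
```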

$url = 'http://www.domain.co.uk/product/SportingGoods/Cookware/1/B000YEU9NA/Coleman-Family-Cookset';
$url = explode('/', $url);
$last = array_pop($url);
echo $last;

You already have the path (from the array shown in the question).
Use the following:
$parts = parse_url($url); // the array printed in the question
$tobestripped = $parts['path']; // the entire path
$exploded = explode("/", $tobestripped);
$lastpart = array_pop($exploded); // "Coleman-Family-Cookset"
Hope this helps.

$url = rtrim($url, '/');
preg_match('/([^\/]*)$/', $url, $match);
var_dump($match);

Preg_replace domain problem

I'm stuck trying to get the domain using preg_replace.
I have a list of URLs:
download.adwarebot.com/setup.exe
athena.vistapages.com/suspended.page/
prosearchs.com/se/tds/in.cgi?4&group=5&parameter=mail
freeserials.spb.ru/key/68703.htm
What I want is:
adwarebot.com
vistapages.com
prosearchs.com
spb.ru
Can anybody help me with preg_replace?
I'm using http://gskinner.com/RegExr/ for testing :)

Using preg_replace, if the set of TLDs is limited:
$urls = array( 'download.adwarebot.com/setup.exe',
'athena.vistapages.com/suspended.page/',
'prosearchs.com/se/tds/in.cgi?4&group=5&parameter=mail',
'freeserials.spb.ru/key/68703.htm' );
$domains = preg_replace('~^(?:[^/]*\.)?([^./]+\.(?:com|ru))/.*$~', '$1', $urls);
This captures the run of non-period characters right before .com or .ru (so subdomains are not included) and drops everything else, including the path. The ~ delimiter is used because the pattern itself contains |.
You could, however, use PHP's built-in parse_url function to get the host (including the subdomain) and then use another regex, substr or array manipulation to get rid of the subdomain:
$host = parse_url('http://download.adwarebot.com/setup.exe', PHP_URL_HOST);
if(count($parts = explode('.', $host)) > 2)
$host = implode('.', array_slice($parts, -2));
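Applied to the whole list from the question, a minimal sketch could look like this. The entries have no scheme, so "http://" is prepended; otherwise parse_url() would report no host. lastTwoLabels is a made-up name, and keeping the last two labels is a simplification that would turn a host like www.domain.co.uk into co.uk:

```php
<?php
// Hypothetical helper: host of a scheme-less entry, reduced to its last two labels.
function lastTwoLabels(string $entry): string
{
    // Without a scheme, parse_url() would treat the whole entry as a path.
    $host = parse_url('http://' . $entry, PHP_URL_HOST);
    if (!is_string($host)) {
        return '';
    }
    $parts = explode('.', $host);
    return count($parts) > 2 ? implode('.', array_slice($parts, -2)) : $host;
}

$urls = [
    'download.adwarebot.com/setup.exe',
    'athena.vistapages.com/suspended.page/',
    'prosearchs.com/se/tds/in.cgi?4&group=5&parameter=mail',
    'freeserials.spb.ru/key/68703.htm',
];
print_r(array_map('lastTwoLabels', $urls));
```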

The following code assumes that every entry starts at the beginning of a line of $list (hence the m modifier):
preg_match_all('#^([\w]*\.)?([\w]*\.[\w]*)/#m', $list, $m);
// var_dump($m[2]);
P.S. But the correct answer is still parse_url.

Why use a regular expression? Of course it is possible, but this:
$domains = array();
foreach ($url_list as $url) {
$url_parts = explode('/', $url);
// strip the first label only when two more labels follow (so prosearchs.com stays intact)
$domains[] = preg_replace('~^[^.]+\.(?=[^.]+\.[^.]+$)~', '', $url_parts[0]);
}
$domains = array_unique($domains);
will do just fine.

Maybe a more generic solution.
I tested it with grep because I don't have a PHP environment, sorry:
kent$ echo "download.adwarebot.com/setup.exe
dquote> athena.vistapages.com/suspended.page/
dquote> prosearchs.com/se/tds/in.cgi?4&group=5&parameter=mail
dquote> freeserials.spb.ru/key/68703.htm"|grep -Po '(?<!/)([^\./]+\.[^\./]+)(?=/.+)'
output:
adwarebot.com
vistapages.com
prosearchs.com
spb.ru
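Since the answer above was only run through grep, here is a minimal sketch of the same PCRE pattern in PHP via preg_match(); domainFromEntry is a hypothetical name. It picks the last two labels before the first path separator and returns null when an entry has no path after the host:

```php
<?php
// Hypothetical helper: the grep -P pattern above, run through preg_match().
// (?<!/) rejects a match that starts right after a "/", and (?=/.+) requires a "/"
// followed by at least one more character, so only the host part qualifies.
function domainFromEntry(string $entry): ?string
{
    if (preg_match('~(?<!/)([^./]+\.[^./]+)(?=/.+)~', $entry, $m) === 1) {
        return $m[1];
    }
    return null;
}

foreach ([
    'download.adwarebot.com/setup.exe',
    'athena.vistapages.com/suspended.page/',
    'prosearchs.com/se/tds/in.cgi?4&group=5&parameter=mail',
    'freeserials.spb.ru/key/68703.htm',
] as $entry) {
    echo domainFromEntry($entry), "\n";
}
```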

php getting substring from a string

I have the following URL: http://domen.com/aaa/bbb/ccc.
How can I get the string after http://domen.com/?
Thanks a lot.

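If the prefix is always http://domen.com/, you can skip it with substr:
$sub = substr($string, strlen('http://domen.com/')); // aaa/bbb/ccc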
But if you actually want to parse the URL (that is, you want it to work with all URLs), use parse_url. For "http://domen.com/aaa/bbb/ccc", it would give you an array like this:
Array
(
[scheme] => http
[host] => domen.com
[path] => /aaa/bbb/ccc
)
Components that are not present in the URL (user, pass, query, fragment) are left out of the array.
You could then rebuild the base of the URL (http://domen.com/) from the parts; the string after it is just $url['path'] without its leading slash:
$output = $url['scheme'] . "://" . $url['host'] . "/";
assuming $url contains the parse_url results.
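A minimal sketch that asks parse_url() for the path only and strips its leading slash (pathAfterHost is a hypothetical name). It works for any host, and any ?query or #fragment is left out:

```php
<?php
// Hypothetical helper: the part of the URL after "scheme://host/", without the leading "/".
function pathAfterHost(string $url): string
{
    $path = parse_url($url, PHP_URL_PATH);
    return is_string($path) ? ltrim($path, '/') : '';
}

echo pathAfterHost('http://domen.com/aaa/bbb/ccc'), "\n"; // aaa/bbb/ccc
```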

You can use PHP's explode() (the older split() function was deprecated in PHP 5.3 and removed in PHP 7.0).
Your code will be something like:
$s = "http://domen.com/aaa/bbb/ccc";
$vals = explode("http://domen.com/", $s);
// $v will contain aaa/bbb/ccc
$v = $vals[1];

parse_url() (http://php.net/manual/en/function.parse-url.php) might be the way to go.

If you simply want the string and the "http://domen.com/" part is fixed:
$url = 'http://domen.com/aaa/bbb/ccc';
$str = str_replace('http://domen.com/','',$url);

Use a regex, for example with the preg_replace function.

Try this:
preg_replace('/^.*?\w\//', '', $url)
