PHP preg_match between text and the first occurrence of -

I'm trying to grab the 12345 out of the following URL using preg_match.
$url = "http://www.somesite.com/directory/12345-this-is-the-rest-of-the-url.html";
$beg = "http://www.somesite.com/directory/";
$close = "\-";
preg_match("($beg(.*)$close)", $url, $matches);
I have tried multiple combinations of . * ? \b
Does anyone know how to extract 12345 out of the URL with preg_match?

Two things: you need preg_quote, and you also need delimiters. Using your construction method:
$url = "http://www.somesite.com/directory/12345-this-is-the-rest-of-the-url.html";
$beg = preg_quote("http://www.somesite.com/directory/", '/');
$close = preg_quote("-", '/');
preg_match("/($beg(.*?)$close)/", $url, $matches);
But I would write the pattern slightly differently:
preg_match('/directory\/(\d+)-/i', $url, $match);
It only matches the directory part, is far more readable, and ensures that you only get digits back (no other characters).
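As a minimal sketch of the preg_quote approach above, the pattern construction can be wrapped in a small function. The name extract_between is hypothetical and not part of PHP or of the original answer; it simply quotes both markers and captures lazily between them.

```php
<?php
// Hypothetical helper (not from the original answer): returns the text
// between $begin and the first following $end, or null if nothing matches.
function extract_between(string $subject, string $begin, string $end): ?string
{
    // preg_quote() escapes regex metacharacters and the "/" delimiter.
    $pattern = '/' . preg_quote($begin, '/') . '(.*?)' . preg_quote($end, '/') . '/';

    return preg_match($pattern, $subject, $m) === 1 ? $m[1] : null;
}

$url = "http://www.somesite.com/directory/12345-this-is-the-rest-of-the-url.html";
var_dump(extract_between($url, "http://www.somesite.com/directory/", "-")); // string(5) "12345"
```

The lazy (.*?) is what stops the capture at the first "-" rather than the last one.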

This doesn't use preg_match but would achieve the same thing and would execute faster:
$url = "http://www.somesite.com/directory/12345-this-is-the-rest-of-the-url.html";
$url_segments = explode("/", $url);
$last_segment = array_pop($url_segments);
list($id) = explode("-", $last_segment);
echo $id; // Prints 12345

Too slow, I am ^^.
Well, if you are not stuck on preg_match, here is a fast and readable alternative:
$num = (int)substr($url, strlen($beg));
(Looking at your code, I guessed that the number you are looking for is a numeric id, as is typical for URLs like that, and will not be "12abc" or anything else.)

Related

Remove characters from beginning and end string

I want to output only MYID from the URL. What I did so far:
$url = "https://whatever.expamle.com/display/MYID?out=1234567890?Browser=0?OS=1";
echo substr($url, 0, strpos($url, "?out="));
output: https://whatever.expamle.com/display/MYID
$url = preg_replace('#^https?://whatever.expamle.com/display/#', '', $url);
echo $url;
output: MYID?out=1234567890?Browser=0?OS=1
How can I combine this? Thanks.
For a more general solution, we can use regex with preg_match_all:
$url = "https://whatever.expamle.com/display/MYID?out=1234567890?Browser=0?OS=1";
preg_match_all("/\/([^\/]+?)\?/", $url, $matches);
print_r($matches[1][0]); // MYID
When the string is always a Uniform Resource Locator (URL), like you present it in your question,
given the following string:
$url = "https://whatever.expamle.com/display/MYID?out=1234567890?Browser=0?OS=1";
you can benefit from parsing it first:
$parts = parse_url($url);
and then making use of the fact that MYID is the last path component:
$str = preg_replace(
'~^.*/(?=[^/]*$)~' /* everything but the last path component */,
'',
$parts['path']
);
echo $str, "\n"; # MYID
and then depending on your needs, you can combine with any of the other parts, for example just the last path component with the query string:
echo "$str?$parts[query]", "\n"; # MYID?out=1234567890?Browser=0?OS=1
The case in point: if the string already represents structured data, use a dedicated parser to divide it into smaller pieces. It is then easier to get to the results you're looking for.
If you're on Linux/Unix, it is even easier and works without a regular expression, as the basename() function returns the path's last component (on Windows, basename() also treats backslashes as separators, so results can differ):
echo basename(parse_url($url, PHP_URL_PATH)),
'?',
parse_url($url, PHP_URL_QUERY),
"\n"
;
https://php.net/parse_url
https://php.net/preg_replace
https://www.php.net/manual/en/regexp.reference.assertions.php
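A regex-free and platform-independent sketch of the same idea, assuming only that the wanted value is the last path component. The function name last_path_segment is made up for this example.

```php
<?php
// Hypothetical helper: last component of the URL path, or null if the
// URL has no usable path. Trailing slashes are trimmed first.
function last_path_segment(string $url): ?string
{
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path) || $path === '') {
        return null;
    }

    $segments = explode('/', rtrim($path, '/'));

    return end($segments);
}

$url = "https://whatever.expamle.com/display/MYID?out=1234567890?Browser=0?OS=1";
echo last_path_segment($url), "\n"; // MYID
```

Because parse_url() has already separated the query string, the extra "?" characters in the query do not affect the result.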

Match unique string in PHP between common characters

I'm trying to do some string matching in PHP. I have the following url string in a variable:
phones/gift.nintendo-3ds/handset.blackberry-9790.html
I want to remove the /gift.nintendo-3ds from the above, but the gift will always be different.
Any ideas? I want the url variable to look like this after each call, whatever the gift:
phones/handset.blackberry-9790.html
Thanks
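$url = preg_replace('~/gift\.[^/]*~', '', $url);
Matches /gift. then anything up to the next slash and replaces it with an empty string. (The pattern uses ~ as the delimiter because an unescaped / inside [^/] would end a /-delimited pattern early.)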
Try with:
$input = 'phones/gift.nintendo-3ds/handset.blackberry-9790.html';
$output = preg_replace('(gift\.[^/]*\/)', '', $input);
You could split it apart, remove the second part you do not want to keep and then concat it again:
$parts = explode('/', $url, 3);
unset($parts[1]);
$result = implode('/', $parts);
This is not using any regular expression as you might have thought about but probably tells you about some other useful functions.
Demo: http://codepad.org/a1pNW8J6
A regex variant could be:
echo preg_replace('~^([^/]+)(/[^/]+)~', '$1', $url);
Demo: http://codepad.org/vyR04xMn
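If the gift segment is not guaranteed to be the second one, a sketch along these lines drops every path segment that starts with "gift." regardless of position. The function name is hypothetical, and the arrow function assumes PHP 7.4 or later.

```php
<?php
// Hypothetical helper: removes every "/"-separated segment beginning with "gift.".
function remove_gift_segments(string $path): string
{
    $kept = array_filter(
        explode('/', $path),
        fn (string $segment): bool => strncmp($segment, 'gift.', 5) !== 0
    );

    return implode('/', $kept);
}

echo remove_gift_segments('phones/gift.nintendo-3ds/handset.blackberry-9790.html'), "\n";
// phones/handset.blackberry-9790.html
```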

Very simple regex to get the numbers in url: http://artige.no/bilde/6908

This is just a really simple regex-question. I'd like to grab the last numbers from a url that looks like this:
http://artige.no/bilde/6908
The url will always look like this, with a number after /bilde/
You do not really need a regex for this.
$url = 'http://artige.no/bilde/6908';
$num = intval(substr($url, strrpos($url, '/') + 1));
echo $num;
In which case,
~/bilde/(\d+)~
Is what you seek. Will find any number of digits after the string /bilde/
$matches = array();
$url = "http://artige.no/bilde/6908";
preg_match('/([0-9]+)$/', $url, $matches);
print_r($matches);
This uses the pattern ([0-9]+)$ to match one or more digits at the end of the string.
"(?<=/bilde/)\d+$"
test with grep:
kent$ echo "http://artige.no/bilde/6908"|grep -Po "(?<=/bilde/)\d+$"
6908
as you required, (/bilde/) url like:
http://artige.no/NoTWithBilde/1234
will not be matched.
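For reference, a PHP sketch that combines the /bilde/ prefix with the end-of-string anchor, so it behaves like the grep example above. The function name bilde_id is invented for this illustration.

```php
<?php
// Hypothetical helper: returns the numeric id after "/bilde/" or null.
function bilde_id(string $url): ?int
{
    // Require the literal "/bilde/" segment followed by digits up to the end.
    if (preg_match('~/bilde/(\d+)$~', $url, $m) === 1) {
        return (int) $m[1];
    }

    return null;
}

var_dump(bilde_id('http://artige.no/bilde/6908'));        // int(6908)
var_dump(bilde_id('http://artige.no/NoTWithBilde/1234')); // NULL
```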

Preg_replace domain problem

I'm stuck trying to get the domain using preg_replace.
I have a list of URLs:
download.adwarebot.com/setup.exe
athena.vistapages.com/suspended.page/
prosearchs.com/se/tds/in.cgi?4&group=5&parameter=mail
freeserials.spb.ru/key/68703.htm
What I want is:
adwarebot.com
vistapages.com
prosearchs.com
spb.ru
Can anybody help me with preg_replace?
i'm using this http://gskinner.com/RegExr/ for testing :)
using preg_replace, if the number of TLDs is limited:
$urls = array( 'download.adwarebot.com/setup.exe',
'athena.vistapages.com/suspended.page/',
'prosearchs.com/se/tds/in.cgi?4&group=5&parameter=mail',
'freeserials.spb.ru/key/68703.htm' );
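$domains = preg_replace('~^.*?([^./]+\.(?:com|ru))/.*$~', '$1', $urls);
The group captures the label directly before .com or .ru (no periods or slashes, so subdomains are excluded). The surrounding ^.*? and /.*$ consume the rest of the string, so only the captured domain remains.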
You could however use PHPs builtin parse_url function to get the host (including subdomain) – use another regex, substr or array manipulation to get rid of it:
$host = parse_url('http://download.adwarebot.com/setup.exe', PHP_URL_HOST);
if(count($parts = explode('.', $host)) > 2)
$host = implode('.', array_slice($parts, -2));
The following code assumes that every entry starts at the beginning of a line in $list (a newline-separated string), hence the m modifier:
preg_match_all('#^([\w]*\.)?([\w]*\.[\w]*)/#m', $list, $m);
// var_dump($m[2]);
P.S. But the correct answer is still parse_url.
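A rough sketch of the parse_url route for the scheme-less entries in the question. parse_url() only reports a host when a scheme (or a leading //) is present, so one is prepended first. The function name naive_domain is hypothetical, and "last two labels" is a simplifying assumption that is wrong for suffixes such as co.uk.

```php
<?php
// Hypothetical helper: last two labels of the host, or null if no host is found.
// Assumption: a two-label domain is what is wanted (not true for e.g. co.uk).
function naive_domain(string $url): ?string
{
    // Prepend a scheme when none is present so parse_url() can see the host.
    if (!preg_match('~^[a-z][a-z0-9+.-]*://~i', $url)) {
        $url = 'http://' . $url;
    }

    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return null;
    }

    return implode('.', array_slice(explode('.', $host), -2));
}

echo naive_domain('download.adwarebot.com/setup.exe'), "\n";  // adwarebot.com
echo naive_domain('freeserials.spb.ru/key/68703.htm'), "\n";  // spb.ru
```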
Why use a regular expression? Of course it is possible, but using this:
foreach ($url_list as $url) {
$url_parts = explode('/', $url);
// strip the leading label only when another label and dot follow it
$domains[] = preg_replace('~^[^.]+\.(?=[^.]+\.)~', '', $url_parts[0]);
}
$domains = array_unique($domains);
will do just fine.
maybe a more generic solution:
tested by grep, I don't have php environment, sorry:
kent$ echo "download.adwarebot.com/setup.exe
dquote> athena.vistapages.com/suspended.page/
dquote> prosearchs.com/se/tds/in.cgi?4&group=5&parameter=mail
dquote> freeserials.spb.ru/key/68703.htm"|grep -Po '(?<!/)([^\./]+\.[^\./]+)(?=/.+)'
output:
adwarebot.com
vistapages.com
prosearchs.com
spb.ru

Function to remove GET variable with php

I have this URI.
http://localhost/index.php?properties&status=av&page=1
I am fetching the basename of the URI using the following code.
$basename = basename($_SERVER['REQUEST_URI']);
The above code gives me the following string.
index.php?properties&status=av&page=1
I want to remove the last variable from the string, i.e. &page=1. Please note that the value of page will not always be 1. Keeping this in mind, I want to trim the variable this way:
Trim from the last position of the string back to the first delimiter, i.e. &
Update:
I would like to remove &page=1 from the string, no matter which position it is in.
How do I do this?
Instead of hacking around with regular expression you should parse the string as an url (what it is)
$string = 'index.php?properties&status=av&page=1';
$parts = parse_url($string);
$queryParams = array();
parse_str($parts['query'], $queryParams);
Now just remove the parameter
unset($queryParams['page']);
and rebuild the url
$queryString = http_build_query($queryParams);
$url = $parts['path'] . '?' . $queryString;
There are many roads that lead to Rome. I'd do it with a RegEx:
$myString = 'index.php?properties&status=av&page=1';
$myNewString = preg_replace("/\&[a-z0-9]+=[0-9]+$/i","",$myString);
if you only want the &page=1-type parameters, the last line would be
$myNewString = preg_replace("/\&page=[0-9]+/i","",$myString);
if you also want to get rid of the possibility that page is the only or first parameter:
$myNewString = preg_replace("/[\&]*page=[0-9]+/i","",$myString);
Thank you, guys, but I think I have found a better solution. @KingCrunch suggested a solution, which I extended and converted into a function. The function below can remove (unset) any URI variable without any regex hacks. I am posting it as it might help someone.
function unset_uri_var($variable, $uri) {
$parseUri = parse_url($uri);
$arrayUri = array();
parse_str($parseUri['query'], $arrayUri);
unset($arrayUri[$variable]);
$newUri = http_build_query($arrayUri);
$newUri = $parseUri['path'].'?'.$newUri;
return $newUri;
}
now consider the following uri
index.php?properties&status=av&page=1
//To remove properties variable
$url = unset_uri_var('properties', basename($_SERVER['REQUEST_URI']));
//Outputs index.php?status=av&page=1
//To remove page variable
$url = unset_uri_var('page', basename($_SERVER['REQUEST_URI']));
//Outputs index.php?properties=&status=av
Hope this helps someone, and thank you @KingCrunch for your solution :)
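A slightly more defensive sketch of the same parse_url / parse_str idea, for relative URIs like the one in the question. It tolerates a missing query string and avoids a dangling "?" when no parameters are left. The name remove_query_var is hypothetical, and scheme and host are deliberately not rebuilt here.

```php
<?php
// Hypothetical variant for relative URIs such as "index.php?a=1&page=2".
// Assumption: only path and query matter; scheme/host/fragment are not rebuilt.
function remove_query_var(string $uri, string $name): string
{
    $parts = parse_url($uri);
    if ($parts === false) {
        return $uri; // seriously malformed input: leave it untouched
    }

    $params = [];
    parse_str($parts['query'] ?? '', $params);
    unset($params[$name]);

    $path  = $parts['path'] ?? '';
    $query = http_build_query($params);

    return $query === '' ? $path : $path . '?' . $query;
}

echo remove_query_var('index.php?properties&status=av&page=1', 'page'), "\n"; // index.php?properties=&status=av
echo remove_query_var('index.php?page=1', 'page'), "\n";                      // index.php
echo remove_query_var('index.php', 'page'), "\n";                             // index.php
```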
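$pos = strrpos($_SERVER['REQUEST_URI'], '&');
// keep everything before the last '&' (strrpos returns false if there is none)
$url = $pos === false ? $_SERVER['REQUEST_URI'] : substr($_SERVER['REQUEST_URI'], 0, $pos);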
Documentation for strrpos.
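Regex that handles page in any position of the query string: /(&|(?<=\?))page=.*?(?=&|$)/. Note that when page is the first of several parameters, the result keeps a stray & after the ? (index.php?&status=av). Here's example code: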
$regex = '/(&|(?<=\?))page=.*?(?=&|$)/';
$urls = array(
'index.php?properties&status=av&page=1',
'index.php?properties&page=1&status=av',
'index.php?page=1',
);
foreach($urls as $url) {
echo preg_replace($regex, '', $url), "\n";
}
Output:
index.php?properties&status=av
index.php?properties&status=av
index.php?
Regex explanation:
(&|(?<=\?)) -- either match a & or a ?, but if it's a ?, don't put it in the match and just ignore it (you don't want urls like index.php&status=av)
page=.*? -- matches page=[...]
(?=&|$) -- look for a & or the end of the string ($), but don't include them for the replacement (this group helps the previous one find out exactly where to stop matching)
You could use a RegEx (as Chris suggests), but it's not the most efficient solution (lots of overhead using that engine). It's easy to do with some string parsing:
<?php
//$url="http://localhost/index.php?properties&status=av&page=1";
$base=basename($_SERVER['REQUEST_URI']);
echo "Basename yields: $base<br />";
//Find the last ampersand
$lastAmp=strrpos($base,"&");
//Filter, catch no ampersands found
$removeLast=($lastAmp===false?$base:substr($base,0,$lastAmp));
echo "Without Last Parameter: $removeLast<br />";
?>
The trick is, can you guarantee that $page will be stuck on the end? If it is - great, if it isn't... what you asked for may not always solve the problem.
