PHP Regex Pattern - Match url if only one level deep - php

My question is similar to this but I can't get it to work: Path Regular Expression - Allow only one level
I have an array with a bunch of URLs from a website that are either category or sub-category pages, so:
http://www.mysite.com/dogs/
http://www.mysite.com/cats/
http://www.mysite.com/food/
are category pages (only one level beyond the root domain).
Sub-category pages look like:
http://www.mysite.com/dogs/poodles/
http://www.mysite.com/cats/siamese/
http://www.mysite.com/food/pizza/
I want to strip out the sub-categories and only be left with category pages in the array. Any url that contains anything beyond the first set of / / after the root url should be filtered out.
I think I need to use preg_grep, but using the pattern from the updated answer I referenced above, like
$regex = "#^/[^/]+/?$#";
$categories_only = preg_grep($regex,$array);
yields an empty array.
What pattern will match this correctly?

I don't think you need regex for this task.
You could implement a function to filter the array:
$urls = array('http://www.mysite.com/dogs/',
'http://www.mysite.com/cats/siamese/junk/?trash=1&x=y',
'http://www.mysite.com/food/pizza/');
function filter_url($url) {
$split = explode('/', $url);
return (count($split) == 5 && empty($split[4])) ||
(count($split) == 4 && !empty($split[3]));
}
print_r(array_filter($urls, 'filter_url'));
This would output:
Array ( [0] => http://www.mysite.com/dogs/ )
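The same idea can be written without relying on fixed positions in the exploded URL. The following is a minimal sketch, not taken from the answers above: the helper name is_category_url is hypothetical, and it assumes a "category" is any URL whose path has exactly one non-empty segment. It uses parse_url so that query strings are ignored.

```php
<?php
// Hypothetical helper (an assumption, not from the original answers):
// true when the URL path has exactly one non-empty segment, e.g. /dogs/ or /dogs.
function is_category_url($url)
{
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path)) {
        return false; // no path component, or a URL parse_url() could not handle
    }
    // Drop the empty strings produced by leading and trailing slashes.
    $segments = array_filter(explode('/', $path), function ($segment) {
        return $segment !== '';
    });
    return count($segments) === 1;
}

$urls = array(
    'http://www.mysite.com/dogs/',
    'http://www.mysite.com/dogs/poodles/',
    'http://www.mysite.com/cats/siamese/junk/?trash=1&x=y',
    'http://www.mysite.com/food/',
);

// array_filter preserves the original keys, just like preg_grep does.
$categories_only = array_filter($urls, 'is_category_url');
print_r($categories_only);
```

Because it counts path segments instead of array positions, this version should also cope with https URLs and with a missing trailing slash.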

<?php
$array = array("http://www.mysite.com/dogs/poodles/",
"http://www.mysite.com/cats/siamese/",
"http://www.mysite.com/dogs/",
"http://www.mysite.com/cats/",
"http://www.mysite.com/food/",
"http://www.mysite.com/food/pizza/");
$regex = "#^http://[^/]+/?[^/]+/?$#";
$categories_only = preg_grep($regex,$array);
print_r($categories_only);
This outputs:
Array
(
[2] => http://www.mysite.com/dogs/
[3] => http://www.mysite.com/cats/
[4] => http://www.mysite.com/food/
)

I think this works:
^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})\/([\da-z\.-]+)\/$
It only allows for two forward slashes after the .com or whatever. The trailing $ matters: without it, the pattern would also match the beginning of a deeper URL.
play here... http://rubular.com/r/TBLpnJFdJg

Related

php to get 2 string inside href tag with preg_match

I need code to get these strings from inside an href tag.
Example URL:
/video/funny-videos-with-dogs-21608674/
The strings I need are:
Url Title = funny-videos-with-dogs
Url ID = 21608674
How can I get these two strings from the URL via preg_match?
Update:
What I have tried so far is:
preg_match('/\/video\/(.*?)-/is', $vUrl, $vUrl_Title);
but it only gives me "funny". I need something that returns "funny-videos-with-dogs".
This should be an easy one:
<?php
if (preg_match("#/video/([a-z\-]+)-([0-9]+)/#", "/video/funny-videos-with-dogs-21608674/", $matches)) {
print_r($matches);
$urlTitle = $matches[1];
$urlID = $matches[2];
}
else {
print_r("Not found!");
}
And this yields
Array
(
[0] => /video/funny-videos-with-dogs-21608674/
[1] => funny-videos-with-dogs
[2] => 21608674
)
This assumes that no other characters need to be matched: the title may contain only lowercase letters and hyphens, and the ID only digits. I'm pretty sure you can simplify the regexp; I don't have deep regexp knowledge, but this should work.
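If titles can also contain digits or uppercase letters, one possible variation is to let a greedy group run up to the last hyphen that is followed by a purely numeric ID. This is only a sketch: the function name parse_video_url and the array keys are hypothetical, and it assumes the ID is always the final hyphen-separated part of the URL.

```php
<?php
// Hypothetical helper: split "/video/<title>-<id>/" at the last hyphen
// that is followed only by digits (and an optional trailing slash).
function parse_video_url($url)
{
    // (.+) is greedy, so it backtracks to the LAST "-<digits>" in the path.
    if (!preg_match('#^/video/(.+)-(\d+)/?$#', $url, $matches)) {
        return null; // not a video URL in the expected shape
    }
    return array('title' => $matches[1], 'id' => $matches[2]);
}

$video = parse_video_url('/video/funny-videos-with-dogs-21608674/');
print_r($video);
```

Anchoring with ^ and $ also rejects URLs that merely contain a /video/ fragment somewhere in the middle.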

Get slug from current url

I have URLs like so:
http://localhost/hi-every-body/
http://s1.localhost/hello-world/
http://s2.localhost/bye-world/
I want the page "slug" from the URLs, e.g.
hi-every-body
hello-world
bye-world
What's a simple way of doing this in PHP?
This should do exactly that:
trim(parse_url($url, PHP_URL_PATH), '/');
It takes the path and strips the forward slashes on both sides.
To get only the last part of the path:
basename(parse_url($url, PHP_URL_PATH));
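A possibly more robust solution, assuming the path is passed to your script in a params query parameter (for example by a rewrite rule), is this: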
$slugs = explode("/", $_GET['params']);
This will give you an array filled with every element in your URL.
Eg. http://localhost/one/hippo/cake?t=21
Becomes the array:
Array (
[0] => one
[1] => hippo
[2] => cake
)
This allows you to use each element as you require.
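When no params parameter is available, the same array can be built from the URL itself. A minimal sketch, assuming the full URL is at hand as a string; the function name path_segments is hypothetical:

```php
<?php
// Hypothetical helper: return the non-empty path segments of a URL,
// ignoring the scheme, host and query string.
function path_segments($url)
{
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path)) {
        return array(); // URL has no path component
    }
    $segments = array_filter(explode('/', $path), function ($segment) {
        return $segment !== '';
    });
    return array_values($segments); // reindex from 0
}

$segments = path_segments('http://localhost/one/hippo/cake?t=21');
print_r($segments);

// For single-level URLs the slug is simply the only segment.
$slugParts = path_segments('http://s1.localhost/hello-world/');
$slug = $slugParts[0];
```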

How do I stop a Regular Expression when I reach a certain character in PHP?

I have a string, /controller/method/parameter1/parameter2?parameter1=parameter2. This is just the REQUEST_URI I am using for my website.
I want to split this string into separate array elements using PHP, and the following code works fine for this action: preg_split('[/]', $_SERVER['REQUEST_URI'], NULL, PREG_SPLIT_NO_EMPTY).
This works almost perfectly, providing me with an excellent array output, until I add get variables. With these, the last array element includes the get variables too.
My question is, is there a way to stop processing as soon as a question mark (?) is reached?
I want to cut it from the question mark, and only show items before the question mark. This (hopefully) will mean that this:
Array
(
[0] => controller
[1] => method
[2] => parameter1
[3] => parameter2?parameter1=parameter2
)
Will become this:
Array
(
[0] => controller
[1] => method
[2] => parameter1
[3] => parameter2
)
The problem is, I want this all in the regular expression. I don't really care if there is another way (I know there is), I just want to know if there is a way to do this in the regex.
Thanks
Explode before splitting:
$vars = explode('?', $_SERVER['REQUEST_URI']);
$array = preg_split('[/]', $vars[0], -1, PREG_SPLIT_NO_EMPTY);
UPDATE
From php.net:
If you don't need the power of regular expressions, you can choose
faster (albeit simpler) alternatives like explode() or str_split().
In your case you can use explode and save some time (str_split cuts a string into fixed-length chunks, so it does not fit here).
You can strip the ? and everything after it with preg_replace before splitting:
preg_split('[/]',
preg_replace("/\?(.*)/", "", $_SERVER['REQUEST_URI']),
-1, PREG_SPLIT_NO_EMPTY);
I do this in my system with strpos and substr before using regex:
$uri = $_SERVER['REQUEST_URI'];
$uri = (($pos = strpos($uri, '?')) !== false) ? substr($uri, 0, $pos) : $uri;
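Since the question asks for a regex-only approach, here is one possible sketch. It treats either a slash or the question mark plus everything after it as the delimiter, so the query string is consumed as a single separator and then discarded. The function name split_request_path is hypothetical, and the URI is passed in as a plain string rather than read from $_SERVER.

```php
<?php
// Hypothetical helper: split a request URI into path segments using only
// preg_split. The delimiter is either "/" or "?" followed by the rest of
// the string, so nothing after the question mark survives as a segment.
function split_request_path($uri)
{
    return preg_split('#/|\?.*#s', $uri, -1, PREG_SPLIT_NO_EMPTY);
}

$parts = split_request_path('/controller/method/parameter1/parameter2?parameter1=parameter2');
print_r($parts);
```

PREG_SPLIT_NO_EMPTY removes the empty strings left by the leading slash and by the query string being at the very end.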

Change href based on URI using PHP

I'm trying to make a main menu bar link dynamic, based on the visitor's current page.
I started with
$path = $_SERVER['REQUEST_URI'];
Which, of course, returns things like
/subfolder/page.html
/subfolder1/subfolder2/page.html
/page.html
I need to grab whatever is after the first '/'. I've tried messing around with explode, but I stumble with what to do with the resulting array. I'm also going cross-eyed trying to write a regex - seems a more elegant solution.
Then I need to build my switch. Something along the lines of:
switch ($path)
{
case '/subfolder0':
$link = $root_url.'/subfolder0/anotherfolder/page.html';
break;
case '/subfolder1':
$link = $root_url.'/subfolder1/page.html';
break;
default:
$link = $root_url.'/subfolder2/page.html';
}
Finally, should I be using if...elseif for this in lieu of switch?
Thanks for your time, all!
To grab everything after the first /:
strstr($_SERVER['REQUEST_URI'], '/');
Or, with regex:
preg_match('#(/.*)#', $_SERVER['REQUEST_URI'], $matches); // $matches[1] will be the path
As far as the switch, I'd say if/elseif/else is the least-elegant in your case, switch isn't bad, but personally I'd go with an associative array:
$mapping = array('/subfolder0' => $root_url.'/subfolder0/anotherfolder/page.html', 'etc' => 'etc');
$link = $mapping[$path];
This lets you keep the mapping in another file for organization, and makes it a little bit easier to maintain by separating configuration from implementation.
Using explode is not a bad idea at all if you are interested in all the parts of the URI; take a look at the documentation for explode.
Its usage would be like so:
$exploded = explode('/','/path/to/page.html');
echo $exploded[0]; // Will print out '' (empty string)
echo $exploded[1]; // Will print out 'path'
echo $exploded[2]; // Will print out 'to'
echo $exploded[3]; // Will print out 'page.html'
However, as far as I understand, you want to replace the link with whatever comes after the first character (which is always '/'). You could use substr like so:
// Get whatever is after the first character and put it into $path
$path = substr($_SERVER['REQUEST_URI'], 1);
In your case, it is not needed because you can rely on there being a forward slash at the beginning of the string.
I would also suggest using an associative array to replace the URL.
I would implement the entire thing like so (removing the first slash as you require):
// Define the URLs for replacement
$urls = array(
'subfolder0' => '/subfolder0/anotherfolder/page.html',
'subfolder1' => '/subfolder1/page.html'
);
// Get the request URI, trimming its first character (always '/')
$path = substr($_SERVER['REQUEST_URI'], 1);
// Set the link according to $urls associative array, or set
// the default URL if not found
$link = isset($urls[$path]) ? $urls[$path] : '/subfolder2/page.html';
Or with explode, taking only the first part of the URI:
// Get the parts of the request
$requestParts = explode('/', $_SERVER['REQUEST_URI']);
// Set the link according to $urls associative array, or set
// the default URL if not found
$link = isset($urls[$requestParts[1]]) ? $urls[$requestParts[1]] : '/subfolder2/page.html';
After analyzing the OP's question, I think he/she meant to phrase it as "Everything after the first '/', but before the second '/'". Here is what I got:
You could try this regex:
<?php
/*
* Regex: /((\w+?|\w+\.\w+?)(?!^\/))(?=\/.*$|$)/
*/
$paths = array(
'/subfolder9/',
'/subfolder/page.html',
'/subfolder1/subfolder2/page.html',
'/page.html'
);
foreach ($paths as $path) {
    preg_match("/((\w+?|\w+\.\w+?)(?!^\/))(?=\/.*$|$)/", $path, $matches);
    print_r($matches);
}
// $matches[1] will contain the first group ( ) matched in the expression,
// i.e. "subfolder<#>" or "<page>.<ext>"
?>
The loop output is as follows:
Array
(
[0] => subfolder9
[1] => subfolder9
[2] => subfolder9
)
Array
(
[0] => subfolder
[1] => subfolder
[2] => subfolder
)
Array
(
[0] => subfolder1
[1] => subfolder1
[2] => subfolder1
)
Array
(
[0] => page.html
[1] => page.html
[2] => page.html
)
Note: This only works with regex flavors that support look-arounds (zero-width positive and negative lookahead are the ones used in the example).
This is a great cheat sheet for regular expressions and I don't code without it:
Regular-Expressions.info
You can use the dirname function to get the parent directory of a path:
$path = dirname('/subfolder/page.html'); // returns '/subfolder'
$path = dirname('/subfolder1/subfolder2/page.html'); // returns '/subfolder1/subfolder2'
$path = dirname('page.html'); // returns '.'
EDIT: Regex-based solution, which keeps only the first segment regardless of depth:
$path = preg_replace('#^(/[^/]*).*$#', '$1', '/subfolder/page.html'); // returns '/subfolder'

named groups in PHP pcre regex

I'm trying to match strings like these:
/2011/10/Lorem-ipsum-dolor-it-amet-consectetur-adipisicing
/2011/10/Lorem-ipsum-dolor-it-amet-consectetur-adipisicing/
and
/2011/10/4545
/2011/10/4545/
And get year, month and the third segment back. This is regex I've got:
%/(?P<year>\d{4})/(?P<month>\d{2})/((?P<id>\d{1,})|(?P<permalink>.{1,}))[/]{0,1}$%
I thought the resulting matches array would always contain 3 variables: year, month, and id or permalink. But what happens is that if permalink is matched, I still get an empty id variable in the resulting array anyway. Is there a way to rewrite the regex so the resulting array will only contain year, month, and id or permalink?
I believe named groups aren't "ignored" when using the | syntax because there's no way of knowing whether you want to keep both of the results. In other words, both sides of | are evaluated even when one of them has or doesn't have a match, unlike conditional or in most programming languages.
As an example, if you have a regular expression
/(?P<foo>abc)|(?P<bar>def)/
and the string to compare against is abcdef, in some cases you'd want to know that both subexpressions matched and so both variables should be set. And if both variables are set in some cases, it's better to set them in all cases so that the programmer doesn't first have to check if they've been set before handling them.
And as a comment to the question "Is there a way to rewrite a regex so resulting array will only contain year, month and id or permalink", why would you want that? Just check if the variable is empty. If the regex would leave either of them out, you'd still need a check which of them is set. The exact same logic can be used to check which of them is empty.
Since they are present in the regex, the named groups will be always included in the match groups even if they did not match anything due to the |.
You may also want to improve the regex a bit, substituting the . in <permalink> with [^/] because you don't want a trailing slash (if present) as part of the permalink.
However, as Mob notes, there's a much easier way to parse such an easy target:
list($year, $month, $link) = array_slice(explode('/', $url), 1);
if (is_numeric($link)) {
// $link == id
}
else {
// $link == permalink
}
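If you do want to stay with the regex, one way to end up with only the groups that actually matched is to filter the result afterwards. This is a minimal sketch rather than a definitive fix: the function name parse_post_path is hypothetical, the pattern applies the [^/] suggestion from above, and the wrapper around the alternation is made non-capturing.

```php
<?php
// Hypothetical helper: match the path, then keep only the named groups
// that captured something. Numeric keys and empty captures are dropped.
function parse_post_path($path)
{
    $pattern = '%^/(?P<year>\d{4})/(?P<month>\d{2})/(?:(?P<id>\d+)|(?P<permalink>[^/]+))/?$%';
    if (!preg_match($pattern, $path, $matches)) {
        return null; // path does not have the /yyyy/mm/segment shape
    }
    $named = array();
    foreach ($matches as $key => $value) {
        if (is_string($key) && $value !== '') {
            $named[$key] = $value;
        }
    }
    return $named;
}

print_r(parse_post_path('/2011/10/4545/'));
print_r(parse_post_path('/2011/10/Lorem-ipsum-dolor-it-amet-consectetur-adipisicing'));
```

The caller can then test for the id or permalink key with isset instead of checking for an empty string.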
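You don't necessarily need regex.
$x = "/2011/10/4545";
$v = explode("/", $x);
array_shift($v); // drop the empty element before the leading slash
if (count($v) == 4) {
    array_pop($v); // drop the empty element left by a trailing slash
}
print_r($v);
Outputs
Array
(
[0] => 2011
[1] => 10
[2] => 4545
)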
$url = "/2011/10/Lorem-ipsum-dolor-it-amet-consectetur-adipisicing";
$v = explode("/", $url);
array_shift($v);
array_pop($v);
if(count($v) == 3){
array_pop($v);
print_r($v);
} else {
print_r($v); }
Outputs
Array
(
[0] => 2011
[1] => 10
)
