RegExp to match a segment of a URL - php

I'm trying to use RegExp to match a segment of a URL.
The URL in question is this:
http://www.example.com/news/region/north-america/
As I need this regex for the WordPress URL Rewrite API, the subject will only be the path section of the URL:
news/region/north-america
In the above example I need to extract the north-america portion of the path. However, when pagination is used, the path becomes something like this:
news/region/north-america/page/2
Here I still need to extract only the north-america portion.
The RegExp I've come up with is as follows:
^news/region/(.*?)/(.*?)?/?(.*?)?$
However, this does not match news/region/north-america; it only matches news/region/north-america/page/2.
From what I can tell I need to make the trailing slash after north-america optional, but adding /? doesn't seem to work.

Try this:
preg_match('/news\/region\/(.*?)\//',"http://www.example.com/news/region/north-america/page/2",$matches);
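$matches[1] will then contain "north-america". Note that this pattern requires a slash after the segment, so it will not match a subject that ends in north-america with no trailing slash.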

You should match using this regex:
^news/region/([^/]+)
The full match is news/region/north-america and capture group 1 is north-america, even when the path becomes news/region/north-america/page/2.
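As a quick check, here is a minimal sketch of that pattern run against both paths from the question. The `#` delimiters and the helper name `extractRegion` are assumptions made for illustration; they are not part of the answer above.

```php
<?php
// Hypothetical helper: returns the segment after news/region/, or null.
function extractRegion(string $path): ?string
{
    // [^/]+ stops at the next slash, so any pagination segments are ignored.
    if (preg_match('#^news/region/([^/]+)#', $path, $matches)) {
        return $matches[1];
    }
    return null;
}

var_dump(extractRegion('news/region/north-america'));
var_dump(extractRegion('news/region/north-america/page/2'));
```

Because the class `[^/]` cannot cross a slash, no optional trailing-slash handling is needed.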

georg's suggested rule works like a charm:
^news/region/(.*?)(?:/(.*?)/(.*?))?$
For those interested in the application of this regex, I used it in the WP Rewrite API to grab the custom taxonomy and page number (if present) and assign the relevant matches to the WP rewrite:
$newRules['news/region/(.*?)(?:/(.*?)/(.*?))?$']='index.php?region=$matches[1]&forcetemplate=news&paged=$matches[3]';
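To see which groups the rule fills in, here is a small sketch that runs the pattern with plain preg_match outside WordPress. The function name `parseRegionPath` and the `#` delimiters are hypothetical and chosen only for this example.

```php
<?php
// Hypothetical helper: splits a path into region and page number.
function parseRegionPath(string $path): ?array
{
    $pattern = '#^news/region/(.*?)(?:/(.*?)/(.*?))?$#';
    if (!preg_match($pattern, $path, $matches)) {
        return null;
    }
    return [
        'region' => $matches[1],
        // Group 3 is missing from $matches when the optional part did not match.
        'paged'  => $matches[3] ?? '',
    ];
}

print_r(parseRegionPath('news/region/north-america'));
print_r(parseRegionPath('news/region/north-america/page/2'));
```

Group 2 captures the literal "page" segment, which is why the rewrite target uses $matches[3] for the page number.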

Related

PHP regex match excluded several urls

I'm having trouble matching URLs. For example, we have these URLs (strings):
/, /news, /news/1-addsf, /articles, /guides, etc.
The task: match all of them except those that start with "/news" (with or without a continuation such as "/1-addsf"; I need regexes for both cases), "/articles", and "/".
I tried something like this:
#([^\/news.*]|[^\/articles])#is
#\/[^(news.*|articles)]#is
#^\/(^news|^articles)#is
and many other variants.
I think I'm missing something or I'm bad at googling, but I can't find anything on this question.
I need a working regex. Thanks!
P.S. Sorry for my English.
It seems you want something like this:
#^/(?!news|articles).*#is
The above regex matches all URL strings except the ones that start with /news or /articles.
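A short sketch of how that negative lookahead behaves on the sample URLs. The second pattern, `$strictPattern`, is an assumption of mine rather than part of the answer: it adds `$` to the lookahead so that the bare "/" is rejected too, which the question also asked for.

```php
<?php
// Pattern from the answer: reject anything starting with /news or /articles.
$answerPattern = '#^/(?!news|articles).*#is';
// Assumed variant: also reject "/" on its own by forbidding end-of-string.
$strictPattern = '#^/(?!$|news|articles).*#is';

$urls = ['/', '/news', '/news/1-addsf', '/articles', '/guides'];

foreach ($urls as $url) {
    $a = preg_match($answerPattern, $url) === 1 ? 'match' : 'no match';
    $s = preg_match($strictPattern, $url) === 1 ? 'match' : 'no match';
    echo str_pad($url, 16), $a, ' / ', $s, "\n";
}
```

Note that the lookahead tests a prefix only, so a path such as /newsletter would be rejected as well.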

PHP preg_match , check if language is defined in url

I would like to test for a language match in a URL.
The URL will look like this: http://www.domainname.com/en/#m=4&guid=%some_param%
I want to check whether there is an existing language code within the URL. I was thinking of something along these lines:
^(.*:)\/\/([a-z\-.]+)(:[0-9]+)?(.*)$
or
^(http|https:)\/\/([a-z\-.]+)(:[0-9]+)?(.*)$
I'm not that sharp with regex. Can anyone help or point me in the right direction?
Try this:
[https]+://[a-z-]+.([a-z])+/
http://www.regexr.com/ is an easy site for creating regexes.
If you know the data you are testing is a URL, I would not bother adding all of the URL parts to the regex. Keep it simple, like:
/\/[a-z]{2}\//
That looks for a two-letter combination between two forward slashes. If you need to capture the language code, wrap it in parentheses:
/\/([a-z]{2})\//
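One possible way to apply this is sketched below. It is a variation, not the answer's exact pattern: it first isolates the path with parse_url and then anchors the two-letter check to the first path segment. The function name `detectLanguage` is hypothetical.

```php
<?php
// Hypothetical helper: returns the two-letter code in the first path
// segment of $url, or null if there is none.
function detectLanguage(string $url): ?string
{
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path)) {
        return null; // no path component, or the URL could not be parsed
    }
    // Two lowercase letters, followed by a slash or the end of the path.
    if (preg_match('~^/([a-z]{2})(?:/|$)~', $path, $matches)) {
        return $matches[1];
    }
    return null;
}

var_dump(detectLanguage('http://www.domainname.com/en/#m=4&guid=%some_param%'));
var_dump(detectLanguage('http://www.domainname.com/products/'));
```

Anchoring to the start of the path avoids picking up a two-letter directory deeper in the URL.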

Match exactly one URL segment with regex in WordPress

I want to rewrite these URLs in WordPress:
http://localhost/one/.../
http://localhost/one/...
Using the following code:
add_rewrite_tag('%my_test%','([^/]*)');
add_rewrite_rule(
'^one/([^/]*)/?',
'index.php?page_id=0&my_test=$matches[1]',
'top'
);
It works, but it also allows URLs like:
http://localhost/one/.../...
http://localhost/one/.../.../...
How can I rewrite only /one/.../ and /one/... URLs and return 404 for /one/.../.../ etc?
The pattern '^one/([^/]*)/?' matches /one/.../, /one/... and /one/(nothing). However, anything beyond that is ignored because the regex is not anchored at the end. You need to add $ to the end. You probably also want to replace the * with a + if you don't want to match /one/(nothing). So '^one/([^/]+)/?$' should work.
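The effect of the end anchor can be checked with plain preg_match, outside WordPress. This is only a sketch: the function name `matchesOneSegment` is made up for the example, and the paths are written without a leading slash to mirror the pattern in the question.

```php
<?php
// Hypothetical helper: true only for "one/<segment>" with an optional
// trailing slash and nothing after it.
function matchesOneSegment(string $path): bool
{
    return preg_match('#^one/([^/]+)/?$#', $path) === 1;
}

foreach (['one/abc', 'one/abc/', 'one/abc/def', 'one/'] as $path) {
    echo str_pad($path, 14), matchesOneSegment($path) ? 'match' : 'no match', "\n";
}
```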

Removing 'http://' from link via REGEX

What I would like to do is remove the "http://" part of these autogenerated links, below is an example of it.
http://google.com/search?gc...
Here are the regexes I am using in PHP to generate these links from a URL.
$patterns_sp[5] = '~([\S]+)~';
$replaces_sp[5] = '<a href=\1 target="_blank">\1<br/>';
$patterns_sp[6] = '~(?<=\>)([\S]{1,25})[^\s]+~';
$replaces_sp[6] = '\1...</a><br/>';
When these patterns are run on a URL like this:
http://www.google.com/search?gcx=c&ix=c1&sourceid=chrome&ie=UTF-8&q=regex
the REGEX gives me:
http://google.com/search?gc...
Where I am stuck:
There is no obvious reason why I cannot modify the fourth line of code to read like this:
$patterns_sp[6] = '~(?<=\>http\:\/\/)([\S]{1,25})[^\s]+~';
However, the REGEX still seems to capture the "http://" part of the address, which makes a long list of these links look very redundant. What I am left with is the same thing as in the first example.
Replace...
$patterns_sp[5] = '~([\S]+)~';
...with...
$patterns_sp[5] = '~^(?:https?|ftp)://([\S]+)~';
Then you can access the protocol-less version with $1 and the whole link with $0.
Optionally, you can remove a leading protocol with something like...
preg_replace('/^(?:https?|ftp):\/\//', '', $str);
I suggest not writing your own regex, instead have a look at http://php.net/manual/en/function.parse-url.php
Retrieve the components of the URL, then compose a new version that only contains the parts you want.
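A minimal sketch of the parse_url approach, assuming the goal is simply to drop the scheme and keep everything else. The function name `stripScheme` is hypothetical, and user/password components are ignored for brevity.

```php
<?php
// Hypothetical helper: rebuilds a URL from its parsed parts without the scheme.
function stripScheme(string $url): string
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['host'])) {
        return $url; // not an absolute URL; leave it untouched
    }
    $result = $parts['host'];
    if (isset($parts['port'])) {
        $result .= ':' . $parts['port'];
    }
    $result .= $parts['path'] ?? '';
    if (isset($parts['query'])) {
        $result .= '?' . $parts['query'];
    }
    if (isset($parts['fragment'])) {
        $result .= '#' . $parts['fragment'];
    }
    return $result;
}

echo stripScheme('http://www.google.com/search?gcx=c&ix=c1&sourceid=chrome&ie=UTF-8&q=regex'), "\n";
```

The result can then be truncated for display while the original URL stays in the href.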

URL Beautification using .htaccess or php?

In search of more user-friendly and search-engine-friendly URLs, I have beautified my URLs.
The .htaccess Apache rule that achieves this (thanks to Laurence Gonsalves):
RewriteRule ^([a-z][a-z])/(.*) /$2?ln=$1 [L]
which makes this possible:
/uk/somepage instead of /somepage?ln=uk
/de/somepage instead of /somepage?ln=de
/ja/somepage instead of /somepage?ln=ja
Now the difficult part: previously, the language of the current page was changed with a normal link like href="?ln=de" or href="?ln=it". How can I achieve that now, so that the current page stays the same and only the leading two lowercase letters that indicate the language change?
So how do I make the link change /uk/contact to /de/contact when the German (de) language flag is clicked? Either a PHP solution that rewrites the URL or an .htaccess solution is fine.
I found out that $_SERVER['REQUEST_URI'] outputs /uk/somepage, but I can't write the PHP code that splits it into its components and inserts a new language code such as "de", so that I can put the result into a normal href on a German flag. Thanks for any and all clues/answers!
You'd probably want to look at something like explode or regular expressions to strip out the non-language part of the URL (e.g., /contact) and just add it again to a new string containing the language identifier.
Maybe this could get you started:
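<?php
// Build a link to the current page in another language.
function changeLanguageLink($language_id)
{
    $uri = $_SERVER['REQUEST_URI'];
    // Strip a leading language segment such as /uk/ or /de/.
    $link = preg_replace('/^\/?(uk|de)\/(.*)/', '/$2', $uri);
    // Prepend the new language code, keeping the leading slash.
    return '/' . $language_id . $link;
}
?>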
Change language to UK
Well, you can split the request URI using explode() (the old split() function was removed in PHP 7).
$uri_bits=explode('/', $_SERVER['REQUEST_URI']);
In theory the language identifier will be in $uri_bits[1] (as [0] will contain a zero-length string, but you should verify that by print_r()-ing the array). Of course, you should test that $uri_bits[1] exists and that it is the language identifier. The simplest way to do that, given that the rewrite rule passes the language as the ln parameter, would be:
if(isset($uri_bits[1]) && $uri_bits[1]==$_GET['ln'])
Then you can change that and concatenate the bits again using implode()
$uri_bits[1]="de";
$url_german=implode('/', $uri_bits);
At least that's how I'd do it.
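Put together, the explode/implode idea might look like the sketch below. The function name `switchLanguage` is hypothetical. The check for a two-letter first segment is an assumption that mirrors the ^([a-z][a-z])/ rewrite rule from the question; it would misfire on a page whose own name is exactly two letters long.

```php
<?php
// Hypothetical helper: swaps (or inserts) the language segment of a request URI.
function switchLanguage(string $requestUri, string $newLang): string
{
    $bits = explode('/', $requestUri);
    // $bits[0] is '' because the URI starts with a slash.
    if (isset($bits[1]) && preg_match('/^[a-z]{2}$/', $bits[1])) {
        $bits[1] = $newLang;                    // replace the existing code
    } else {
        array_splice($bits, 1, 0, [$newLang]);  // no code yet: insert one
    }
    return implode('/', $bits);
}

echo switchLanguage('/uk/contact', 'de'), "\n";
echo switchLanguage('/contact', 'de'), "\n";
```

In a template this could be called with $_SERVER['REQUEST_URI'] as the first argument to build the href for each flag.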
