RegExp Language Routing - php

Good Day,
I am in need of a Route-RegExp based on client-language for a website.
It should be like this:
Relative URL / Route:
/(No-Language) -> /?lng=(someDefaultLanguage)
/(No-Language)/ -> /?lng=(someDefaultLanguage)
/lngCode/page -> /page/?lng=lngCode
/lngCode/page/ -> /page/?lng=lngCode
/lngCode/pageL1/pageL2 -> /pageL1/pageL2/?lng=lngCode
/lngCode/page?param=Value -> /page/?lng=lngCode&param=Value
(Notice the trailing slashes on some lines)
The tree structure is, ...well, infinite :)
There are cases with single and multiple URL parameters.
I'm absolutely no regex wizard; I managed this result in, uhm, ...hours:
/^\/([a-z]{2})(?:(.*[^\?])|^$)((?:[\/\?]).*|^$)/
Please don't ask me what I was trying to route there. I am sooooo new to regex.
Thank you in advance
--
Edit for clarification (I hope):
Basically it is this concept (it is internal routing, not a redirect, in case I didn't mention it):
The language parameter (directory-style) must be taken from the first URL segment and attached as a real query parameter named "lng". The directory-style segment should disappear.
If there are already other parameters, they need to be attached as well (the ?/& case).
If no language is given (= default language), there is no directory-style segment in the URL. It would be nice if a ?lng=en parameter could still be attached.
Examples:
localhost/blogpage/coolentry (default language)
localhost/de/blogpage/coolentry
localhost/es/blogpage/coolentry
localhost/blogpage/ -> localhost/blogpage/?lng=en
localhost/de/blogpage/ -> localhost/blogpage/?lng=de
localhost/blogpage/coolentry/ -> localhost/blogpage/coolentry/?lng=en
localhost/de/blogpage/coolentry/ -> localhost/blogpage/coolentry/?lng=de
localhost/de/blogpage/coolentry/?entryPage=1 -> localhost/blogpage/coolentry/?lng=de&entryPage=1
It gets routed always with a real language parameter.
I have also edited the first post; there was a confusing typo in it.

Sorry for the delay @hcm.
Well, I gotta tell ya. I spent 10 minutes writing the original regex.
Tested in Perl, everything worked great.
Then I dumped it into an online PHP tester and got this "Undefined offset" error/warning.
Capture groups 2, 3 and 4 are optional, so I wrote (?: ( capture ) )?, but PHP is such a mess:
trailing groups that don't participate are left out of $matches entirely, so you can't even index them safely.
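A small sketch of the behaviour described here (the pattern and subject are made up for illustration): preg_match leaves a trailing optional group that did not participate out of $matches. It also shows a few ways around that, including the empty-alternative branch reset this answer ends up using.

```php
<?php
// Made-up pattern: group 2 is optional and does not take part when matching "foo".
$pattern = '~^(\w+)(?:-(\w+))?$~';

// 1. Default: the trailing unmatched group is simply absent from $plain,
//    so $plain[2] would raise "Undefined offset" (PHP 7) / "Undefined array key" (PHP 8).
preg_match($pattern, 'foo', $plain);
$second = $plain[2] ?? '';          // null coalescing reads it without a warning

// 2. PHP 7.4+: PREG_UNMATCHED_AS_NULL reports every group, unmatched ones as null.
preg_match($pattern, 'foo', $withNull, PREG_UNMATCHED_AS_NULL);

// 3. The branch-reset trick from this answer: an empty "()" alternative always
//    participates, so group 2 is present as an empty string.
preg_match('~^(\w+)(?|-(\w+)|())$~', 'foo', $reset);

var_dump(count($plain), $second, $withNull, $reset);
```

The branch-reset version has the advantage of working the same way in preg_replace_callback, where every group then always exists in the callback's $matches.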
Update
@hcm - OK, figured it out. Those online testers don't translate CRLFs to LFs.
In multiline mode, $ is a boundary before a newline (LF) or at the end of the string,
and ^ is a boundary after a newline or at the beginning of the string, which is no problem.
So $ won't match before a CR, only before an LF. A workaround is probably the
\R (any linebreak) construct, but that is not a boundary; it consumes actual characters.
To cure this, I used (?: $ | (?= \r) ) outside of assertions and
(?: $ | \r ) inside assertions. This cures all problems.
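A minimal sketch of the CRLF problem and the workaround, on a made-up two-line subject. It also shows the PCRE (*ANYCRLF) start-of-pattern option, which tells the engine to treat CR, LF and CRLF all as newlines; that is an alternative to the lookahead trick, not what this answer uses.

```php
<?php
$text = "first\r\nsecond\r\n";   // made-up subject with Windows (CRLF) line endings

// In /m mode "$" matches before "\n" (or at the very end), so the "\r" in
// front of each "\n" stops it: no line matches.
$plain = preg_match_all('~^\w+$~m', $text, $m1);

// The answer's workaround: end of line OR a lookahead for "\r".
$workaround = preg_match_all('~^\w+(?:$|(?=\r))~m', $text, $m2);

// Alternative: (*ANYCRLF) at the very start of the pattern makes CR, LF and
// CRLF all count as newlines, so a plain "$" works again.
$anycrlf = preg_match_all('~(*ANYCRLF)^\w+$~m', $text, $m3);

var_dump($plain, $m2[0], $m3[0]);
```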
After reading your message, I've changed the regex so that every part is optional, but
still positional.
The 4 optional parts are as follows.
1. Before the lang code.
2. The lang code.
3. After the lang code.
4. The parameters.
No part overlaps another.
Leading slashes are stripped from each part (they are matched, but outside the capture groups),
while internal slashes are left in place.
All that's left is to construct the new URL as you wish.
Let me know how this turns out or if you need a little tweak.
PHP code:
// Go to this website:
// http://writecodeonline.com/php/
// Cut & paste this into the code box, hit run.
$text = '
invalid
/
/de
/de/coolentry
localhost/
localhost/blogpage
localhost/blogpage/
localhost/blogpage/de/
/localhost/blogpage/coolentry/famous invalid
/root/blog/page/cool/entry/?entryPage=1&var1=A&var2=B
localhost/blogpage/coolentry/
localhost/blogpage/de/coolentry/
localhost/blogpage/de/coolentry/
localhost/blogpage/de/coolentry/?entryPage=1
localhost/blogpage/coolentry/?entryPage=2
';
$str = preg_replace_callback('~^(?![^\S\r\n]*(?:\r|$))(?|(?!/[a-z]{2}[^\S\r\n]*(?:/|(?:\r|$)))/?((?:(?!/[a-z]{2}[^\S\r\n]*(?:/|(?:\r|$))|\?|/[^\S\r\n]*(?:\r|$))\S)*)|())(?|/([a-z]{2})(?=/|[^\S\r\n]*(?:\r|$))|())(?|/((?:(?!/[^\S\r\n]*(?:\r|$))[^?\s])+)|())(?|/\?((?:(?!/[^\S\r\n]*(?:\r|$))\S)*)|())/?[^\S\r\n]*(?:$|(?=\r))~m',
function( $matches )
{
    ///////////////// URL //////////////////
    $url = '';
    // Before lang code -- Group 1
    if ( $matches[1] != '' ) {
        $url .= '/' . $matches[1];
    }
    // After lang code -- Group 3
    if ( $matches[3] != '' ) {
        $url .= '/' . $matches[3];
    }
    ///////////////// PARAMS //////////////////
    $params = '/?lng=';
    // Lang code -- Group 2
    if ( $matches[2] != '' ) {
        $params .= $matches[2];
    }
    else {
        $params .= 'en'; // No lang given, set default
    }
    // Other params
    if ( $matches[4] != '') {
        $params .= '&' . $matches[4];
    }
    ///////////////// Check there is a Url //////////////////
    if ( $url == '' ) { // No url given, set a default
        $url = '/language'; // 'language', 'localhost', etc...
    }
    ///////////////// Put the pieces back together //////////////////
    $NewURL = $url . $params;
    return $NewURL;
},
$text);
print $str;
output:
invalid
/language/?lng=en
/language/?lng=de
/coolentry/?lng=de
/localhost/?lng=en
/localhost/blogpage/?lng=en
/localhost/blogpage/?lng=en
/localhost/blogpage/?lng=de
/localhost/blogpage/coolentry/famous invalid
/root/blog/page/cool/entry/?lng=en&entryPage=1&var1=A&var2=B
/localhost/blogpage/coolentry/?lng=en
/localhost/blogpage/coolentry/?lng=de
/localhost/blogpage/coolentry/?lng=de
/localhost/blogpage/coolentry/?lng=de&entryPage=1
/localhost/blogpage/coolentry/?lng=en&entryPage=2
Regex
# '~^(?![^\S\r\n]*(?:\r|$))(?|(?!/[a-z]{2}[^\S\r\n]*(?:/|(?:\r|$)))/?((?:(?!/[a-z]{2}[^\S\r\n]*(?:/|(?:\r|$))|\?|/[^\S\r\n]*(?:\r|$))\S)*)|())(?|/([a-z]{2})(?=/|[^\S\r\n]*(?:\r|$))|())(?|/((?:(?!/[^\S\r\n]*(?:\r|$))[^?\s])+)|())(?|/\?((?:(?!/[^\S\r\n]*(?:\r|$))\S)*)|())/?[^\S\r\n]*(?:$|(?=\r))~m'
^ # BOL
(?! # Not a blank line, remove to generate a total default url
[^\S\r\n]*
(?: \r | $ )
)
(?| # BEFORE lang code
(?!
/ [a-z]{2} [^\S\r\n]* # not lang code
(?:
/
| (?: \r | $ )
)
)
/? # strip leading '/'
( # (1 start)
(?:
(?!
/ [a-z]{2} [^\S\r\n]* # not lang code
(?:
/
| (?: \r | $ )
)
|
\? # not params
|
/ [^\S\r\n]* # not final slash
(?: \r | $ )
)
\S
)*
) # (1 end)
|
( ) # (1)
)
(?| # LANG CODE
/ # strip leading '/'
( [a-z]{2} ) # (2)
(?=
/
| [^\S\r\n]*
(?: \r | $ )
)
|
( ) # (2)
)
(?| # AFTER lang code
/ # strip leading '/'
( # (3 start)
(?:
(?! # not final slash
/ [^\S\r\n]*
(?: \r | $ )
)
[^?\s] # not params
)+
) # (3 end)
|
( ) # (3)
)
(?| # PARAMETERS
/ \? # strip leading '/?'
( # (4 start)
(?:
(?! # not final slash
/ [^\S\r\n]*
(?: \r | $ )
)
\S
)*
) # (4 end)
|
( ) # (4)
)
/?
[^\S\r\n]* # EOL
(?:
$
| (?= \r )
)
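For comparison, the internal routing described in the question can also be sketched without one large regex, by splitting off the query string first and testing only the first path segment. This is a minimal sketch under stated assumptions: the function name routeWithLanguage, the language whitelist and the default 'en' are hypothetical, and the input is a request path such as $_SERVER['REQUEST_URI'] (no host part).

```php
<?php
// Hypothetical helper: maps "/de/blogpage/coolentry/?entryPage=1" to
// "/blogpage/coolentry/?lng=de&entryPage=1" (internal rewrite, no redirect).
function routeWithLanguage(
    string $path,
    array $languages = ['de', 'en', 'es'],  // assumed whitelist of language codes
    string $default = 'en'                  // assumed default language
): string {
    // 1. Split off any existing query string.
    $query = '';
    $q = strpos($path, '?');
    if ($q !== false) {
        $query = substr($path, $q + 1);
        $path = substr($path, 0, $q);
    }

    // 2. Take the language from the first segment if it is a known code.
    $lng = $default;
    if (preg_match('~^/([a-z]{2})(?=/|$)(.*)$~', $path, $m) && in_array($m[1], $languages, true)) {
        $lng = $m[1];
        $path = $m[2];          // the rest of the path, language segment removed
    }

    // 3. Normalise to a single leading and trailing slash.
    $path = '/' . trim($path, '/');
    if ($path !== '/') {
        $path .= '/';
    }

    // 4. lng always comes first, followed by any original parameters.
    return $path . '?lng=' . $lng . ($query !== '' ? '&' . $query : '');
}

echo routeWithLanguage('/de/blogpage/coolentry/?entryPage=1'), "\n";
```

Checking against a whitelist avoids treating any two-letter page name (say /go/) as a language code, which a bare [a-z]{2} test would do.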

Related

preg_split shortcode attributes into array

I would like to parse a shortcode into an array via preg_split.
This is an example shortcode:
[contactform id="8411" label="This is \" first label" label2='This is second \' label']
and this should be the resulting array:
Array
(
[id] => 8411
[label] => This is \" first label
[label2] => This is second \' label
)
I have this regexp:
$atts_arr = preg_split('~\s+(?=(?:[^\'"]*[\'"][^\'"]*[\'"])*[^\'"]*$)~', trim($shortcode, '[]'));
Unfortunately, this only works if there are no escaped quotes (\' or \").
Thx in advance!
Using preg_split is not always handy or appropriate, in particular when you have to deal with escaped quotes. A better approach is to use preg_match_all, for example:
$pattern = <<<'EOD'
~
(\w+) \s*=
(?|
\s* "([^"\\]*(?:\\.[^"\\]*)*)"
|
\s* '([^'\\]*(?:\\.[^'\\]*)*)'
# | uncomment if you want to handle unquoted attributes
# ([^]\s]*)
)
~xs
EOD;
if (preg_match_all($pattern, $yourshortcode, $matches))
$attributes = array_combine($matches[1], $matches[2]);
The pattern uses the branch reset feature (?|...(...)...|...(...)...), which gives the capture groups in each branch the same number(s).
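A self-contained run of that pattern against the shortcode from the question; the expected result is the array the question asks for, with the backslash escapes left in the values.

```php
<?php
$shortcode = <<<'EOD'
[contactform id="8411" label="This is \" first label" label2='This is second \' label']
EOD;

$pattern = <<<'EOD'
~
(\w+) \s*=
(?|
    \s* "([^"\\]*(?:\\.[^"\\]*)*)"   # double-quoted value, \" allowed
  |
    \s* '([^'\\]*(?:\\.[^'\\]*)*)'   # single-quoted value, \' allowed
)
~xs
EOD;

$attributes = [];
if (preg_match_all($pattern, $shortcode, $matches)) {
    // Thanks to the branch reset (?|...), the value is always in group 2.
    $attributes = array_combine($matches[1], $matches[2]);
}
print_r($attributes);
```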
I was speaking about the \G anchor in my comment. This anchor succeeds if the current position is immediately after the last match. It can be useful if you want to check the syntax of your shortcode from start to end at the same time (otherwise it is totally useless). Example:
$pattern2 = <<<'EOD'
~
(?:
\G(?!\A) # anchor for the position after the last match
# it ensures that all matches are contiguous
|
\[(?<tagName>\w+) # beginning of the shortcode
)
\s+
(?<key>\w+) \s*=
(?|
\s* "(?<value>[^"\\]*(?:\\.[^"\\]*)*)"
|
\s* '([^'\\]*(?:\\.[^'\\]*)*)'
# | uncomment if you want to handle unquoted attributes
# ([^]\s]*)
)
(?<end>\s*+]\z)? # check that the end has been reached
~xs
EOD;
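// preg_match_all always fills $matches['end'] ('' where the group did not match),
// so check that the last match actually reached the end of the shortcode
if (preg_match_all($pattern2, $yourshortcode, $matches) && end($matches['end']) !== '')
$attributes = array_combine($matches['key'], $matches['value']);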
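One caveat worth knowing: preg_match_all in its default PREG_PATTERN_ORDER mode returns an entry for every group in every match (empty strings where a group did not participate), so isset($matches['end']) is true whenever anything matched. A sketch that instead tests whether the last match reached the closing bracket, wrapped in a hypothetical parseShortcode() helper that returns null for malformed input (the single-quoted branch keeps its closing quote outside the capture group):

```php
<?php
// Hypothetical helper: returns ['tag' => ..., 'attributes' => [...]] for a
// well-formed shortcode, or null if the attribute list is broken anywhere.
function parseShortcode(string $shortcode): ?array
{
    $pattern = <<<'EOD'
~
(?:
    \G(?!\A)                 # continue exactly where the previous match ended
  | \[(?<tagName>\w+)        # or start at the opening "[tag"
)
\s+
(?<key>\w+) \s*=
(?|
    \s* "(?<value>[^"\\]*(?:\\.[^"\\]*)*)"
  | \s* '([^'\\]*(?:\\.[^'\\]*)*)'
)
(?<end>\s*+]\z)?             # set only when the closing "]" ends the string
~xs
EOD;

    if (!preg_match_all($pattern, $shortcode, $m)) {
        return null;
    }
    // Every match has an 'end' entry (empty when unmatched), so test the last one.
    if (end($m['end']) === '') {
        return null;
    }
    return [
        'tag' => $m['tagName'][0],
        'attributes' => array_combine($m['key'], $m['value']),
    ];
}

var_dump(parseShortcode('[contactform id="8411" junk label="x"]')); // malformed
```

The unnamed group in the second branch still answers to the name value, because a name in a branch reset refers to the group number, not to a specific branch.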

PHP: How do I linkify all links inside a given text?

I am using the tool https://github.com/jmrware/LinkifyURL to detect URLs in a block of text. Unfortunately, it only recognizes one URL in the whole text. For example, if the text is:
http://www.guiageo-americas.com/imagens/imagem-america-do-sul.jpg I
really think this should be working
http://www.youtube.com/watch?v=Cy8duEIHEig more text and some writing
here and there
what appears is:
http://www.guiageo-americas.com/imagens/imagem-america-do-sul.jpg I
really think this should be working
http://www.youtube.com/watch?v=Cy8duEIHEig more text and some writing
here and there
and what I want is:
http://www.guiageo-americas.com/imagens/imagem-america-do-sul.jpg I
really think this should be working
http://www.youtube.com/watch?v=Cy8duEIHEig more text and some writing
here and there
Any idea why? Here is the PHP code:
function linkify($text) {
/* $text being "http://www.guiageo-americas.com/imagens/imagem-america-do-sul.jpg I really think this should be working http://www.youtube.com/watch?v=Cy8duEIHEig more text and some writing here and there" */
$url_pattern = '/# Rev:20100913_0900 github.com\/jmrware\/LinkifyURL
# Match http & ftp URL that is not already linkified.
# Alternative 1: URL delimited by (parentheses).
(\() # $1 "(" start delimiter.
((?:ht|f)tps?:\/\/[a-z0-9\-._~!$&\'()*+,;=:\/?#[\]@%]+) # $2: URL.
(\)) # $3: ")" end delimiter.
| # Alternative 2: URL delimited by [square brackets].
(\[) # $4: "[" start delimiter.
((?:ht|f)tps?:\/\/[a-z0-9\-._~!$&\'()*+,;=:\/?#[\]@%]+) # $5: URL.
(\]) # $6: "]" end delimiter.
| # Alternative 3: URL delimited by {curly braces}.
(\{) # $7: "{" start delimiter.
((?:ht|f)tps?:\/\/[a-z0-9\-._~!$&\'()*+,;=:\/?#[\]@%]+) # $8: URL.
(\}) # $9: "}" end delimiter.
| # Alternative 4: URL delimited by <angle brackets>.
(<|&(?:lt|\#60|\#x3c);) # $10: "<" start delimiter (or HTML entity).
((?:ht|f)tps?:\/\/[a-z0-9\-._~!$&\'()*+,;=:\/?#[\]@%]+) # $11: URL.
(>|&(?:gt|\#62|\#x3e);) # $12: ">" end delimiter (or HTML entity).
| # Alternative 5: URL not delimited by (), [], {} or <>.
( # $13: Prefix proving URL not already linked.
(?: ^ # Can be a beginning of line or string, or
| [^=\s\'"\]] # a non-"=", non-quote, non-"]", followed by
) \s*[\'"]? # optional whitespace and optional quote;
| [^=\s]\s+ # or... a non-equals sign followed by whitespace.
) # End $13. Non-prelinkified-proof prefix.
( \b # $14: Other non-delimited URL.
(?:ht|f)tps?:\/\/ # Required literal http, https, ftp or ftps prefix.
[a-z0-9\-._~!$\'()*+,;=:\/?#[\]@%]+ # All URI chars except "&" (normal*).
(?: # Either on a "&" or at the end of URI.
(?! # Allow a "&" char only if not start of an...
&(?:gt|\#0*62|\#x0*3e); # HTML ">" entity, or
| &(?:amp|apos|quot|\#0*3[49]|\#x0*2[27]); # a [&\'"] entity if
[.!&\',:?;]? # followed by optional punctuation then
(?:[^a-z0-9\-._~!$&\'()*+,;=:\/?#[\]@%]|$) # a non-URI char or EOS.
) & # If neg-assertion true, match "&" (special).
[a-z0-9\-._~!$\'()*+,;=:\/?#[\]@%]* # More non-& URI chars (normal*).
)* # Unroll-the-loop (special normal*)*.
[a-z0-9\-_~$()*+=\/#[\]@%] # Last char can\'t be [.!&\',;:?]
) # End $14. Other non-delimited URL.
/imx';
//below goes my code
$url_replace = '$1$4$7$10$13<a style="color:blue;" onclick="toogleIframe(this)">$2$5$8$11$14</a>$3$6$9$12';
//echo preg_replace($url_pattern, $url_replace, $text);
return preg_replace($url_pattern, $url_replace, $text);
}
That's the kind of thing best left to a 3rd party library (which you're doing, so kudos). I'd recommend trying another one before you roll your own. purl is an excellent alternative.
You can use the following to replace all matches of your regex (though I wouldn't count on its performance):
while (preg_match($pattern, $string)) {
$string = preg_replace($pattern, $replacement, $string);
}
So, your function will become:
function linkify($text) {
    $url_pattern = '<your-pattern-string>';
    $url_replace = '<your-replacement-string>';
    // Make sure $url_replace produces text that $url_pattern can no longer
    // match; otherwise this loop never terminates.
    while (preg_match($url_pattern, $text)) {
        $text = preg_replace($url_pattern, $url_replace, $text);
    }
    return $text;
}
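Note that preg_replace already replaces every non-overlapping match in a single call (its $limit parameter defaults to -1), so a loop should not normally be needed; if only one URL gets converted, it is more likely the pattern that fails on the others. A quick check with a deliberately simplified, hypothetical URL pattern (not LinkifyURL's) on the text from the question:

```php
<?php
// Deliberately simplified, hypothetical URL pattern (NOT LinkifyURL's):
// a scheme followed by a run of characters that are not spaces, quotes or < >.
$pattern = '~\b(?:https?|ftp)://[^\s<>"\']+~i';

$text = 'http://www.guiageo-americas.com/imagens/imagem-america-do-sul.jpg I really think '
      . 'this should be working http://www.youtube.com/watch?v=Cy8duEIHEig more text and '
      . 'some writing here and there';

// One call is enough: $limit = -1 means "replace every match"; $count reports how many.
$linked = preg_replace($pattern, '<a href="$0">$0</a>', $text, -1, $count);

echo $linked, "\n", "Replaced: $count\n";
```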

php regex: Use quotes for match, but don't capture them

I'm unsure if I should be using preg_match, preg_match_all, or preg_split with delim capture. I'm also unsure of the correct regex.
Given the following:
$string = " ok 'that\\'s cool' \"yeah that's \\\"cool\\\"\"";
I want to get an array with the following elems:
[0] = "ok"
[1] = "that\'s"
[2] = "yeah that's \"cool\""
You can not do this with a regular expression because you're trying to parse a non-context-free grammar. Write a parser.
Outline:
Read character by character; if you see a \, remember it.
If you see a " or ', check whether the previous character was a \. You now have your delimiting condition.
Record all the tokens in this manner.
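The outline above can be turned into a small hand-written tokenizer. A minimal sketch, where the function name tokenize is hypothetical; it keeps backslash escapes verbatim rather than unescaping them, and treats whitespace outside quotes as a separator:

```php
<?php
// Walk the string one character at a time, remember backslashes, and treat
// an unescaped quote as a delimiter. Escape sequences are kept verbatim.
function tokenize(string $s): array
{
    $tokens = [];
    $buf = '';
    $quote = null;     // the open quote character, or null when outside quotes
    $inToken = false;  // true while a word or quoted string is being collected
    $len = strlen($s);

    for ($i = 0; $i < $len; $i++) {
        $c = $s[$i];
        if ($c === '\\' && $i + 1 < $len) {
            // Escaped character: keep "\x" as-is so it can never close a quote
            $buf .= $c . $s[++$i];
            $inToken = true;
        } elseif ($quote !== null) {
            if ($c === $quote) {       // unescaped closing quote ends the token
                $tokens[] = $buf;
                $buf = '';
                $quote = null;
                $inToken = false;
            } else {
                $buf .= $c;
            }
        } elseif ($c === '"' || $c === "'") {
            if ($inToken) {            // a quote glued to a word starts a new token
                $tokens[] = $buf;
                $buf = '';
            }
            $quote = $c;
            $inToken = true;
        } elseif ($c === ' ' || $c === "\t" || $c === "\n" || $c === "\r") {
            if ($inToken) {            // whitespace outside quotes ends a bare word
                $tokens[] = $buf;
                $buf = '';
                $inToken = false;
            }
        } else {
            $buf .= $c;
            $inToken = true;
        }
    }
    if ($inToken) {                    // trailing word or unterminated quote
        $tokens[] = $buf;
    }
    return $tokens;
}

print_r(tokenize(" ok 'that\\'s cool' \"yeah that's \\\"cool\\\"\""));
```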
Your desired result set seems to trim spaces, and you also lost a couple of the backslashes (\). Perhaps this is a mistake, but it can be important.
I would expect:
[0] = " ok " // <-- spaces here
[1] = "that\\'s cool"
[2] = " \"yeah that's \\\"cool\\\"\"" // leading space here, and \" remains
Actually, you might be surprised to find that you can do this in regex:
preg_match_all("((?|\"((?:\\\\.|[^\"])+)\"|'((?:\\\\.|[^'])+)'|(\w+)))",$string,$m);
The desired result array will be in $m[1].
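The same branch-reset pattern reads more easily in a nowdoc with the x flag, since the backslashes no longer need double escaping for a PHP string. A sketch using the subject from the question (group 1 holds each result, quotes excluded, escapes kept):

```php
<?php
$string = " ok 'that\\'s cool' \"yeah that's \\\"cool\\\"\"";

$pattern = <<<'RE'
~
(?|
    " ( (?: \\. | [^"] )+ ) "    # double-quoted: escaped char or anything but "
  | ' ( (?: \\. | [^'] )+ ) '    # single-quoted: escaped char or anything but '
  | ( \w+ )                      # bare word
)
~x
RE;

preg_match_all($pattern, $string, $m);
print_r($m[1]);
```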
You can do it with a regex:
$pattern = <<<'LOD'
~
(?J)
# Definitions #
(?(DEFINE)
(?<ens> (?> \\{2} )+ ) # even number of backslashes
(?<sqc> (?> [^\s'\\]++ | \s++ (?!'|$) | \g<ens> | \\ '?+ )+ ) # single quotes content
(?<dqc> (?> [^\s"\\]++ | \s++ (?!"|$) | \g<ens> | \\ "?+ )+ ) # double quotes content
(?<con> (?> [^\s"'\\]++ | \s++ (?!["']|$) | \g<ens> | \\ ["']?+ )+ ) # content
)
# Pattern #
\s*+ (?<res> \g<con>)
| ' \s*+ (?<res> \g<sqc>) \s*+ '?+
| " \s*+ (?<res> \g<dqc>) \s*+ "?+
~x
LOD;
$subject = " ok 'that\\'s cool' \"yeah that's \\\"cool\\\"\"";
preg_match_all($pattern, $subject, $matches, PREG_SET_ORDER);
foreach($matches as $match) {
var_dump($match['res']);
}
I made the choice to trim spaces in all results, so " abcd " will give abcd. This pattern allows as many backslashes as you want, anywhere you want. If a quoted string is not closed by the end of the string, the end of the string is treated as the closing quote (this is why I made the closing quotes optional). So abcd " ef'gh will give you abcd and ef'gh.

how to translate math expression in a string to integer

For example I have a statement:
$var = '2*2-3+8'; //variable type is string
How can I make it evaluate to 9?
From this page comes a very nice (and simple) calculation-validation regular expression, written by Richard van Velzen. Once you have that and the string matches, you can be reasonably sure it is safe to use eval on the string. Always make sure the input is validated before using eval!
<?php
$regex = '{
\A # the absolute beginning of the string
\h* # optional horizontal whitespace
( # start of group 1 (this is called recursively)
(?:
\( # literal (
\h*
[-+]? # optionally prefixed by + or -
\h*
# A number
(?: \d* \. \d+ | \d+ \. \d* | \d+) (?: [eE] [+-]? \d+ )?
(?:
\h*
[-+*/] # an operator
\h*
(?1) # recursive call to the first pattern.
)?
\h*
\) # closing )
| # or: just one number
\h*
[-+]?
\h*
(?: \d* \. \d+ | \d+ \. \d* | \d+) (?: [eE] [+-]? \d+ )?
)
# and the rest, of course.
(?:
\h*
[-+*/]
\h*
(?1)
)?
)
\h*
\z # the absolute ending of the string.
}x';
$var = '2*2-3+8';
if( 1 === preg_match( $regex, $var ) ) { // preg_match returns false on error, so test for 1
$answer = eval( 'return ' . $var . ';' );
echo $answer;
}
else {
echo "Invalid calculation.";
}
What you have to do is find or write a parser function that can properly read expressions and actually calculate the result. In many languages this is implemented with a stack; look into things like infix-to-postfix conversion and postfix (RPN) evaluation.
Hope this helps.
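A minimal sketch of that stack-based approach, using the shunting-yard algorithm to turn the infix string into postfix (RPN) and then evaluating the RPN with a value stack. Assumptions: the function name evaluateExpression is hypothetical; only non-negative numbers, + - * / and parentheses are supported (no unary minus, no exponentiation), and the result is a float.

```php
<?php
function evaluateExpression(string $expr): float
{
    // Tokens: numbers, operators, parentheses; any other non-space char is an error.
    if (!preg_match_all('~\d+(?:\.\d+)?|[-+*/()]|\S~', $expr, $m)) {
        throw new InvalidArgumentException('Empty expression');
    }
    $prec = ['+' => 1, '-' => 1, '*' => 2, '/' => 2];
    $output = [];  // RPN queue
    $ops = [];     // operator stack

    foreach ($m[0] as $tok) {
        if (is_numeric($tok)) {
            $output[] = (float) $tok;
        } elseif (isset($prec[$tok])) {
            // All operators are left-associative: pop those of equal or higher precedence.
            while ($ops && end($ops) !== '(' && $prec[end($ops)] >= $prec[$tok]) {
                $output[] = array_pop($ops);
            }
            $ops[] = $tok;
        } elseif ($tok === '(') {
            $ops[] = $tok;
        } elseif ($tok === ')') {
            while ($ops && end($ops) !== '(') {
                $output[] = array_pop($ops);
            }
            if (!$ops) throw new InvalidArgumentException('Mismatched )');
            array_pop($ops); // discard the '('
        } else {
            throw new InvalidArgumentException("Unexpected token '$tok'");
        }
    }
    while ($ops) {
        $op = array_pop($ops);
        if ($op === '(') throw new InvalidArgumentException('Mismatched (');
        $output[] = $op;
    }

    // Evaluate the RPN queue with a value stack.
    $stack = [];
    foreach ($output as $tok) {
        if (is_float($tok)) {
            $stack[] = $tok;
            continue;
        }
        if (count($stack) < 2) throw new InvalidArgumentException('Missing operand');
        $b = array_pop($stack);
        $a = array_pop($stack);
        switch ($tok) {
            case '+': $stack[] = $a + $b; break;
            case '-': $stack[] = $a - $b; break;
            case '*': $stack[] = $a * $b; break;
            case '/':
                if ($b == 0.0) throw new DivisionByZeroError('Division by zero');
                $stack[] = $a / $b;
                break;
        }
    }
    if (count($stack) !== 1) throw new InvalidArgumentException('Malformed expression');
    return $stack[0];
}

echo evaluateExpression('2*2-3+8'), "\n";
```

Unlike eval, nothing in the input is ever executed as PHP code; unsupported characters simply raise an exception.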
$string_with_expression = '2+2';
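eval('$eval_result = ' . $string_with_expression . ';');
$eval_result is what you need (validate the string first, as in the previous answer, since eval executes whatever PHP code it is given).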
There is the intval function,
but you can't apply it directly to $var: intval('2*2-3+8') only reads the leading number and returns 2.
For a parser, check this answer.

Negation of a string in a regex

I realise that something similar has been asked before, but I can't seem to fit the solution to what I am trying to do, so please don't just think this is a dupe.
I have a string in the style {block:string}contents{/block:string}, which can be matched fairly easily with {block:([a-z_-\s]+)}.*{/block:\1}
What I want to do is modify the inner .* part so that it does not match any string that contains a {block:[a-z_-\s]+}; that is, any {block}{/block} pair that has another {block} inside it should not be matched.
Thanks!
Try
{block:([a-z\s_-]+)}[^{]*(?!{block:([a-z\s_-]+)}.*{/block:\2})[^}]*{/block:\1}
I am pretty mediocre at regex, but the negative lookahead bounded by the [^{]* and [^}]* statements should keep your matches tag-free.
Compressed: m~\{block:([a-z\s_-]+)\}(?:(?!\{/?block:\1\}).)*\{/block:\1\}~xs
Example in Perl:
$_ = '{block:string}conte{block:string}nts{/block:string}{/block:string}';
if ( m~ # match operator
\{block: ([a-z\s_-]+) \} # opening block structure and capt grp 1
(?: # begin non capt grp
(?! \{/?block: \1 \} ) # negative lookahead, don't want backreffed
# open or closed block struct
. # ok, grab this character
)* # end group, do 0 or more times (greedy)
\{/block: \1 \} # closing block structure matching grp 1
~xs ) # modifiers: expanded, include newlines
{
print "matched '$&'\n";
}
Output:
matched '{block:string}nts{/block:string}'
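The same lookahead approach in PHP, for readers who need it with preg_* rather than Perl (the subject string is the one from the Perl example):

```php
<?php
$str = '{block:string}conte{block:string}nts{/block:string}{/block:string}';

$pattern = '~
    \{block: ([a-z\s_-]+) \}       # opening tag, name captured in group 1
    (?:
        (?! \{/?block: \1 \} )     # stop before any open/close tag with that name
        .                          # otherwise take one character
    )*
    \{/block: \1 \}                # closing tag with the same name
~xs';

preg_match_all($pattern, $str, $m);
print_r($m[0]);
```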
<?php
$ptn = "%(?:{block:[a-z_\s-]+})(?![^}]*?{block:).*?{/block:[a-z_\s-]+}%";
$str = "... your content here ...";
preg_match_all($ptn, $str, $matches);
print_r($matches);
?>
For example:
$str = "{block:string}test2{/block:string} {block:string}contents{block:string}{block:string}test3{/block:string}{/block:string}{/block:string} sdf ";
Would produce:
Array
(
[0] => Array
(
[0] => {block:string}test2{/block:string}
[1] => {block:string}test3{/block:string}
)
)
