Regex: Select multiple lines between characters - php

I can't match multiple lines between two characters with a regex.
How can I solve this?
{
example
example 1
}
I want to select 'example', but I can't.
I tried this regex:
#\n.*#
Thank you.

Your pattern does not do what you expect: it matches a newline character followed by any character except a newline, zero or more times. You need the s (dotall) modifier, which makes the dot match newline characters as well.
For example, to match everything between the two curly braces:
preg_match('/{(.*)}/s', $str, $match);
echo $match[1];
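A minimal sketch of the difference the s modifier makes, using the sample from the question. The variable names ($withoutS, $withS, $lines) are illustrative, and the lazy quantifier .*? is an assumption that each {...} block should be captured separately if the input contains several; neither is part of the answer above.

```php
<?php
$str = "{\nexample\nexample 1\n}";

// Without s, the dot cannot cross the newlines, so nothing matches.
$withoutS = preg_match('/\{(.*?)\}/', $str, $m1);

// With s, the dot also matches newline characters.
$withS = preg_match('/\{(.*?)\}/s', $str, $m2);

// Split the captured body into individual lines; \R matches any newline sequence.
$lines = preg_split('/\R/', trim($m2[1]));
print_r($lines);
```

The braces are escaped here only for readability; PCRE also treats an unescaped { as a literal when it cannot start a quantifier.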

Related

Regex to get only characters without space inside special tags

I have 2 texts in a string:
%Juan%
%Juan Gonzalez%
I want to get only %Juan% and not the one with the space. I have been trying several regexes without luck. I currently use:
/%(.*)%/U
but it matches both. I tried adding and playing with [^\s], but it doesn't work.
Any help, please?
The issue is that . matches any character but a newline, so it matches spaces too. The /U ungreedy modifier only makes .* lazy, so it captures the text from a % up to the next % to its right.
If your strings contain one pair of %...%, you may use
/%(\S+)%/
The \S+ pattern matches 1+ characters other than whitespace, while the [^\h%] negated character class used in the next pattern matches any character but a horizontal whitespace and the % symbol.
If you have multiple %...% pairs, you may use
/%([^\h%]+)%/
Here, \h matches any horizontal whitespace.
PHP demo:
$re = '/%([^\h%]+)%/';
$str = "%Juan%\n%Juan Gonzalez%";
preg_match_all($re, $str, $matches);
print_r($matches[1]);
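To see why the negated class matters when several pairs sit on one line, here is a small sketch. The input string is hypothetical (it is not from the question) and the variable names are illustrative.

```php
<?php
// Hypothetical input: two adjacent tags plus one tag containing a space.
$str = "%Juan%,%Ana% %Juan Gonzalez%";

// \S also matches '%', so the greedy \S+ runs across the first closing '%'.
preg_match_all('/%(\S+)%/', $str, $greedy);

// Excluding '%' (and horizontal whitespace) keeps each match inside one pair.
preg_match_all('/%([^\h%]+)%/', $str, $negated);

print_r($greedy[1]);   // Array ( [0] => Juan%,%Ana )
print_r($negated[1]);  // Array ( [0] => Juan [1] => Ana )
```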

Explode and/or regex text to HTML link in PHP

I have a database of texts that contain this kind of syntax in the middle of English sentences, which I need to turn into HTML links using PHP:
"text1(text1)":http://www.example.com/mypage
Notes:
text1 is always identical to the text in parenthesis
The whole string always has the quotation marks, parentheses, and colon, so the syntax is the same for each.
Sometimes there is a space at the end of the string, but other times there is a question mark or comma or other punctuation mark.
I need to turn these into basic links, like
<a href="http://www.example.com/mypage">text1</a>
How do I do this? Do I need explode or regex or both?
"(.*?)\(\1\)":(.*\/[a-zA-Z0-9]+)(?=\?|\,|\.|$)
You can use this.
See Demo.
http://regex101.com/r/zF6xM2/2
You can use this replacement:
$pattern = '~"([^("]+)\(\1\)":(http://\S+)(?=[\s\pP]|\z)~';
$replacement = '<a href="\2">\1</a>';
$result = preg_replace($pattern, $replacement, $text);
pattern details:
([^("]+) captures text1 in group 1. Using a negated character class (one that excludes the double quote and the opening parenthesis) has two advantages:
it allows a greedy quantifier, which is faster
since the class excludes the opening parenthesis and is immediately followed by a parenthesis in the pattern, the regex engine skips any other double-quoted content that has no parenthesis inside, without backtracking to test other possibilities. (This is because the PCRE regex engine automatically converts [^a]+a into [^a]++a before processing the string.)
\S+ matches one or more non-whitespace characters
(?=[\s\pP]|\z) is a lookahead assertion that checks that the URL is followed by whitespace, a punctuation character (\pP), or the end of the string.
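One thing to watch: because \S+ is greedy, punctuation directly after the URL (such as a trailing ?) is captured as part of group 2. The sketch below is one way to handle that, and it swaps preg_replace for preg_replace_callback so the trailing punctuation can be trimmed in PHP. The linkify() name, the set of trimmed characters, and the anchor-tag output format are assumptions for illustration.

```php
<?php
// Hypothetical helper: converts "text(text)":http://... markup into anchor tags.
function linkify(string $text): string
{
    $pattern = '~"([^("]+)\(\1\)":(http://\S+)~';

    return preg_replace_callback($pattern, function (array $m): string {
        // Assumption: trailing punctuation belongs to the sentence, not the URL.
        $url  = rtrim($m[2], '.,;:!?');
        $tail = substr($m[2], strlen($url));

        return '<a href="' . htmlspecialchars($url) . '">'
            . htmlspecialchars($m[1]) . '</a>' . $tail;
    }, $text);
}

echo linkify('Did you read "text1(text1)":http://www.example.com/mypage?'), PHP_EOL;
// Did you read <a href="http://www.example.com/mypage">text1</a>?
```

Escaping with htmlspecialchars() is a precaution for text coming from a database; it leaves the sample input unchanged.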
You can use this regex:
"(.*?)\(.*?:(.*)
An appropriate Regular Expression could be:
$str = '"text1(text1)":http://www.example.com/mypage';
preg_match('#^"([^\(]+)' .
'\(([^\)]+)\)[^"]*":(.+)#', $str, $m);
print '<a href="' . $m[3] . '">' . $m[2] . '</a>' . PHP_EOL;

exploding a string using a regular expression

I have a string as below (the letters in the example could be numbers or texts and could be either uppercase or lowercase or both. If a value is a sentence, it should be between single quotations):
$string="a,b,c,(d,e,f),g,'h, i j.',k";
How can I explode that to get the following result?
Array([0]=>"a",[1]=>"b",[2]=>"c",[3]=>"(d,e,f)",[4]=>"g",[5]=>"'h,i j'",[6]=>"k")
I think using regular expressions will be a fast as well as clean solution. Any idea?
EDIT:
This is what I have done so far, which is very slow for the strings having a long part between parenthesis:
$separator="*"; // whatever which is not used in the string
$Pattern="'[^,]([^']+),([^']+)[^,]'";
while(ereg($Pattern,$String,$Regs)){
$String=ereg_replace($Pattern,"'\\1$separator\\2'",$String);
}
$Pattern="\(([^(^']+),([^)^']+)\)";
while(ereg($Pattern,$String,$Regs)){
$String=ereg_replace($Pattern,"(\\1$separator\\2)",$String);
}
return $String;
This replaces all the commas inside quotes and parentheses with $separator. Then I can explode the string by commas and replace the $separator with the original comma.
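The ereg functions were removed in PHP 7, so the loops above no longer run on current versions. A rough sketch of the same mask, explode, restore idea with preg_replace_callback follows; it does the masking in a single pass instead of looping. The splitProtected() name and the NUL-byte placeholder are assumptions (the placeholder must not occur in the input), and arrow functions need PHP 7.4 or later.

```php
<?php
// Hypothetical helper: split on commas, ignoring commas inside '...' or (...).
function splitProtected(string $input, string $placeholder = "\x00"): array
{
    // Step 1: hide the commas inside quoted or parenthesized segments.
    $masked = preg_replace_callback(
        '~\'[^\']*\'|\([^)]*\)~',
        fn(array $m): string => str_replace(',', $placeholder, $m[0]),
        $input
    );

    // Step 2: split on the remaining commas, then restore the hidden ones.
    return array_map(
        fn(string $part): string => str_replace($placeholder, ',', $part),
        explode(',', $masked)
    );
}

print_r(splitProtected("a,b,c,(d,e,f),g,'h, i j.',k"));
```

This version does not handle nested parentheses.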
You can do the job using preg_match_all
$string="a,b,c,(d,e,f),g,'h, i j.',k";
preg_match_all("~'[^']+'|\([^)]+\)|[^,]+~", $string, $result);
print_r($result[0]);
Explanation:
The trick is to try the quoted and parenthesized alternatives before the comma-free one:
~ pattern delimiter
' a literal single quote
[^'] any character except a single quote
+ one or more times
' a literal single quote
| or
\([^)]+\) the same with parentheses
| or
[^,]+ any characters except commas, one or more times
~ pattern delimiter
Note that the quantifiers in [^']+', in [^)]+\) and also in [^,]+ are all automatically optimized to possessive quantifiers at compile time due to "auto-possessification". The first two because the character class doesn't contain the next character, and the last because it is at the end of the pattern. In both cases, backtracking is unnecessary.
If you have more than one delimiter like quotes (where the opening and closing characters are the same), you can write your pattern like this, using a capture group. The u modifier is needed when the code is saved as UTF-8, because ° is then a multibyte character:
$string="a,b,c,(d,e,f),g,'h, i j.',k,°l,m°,#o,p#,#q,r#,s";
preg_match_all('~([\'#°]).*?\1|\([^)]+\)|[^,]+~u', $string, $result);
print_r($result[0]);
Explanation:
(['#°]) one character in the class is captured in group 1
.*? any character, zero or more times, in lazy mode
\1 the content of group 1
With nested parentheses (the u modifier assumes the code is saved as UTF-8, where ° is a multibyte character):
$string="a,b,(c,(d,(e),f),t),g,'h, i j.',k,°l,m°,#o,p#,#q,r#,s";
preg_match_all('~([\'#°]).*?\1|(\((?:[^()]+|(?-1))*+\))|[^,]+~u', $string, $result);
print_r($result[0]);
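In the nested pattern, (?-1) is a relative recursive reference: it re-runs the capture group opened most recently before it, which here is the parenthesized group itself, so each inner ( ... ) is matched by the same sub-pattern. The sketch below shows the same recursion with a named group and (?&group) instead, which some find easier to read. It is an ASCII-only variant with single quotes as the only quote delimiter; the group name is an arbitrary choice.

```php
<?php
// Alternative 1: a quoted segment.
// Alternative 2: a balanced, possibly nested, parenthesized segment.
// Alternative 3: anything up to the next comma.
$pattern = '~\'[^\']*\'|(?<group>\((?:[^()]+|(?&group))*+\))|[^,]+~';
$string  = "a,b,(c,(d,(e),f),t),g,'h, i j.',k";

preg_match_all($pattern, $string, $result);
print_r($result[0]);
```

For this input the third element of $result[0] is the whole (c,(d,(e),f),t) segment.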

Regex to remove single characters from string

Consider the following strings
breaking out a of a simple prison
this is b moving up
following me is x times better
All strings are lowercased already. I would like to remove any "loose" a-z characters, resulting in:
breaking out of simple prison
this is moving up
following me is times better
Is this possible with a single regex in php?
$str = "breaking out a of a simple prison
this is b moving up
following me is x times better";
$res = preg_replace("#\\b[a-z]\\b ?#i", "", $str);
echo $res;
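As a quick check against the three sample strings from the question, the same pattern can be applied to an array, since preg_replace accepts an array subject. The variable names here are illustrative.

```php
<?php
$samples = [
    'breaking out a of a simple prison',
    'this is b moving up',
    'following me is x times better',
];

// Same pattern as above, in single quotes so the backslashes need no doubling.
$cleaned = preg_replace('#\b[a-z]\b ?#i', '', $samples);
print_r($cleaned);

// Caveat: an apostrophe is a non-word character, so the "t" in "don't"
// also counts as a single letter between word boundaries.
$contraction = preg_replace('#\b[a-z]\b ?#i', '', "don't");
```

Because \b treats apostrophes and other punctuation as boundaries, this pattern is best suited to plain lowercase text like the samples.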
How about:
preg_replace('/(^|\s)[a-z](\s|$)/', '$1', $string);
Note this also catches single characters that are at the beginning or end of the string, but not single characters that are adjacent to punctuation (they must be surrounded by whitespace).
If you also want to remove characters immediately before punctuation (e.g. 'the x.'), then this should work properly in most (English) cases:
preg_replace('/(^|\s)[a-z]\b/', '$1', $string);
As a one-liner:
$result = preg_replace('/\s\p{Ll}\b|\b\p{Ll}\s/u', '', $subject);
This matches a single lowercase letter (\p{Ll}) which is preceded or followed by whitespace (\s), removing both. The word boundaries (\b) ensure that only single letters are indeed matched. The /u modifier makes the regex Unicode-aware.
The result: A single letter surrounded by spaces on both sides is reduced to a single space. A single letter preceded by whitespace but not followed by whitespace is removed completely, as is a single letter only followed but not preceded by whitespace.
So
This a is my test sentence a. o How funny (what a coincidence a) this is!
is changed to
This is my test sentence. How funny (what coincidence) this is!
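The example above can be reproduced with a short script. The second input is a hypothetical edge case with loose letters at both ends of the string.

```php
<?php
$pattern = '/\s\p{Ll}\b|\b\p{Ll}\s/u';

$subject = 'This a is my test sentence a. o How funny (what a coincidence a) this is!';
$result  = preg_replace($pattern, '', $subject);
echo $result, PHP_EOL;

// Loose letters at the very start and end are removed together with
// the one whitespace character next to them.
$edges = preg_replace($pattern, '', 'a test b');
echo $edges, PHP_EOL;
```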
You could try something like this:
preg_replace('/\b\S\s\b/', "", $subject);
This is what it means:
\b # Assert position at a word boundary
\S # Match a single character that is a “non-whitespace character”
\s # Match a single character that is a “whitespace character” (spaces, tabs, and line breaks)
\b # Assert position at a word boundary
Update
As raised by Radu, because I've used the \S this will match more than just a-zA-Z. It will also match 0-9_. Normally, it would match a lot more than that, but because it's preceded by \b, it can only match word characters.
As mentioned in the comments by Tim Pietzcker, be aware that this won't work if you need to remove single characters that are followed by non-word characters, as in test a (hello). It will also fail if there are extra spaces after the single character, like this:
test a    hello
but you could fix that by changing the expression to \b\S\s*\b
Try this one:
$sString = preg_replace("#\b[a-z]{1}\b#m", ' ', $sString);

php separating words by using regex

I have a string that contains some words I want to extract. The separators can be any sequence of , ; or spaces.
Here is an example:
;,osman,ali;, mehmet ;ahmet,ayse; ,
I need to get the words osman, ali, mehmet, ahmet and ayse into an array or any type that lets me use them one by one. I tried using the preg functions but couldn't figure it out.
If anyone can help, I would appreciate it.
$words = preg_split('/[,;\s]+/', $str, -1, PREG_SPLIT_NO_EMPTY);
[,;\s] is a character class, which matches any one of the characters it contains.
\s matches any white space character (space, tab, newline, etc.). If this is too much just replace it with a space: [,; ].
+ means match one or more of the preceding symbol or group.
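The last two arguments in the call above are worth a note: -1 means no limit on the number of pieces, and PREG_SPLIT_NO_EMPTY drops empty strings from the result. A small sketch of the difference on the sample input, with illustrative variable names:

```php
<?php
$str = ';,osman,ali;, mehmet ;ahmet,ayse; ,';

// Without the flag, the leading and trailing separators produce empty strings.
$raw = preg_split('/[,;\s]+/', $str);

// -1 means "no limit"; PREG_SPLIT_NO_EMPTY drops the empty pieces.
$words = preg_split('/[,;\s]+/', $str, -1, PREG_SPLIT_NO_EMPTY);

print_r($words);
```

Here $raw has seven elements (an empty string at each end) while $words holds just the five names.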
http://www.regular-expressions.info/ is a good site to learn regular expressions.
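You want to use preg_split with [;, ]+ as the regex to split on. PREG_SPLIT_NO_EMPTY drops the empty strings that the leading and trailing separators would otherwise produce:
$keywords = preg_split("/[;, ]+/", $yourstring, -1, PREG_SPLIT_NO_EMPTY);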
Split on non-word characters, dropping empty pieces:
$array=preg_split("/\W+/", $string, -1, PREG_SPLIT_NO_EMPTY);
