preg_split every character, but don't split if is quote [duplicate] - php

I'm using the following code to split my UTF-8 strings to characters:
$characters = preg_split('//u', $word, -1, PREG_SPLIT_NO_EMPTY);
In some cases, a character might have a single quote after it, for example: hel'lo. I want to keep that quote with the character before it.
Using the regex above, my array is this:
Array
(
[0] => h
[1] => e
[2] => l
[3] => '
[4] => l
[5] => o
)
And I want the array to be:
Array
(
[0] => h
[1] => e
[2] => l'
[3] => l
[4] => o
)
How can I do it?
Thanks!
(the single quote can be at the beginning of the string, at the end of it and in the middle of it).

Rather than split, you can do preg_match_all using
'?\p{L}'?
i.e. an optional ' before and after the Unicode letter:
preg_match_all("/'?\\p{L}'?/u", $str, $matches);
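As a minimal sketch of how the match-based approach behaves on the sample word from the question (the variable names are assumptions for illustration only):

```php
<?php
// Sketch: collect each letter together with an adjacent single quote.
// $word is a sample input taken from the question.
$word = "hel'lo";

// '? allows an optional quote before and after each Unicode letter (\p{L}).
preg_match_all("/'?\\p{L}'?/u", $word, $matches);

$characters = $matches[0];
print_r($characters); // h, e, l', l, o
```

Because every match must contain a letter, characters that are not letters (digits, spaces, other punctuation) are skipped entirely, which may or may not be what you want.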

Use a negative lookahead (?!') to prevent a split immediately before a quote:
$characters = preg_split("/(?!')/u", $word, -1, PREG_SPLIT_NO_EMPTY);
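A short sketch of what this split produces, assuming the sample word from the question plus a hypothetical word with a leading quote:

```php
<?php
// Split at every position that is NOT immediately followed by a single quote.
$middle = preg_split("/(?!')/u", "hel'lo", -1, PREG_SPLIT_NO_EMPTY);
print_r($middle); // h, e, l', l, o

// Hypothetical input with a leading quote: there is no preceding character
// to attach it to, so the quote ends up as its own element.
$leading = preg_split("/(?!')/u", "'hi", -1, PREG_SPLIT_NO_EMPTY);
print_r($leading); // ', h, i
```

Unlike a letter-only match, this keeps every character of the input, including digits and spaces.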


preg_split to separate input [duplicate]

I'm building a website using PHP.
I am using a preg_split() to separate a given string which looks like +word1 -word2 -"word word".
But I need them in the following form +word1, -word2, -"word word".
Currently, I have this one:
$words = preg_split("/[\s\"]*\"([^\"]+)\"[\s\"]*|" . "[\s\"]*'([^']+)'[\s\"]*|" . "[\s\"]+/", $search_expression, 0, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
but it doesn't work as I wish. To get the result I want, I have to write the input like this:
+word1 -word2 '-"word word"'.
Does someone have a better regex or idea?
One option is to match everything from one double quote to the next and exclude that match from splitting with (*SKIP)(*FAIL). The other alternative then matches one or more horizontal whitespace characters to split on.
"[^"]+"(*SKIP)(*FAIL)|\h+
For example
$search_expression = '+word1 -word2 -"word word"';
$words = preg_split("~\"[^\"]+\"(*SKIP)(*FAIL)|\h+~", $search_expression);
print_r($words);
Output
Array
(
[0] => +word1
[1] => -word2
[2] => -"word word"
)
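A simpler expression also works for your examples. Each match runs up to the next + or -, so the first two matches keep a trailing space (apply trim() to each if that matters):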
preg_match_all('/[+-][^+-]+ ?/', $search_expression, $matches);
print_r($matches[0]);
Yields:
Array
(
[0] => +word1
[1] => -word2
[2] => -"word word"
)

Regex to Match Passed Function/Method Parameters

I've had a good look around for a question that asked this before; alas, my search for a PHP preg_match search returned no results (maybe my searching skills fell short, I suppose justified considering it's a Regex question!).
Consider the text below:
The quick __("brown ") fox jumps __('over the') lazy __("dog")
I currently need to scan for the method call __('') shown above, which may include spacing and either kind of quotation mark (' or "). My best attempt after numerous iterations:
(__\("(.*?)"\))|(__\('(.*?)'\))
Or at its simplest form:
__\((.*?)\)
To break this down:
Anything that starts with __
Escaped ( and quotation mark " or '. Thus, \(\"
(.*?) Non-greedy match of all characters
Escaped closing " and last bracket.
| between the two expressions match either/or.
However, this only gets partial matches, and spaces are throwing off the search entirely. Apologies if this has been asked before, please link me if so!
Tester Link for the pattern provided above:
PHP Live Regex Test Tool
When the searched method string uses single quotes it will end up in another capture group than if it has double quotes. So in fact, your regular expression works (except for the spaces, see further down), but you'd have to look at a different index in your result array:
$input = 'The quick __("brown ") fox jumps __(\'over the\') lazy __("dog")';
// using your regular expression:
$res = preg_match_all("/(__\(\"(.*?)\"\))|(__\('(.*?)'\))/", $input, $matches);
print_r ($matches);
Note that you need preg_match_all instead of preg_match to get all matches.
Output:
Array
(
[0] => Array
(
[0] => __("brown ")
[1] => __('over the')
[2] => __("dog")
)
[1] => Array
(
[0] => __("brown ")
[1] =>
[2] => __("dog")
)
[2] => Array
(
[0] => brown
[1] =>
[2] => dog
)
[3] => Array
(
[0] =>
[1] => __('over the')
[2] =>
)
[4] => Array
(
[0] =>
[1] => over the
[2] =>
)
)
So, the result array has 5 elements, the first one representing the complete match, and all the others correspond to the 4 capture groups you have in your regular expression. As the capture groups for single quotes are not those of the double quotes, you'll find the matches at different places.
To "solve" this, you could use a back reference in your regular expression, which would look back to see which was the opening quote (single or double) and require the same to be repeated at the end:
$res = preg_match_all("/__\(([\"'])(.*?)\\1\)/", $input, $matches);
Note the back reference \1 (the backslash had to be escaped with another one). This refers back to the first capture group, where we have ["'] (again an escape was necessary) to match both kinds of quotes.
You also wanted to deal with spaces. On your PHP Live Regex you used a test string that had such spaces between the brackets and quotes. To deal with these so they still match the method strings correctly, the regular expression should get two additional \s*:
$res = preg_match_all("/__\(\s*([\"'])(.*?)\\1\s*\)/", $input, $matches);
Now the output is:
Array
(
[0] => Array
(
[0] => __("brown ")
[1] => __('over the')
[2] => __("dog")
)
[1] => Array
(
[0] => "
[1] => '
[2] => "
)
[2] => Array
(
[0] => brown
[1] => over the
[2] => dog
)
)
... and the text captured by the groups is now nicely arranged.
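To make the back-reference approach reusable, it could be wrapped in a small helper. This is only a sketch: the function name extractCallArguments is hypothetical and not part of the original answer.

```php
<?php
// Hypothetical helper (not from the thread): returns the text inside each __() call.
function extractCallArguments(string $source): array
{
    // Group 1 remembers the opening quote; \1 requires the same quote to close.
    // \s* tolerates spaces between the parentheses and the quotes.
    preg_match_all("/__\(\s*([\"'])(.*?)\\1\s*\)/", $source, $matches);

    // Group 2 holds the quoted text for every match.
    return $matches[2];
}

$input = 'The quick __( "brown ") fox jumps __(\'over the\') lazy __("dog")';
$texts = extractCallArguments($input);
print_r($texts); // "brown " (with its trailing space), "over the", "dog"
```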
When working with stuff like this, don't forget about escaping:
<?php
ob_start();
?>
The quick __("brown ") fox jumps __( 'over the' ) lazy __("dog").
And __("everyone says \"hi\"").
<?php
$content = ob_get_clean();
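// Heredoc: four backslashes become the regex \\. (a backslash followed by any character).
// The (?: ... ) group keeps the alternation inside the parentheses of __( ).
$re = <<<RE
/__ \(
\s*
(?:
" ( (?: \\\\. | [^"])+ ) "
|
' ( (?: \\\\. | [^'])+ ) '
)
\s*
\)
/x
RE;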
preg_match_all($re, $content, $matches, PREG_SET_ORDER);
foreach($matches as $match)
echo end($match), "\n";
How about this:
(__(\('[^']+'\)|\("[^"]+"\)))
Instead of the non-greedy .*?, use a negated character class that matches any character except the quote: [^'] or [^"].
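A minimal sketch of that pattern applied to the sample sentence from the question (the variable names are assumptions):

```php
<?php
$input = 'The quick __("brown ") fox jumps __(\'over the\') lazy __("dog")';

// [^']+ and [^"]+ stop at the first closing quote, so no lazy quantifier is needed.
preg_match_all("/(__(\('[^']+'\)|\(\"[^\"]+\"\)))/", $input, $found);

$calls = $found[0];
print_r($calls); // __("brown "), __('over the'), __("dog")
```

Note that this pattern does not allow whitespace between the parentheses and the quotes, and because of the + quantifier it will not match an empty string such as __('').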
Enclose double and single quotes with square brackets as a character class:
$str = 'The quick __( "brown ") fox jumps __(\'over the\') lazy __("dog")';
preg_match_all("/__\(\s*([\"']).*?\\1\s*\)/ium", $str, $matches);
echo '<pre>';
var_dump($matches[0]);
// the output:
array (size=3)
0 => string '__( "brown ")'
1 => string '__('over the')'
2 => string '__("dog")'
An example of the same solution is available on phpliveregex.com (see the preg_match_all section):
http://www.phpliveregex.com/p/exF

Sscanf with regex to match even an empty string

I am using this format string in sscanf:
sscanf($seat, "%d-%[^(](%[^#]#%[^)])");
And it works well when I get this kind of string:
173-9B(AA#3.45 EUR#32H)
But when I get this kind of string:
173-9B(#3.14 EUR#32H)
the result is all messed up. How can I also accept an empty string between the first ( and the first #?
You would be better off using a regex in preg_match to handle optional data presence in input:
$re = '/(\d*)-([^(]*)\(([^#]*)#([^)]*)\)/';
preg_match($re, '173-9B(#3.45 EUR#32H)', $m);
unset($m[0]);
print_r($m);
Output:
Array
(
[1] => 173
[2] => 9B
[3] =>
[4] => 3.45 EUR#32H
)
And 2nd example:
preg_match($re, '173-9B(AA#3.45 EUR#32H)', $m);
unset($m[0]);
print_r($m);
Array
(
[1] => 173
[2] => 9B
[3] => AA
[4] => 3.45 EUR#32H
)
Using ([^#]*) makes it match 0 or more characters that are not #, so an empty field is accepted.
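If the parsed parts are used in several places, the pattern could be wrapped in a helper with named groups. This is a sketch under assumptions: the function name parseSeat and the group names are hypothetical, and the pattern is anchored and uses \d+ instead of \d*, which is slightly stricter than the one above.

```php
<?php
// Hypothetical helper; the group names are illustrative, not from the question.
function parseSeat(string $seat): ?array
{
    $re = '/^(?<number>\d+)-(?<seat>[^(]*)\((?<code>[^#]*)#(?<rest>[^)]*)\)$/';
    if (!preg_match($re, $seat, $m)) {
        return null; // input does not have the expected shape
    }
    // Keep only the named groups, dropping the numeric duplicates.
    return array_filter($m, 'is_string', ARRAY_FILTER_USE_KEY);
}

$parsed = parseSeat('173-9B(#3.14 EUR#32H)');
print_r($parsed); // number => 173, seat => 9B, code => (empty), rest => 3.14 EUR#32H
```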

PHP preg_match: matching special characters ([ ] ( ) { } etc.)

I'm new to preg_match. I know the characters [ ] have a special meaning in preg_match, but how do I treat them as literal characters that I want to match?
For example:
$word = '[Hello], Im steve';
preg_match_all('/[Hello]/', $word, $match);
print_r($match);
Output:
Array ( [0] => Array ( [0] => H [1] => e [2] => l [3] => l [4] => o [5] => e [6] => e ) )
The above statement didn't match and return the literal '[' and ']'.
How to overcome this?
Just escape it with a backslash \
preg_match_all('/\[Hello\]/', $word, $match);
print_r($match);
UPD:
Case-insensitive match: (i modifier after delimiter)
preg_match_all('/\[Hello\]/i', $word, $match);
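When the text to match comes from a variable, escaping by hand is error-prone. A minimal sketch using PHP's built-in preg_quote, which escapes regex metacharacters (the variable names here are assumptions):

```php
<?php
$word = '[Hello], Im steve';
$literal = '[Hello]'; // text that should be matched literally

// preg_quote escapes regex metacharacters; the second argument also escapes the delimiter.
$pattern = '/' . preg_quote($literal, '/') . '/';

preg_match_all($pattern, $word, $match);
print_r($match[0]); // one match: [Hello]
```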

PHP Pattern Modifier: $ for End-of-Lines in Multi-Line Strings

Note: See the bottom of this post for an explanation for why this wasn't originally working.
In PHP, I am attempting to match lower-case characters at the end of every line in a string buffer.
The regex pattern should be [a-z]$, but that only matches the last letter of the string. I believe this is a regex modifier issue; I have experimented with /s, /m and /D, but nothing appears to match as expected.
<?php
$pattern = '/[a-z]$/';
$string = "this
is
a
broken
sentence";
preg_match_all($pattern, $string, $matches);
print_r($matches);
?>
Here's the output:
Array
(
[0] => Array
(
[0] => e
)
)
Here's what I expect the output to be:
Array (
[0] => Array (
[0] => s
[1] => s
[2] => a
[3] => n
[4] => e
)
)
Any advice?
Update: The PHP source code was written on a Windows machine; text editors on Windows, by convention, represent newlines differently than text editors on Unix systems.
Windows text files (following DOS convention) end each line with \r\n, while the PHP regex engine by default treats only \n as a line ending, so the \r sat between the last letter and the end of the line. Converting the file to Unix line endings solved the original problem.
Adam Wagner (see below) has posted a pattern that matches regardless of end-of-line byte-representation.
zerkms has the canonical regular expression, to which I am awarding the answer.
$pattern = '/[a-z]$/m';
$string = "this
is
a
broken
sentence";
preg_match_all($pattern, $string, $matches);
print_r($matches);
http://ideone.com/XkeD2
This will return exactly what you want
As @Will points out, it appears you either want the first char of each string, or your example is wrong. If you want the last char of each line (only if it's a lower-case char) you could try this:
/[a-z](?:\n)|[a-z]$/
The first segment, [a-z](?:\n), checks for lowercase chars before newlines. Then [a-z]$ gets the last char of the string (in case it's not followed by a newline).
With your example string, the output is:
Array
(
[0] => Array
(
[0] => s
[1] => a
[2] => n
[3] => e
)
)
Note - The 's' from 'is' is not present because it is followed by a space. To capture this 's' as well (ignoring trailing spaces), you can update the regex to: /[a-z](?:[ ]*\n)|[a-z](?:[ ]*)$/, which checks for 0 or more spaces immediately before the newline (or end of string). Which outputs:
Array
(
[0] => Array
(
[0] => s
[1] => s
[2] => a
[3] => n
[4] => e
)
)
Update
It appears the line-ending style didn't agree with your regex. To account for unusual line endings (and other unsavory whitespace at the end of the lines), you can use this (and still get the /m goodness).
/[a-z](?:\W*)$/m
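If the only concern is Windows line endings, a narrower variant is to allow an optional \r before the end of each line. This is a sketch with an assumed sample string, not the pattern from the answer above:

```php
<?php
// Sample text with Windows (CRLF) line endings; an assumption for illustration.
$string = "this\r\nis\r\na\r\nbroken\r\nsentence";

// The lookahead tolerates an optional \r before the end of each line,
// and keeps the \r out of the matched text.
preg_match_all('/[a-z](?=\r?$)/m', $string, $matches);

$lastLetters = $matches[0];
print_r($lastLetters); // s, s, a, n, e
```

Because the lookahead does not consume anything, each match is just the letter itself, with no trailing whitespace attached.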
It looks like you want to match before every newline, not at the end of the file. Perhaps you want
$pattern = '/[a-z]\n/';
