Split string on spaces except words in quotes - php

I have a string like
$string = 'Some of "this string is" in quotes';
I want to get an array of all the words in the string which I can get by doing
$words = explode(' ', $string);
However I don't want to split up the words in quotes so ideally the end array will be
array ('Some', 'of', '"this string is"', 'in', 'quotes');
Does anyone know how I can do this?

You can use:
$string = 'Some of "this string is" in quotes';
$arr = preg_split('/("[^"]*")|\h+/', $string, -1,
PREG_SPLIT_NO_EMPTY|PREG_SPLIT_DELIM_CAPTURE);
print_r ( $arr );
Output:
Array
(
[0] => Some
[1] => of
[2] => "this string is"
[3] => in
[4] => quotes
)
RegEx Breakup
("[^"]*") # match quoted text and group it so that it can be used in output using
# PREG_SPLIT_DELIM_CAPTURE option
| # regex alteration
\h+ # match 1 or more horizontal whitespace

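Instead of splitting, you can do it the other way around and match the tokens. Matching is a lot easier here than splitting.
So use the regex /".*?"|[^\s]+/ in conjunction with preg_match_all. The quoted alternative has to come first; otherwise [^\s]+ would consume the opening quote as part of a word.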
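A minimal sketch of the matching approach, assuming the input string from the question. It uses /"[^"]*"|\S+/, an equivalent way of writing "a quoted run, or otherwise a run of non-whitespace"; the variable name $words is only for illustration.

```php
<?php
$string = 'Some of "this string is" in quotes';

// Try the quoted alternative first, then fall back to any run of
// non-whitespace characters. Whitespace between tokens is simply skipped.
preg_match_all('/"[^"]*"|\S+/', $string, $matches);

// $matches[0] holds the full matches, one per token.
$words = $matches[0];

print_r($words);
```

Because nothing is split, there are no empty elements to filter out and no extra flags are needed.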

You can get values by match, not by split, with regex:
/"[^"]+"|\w+/g
which will match:
"[^"]+" - characters between quote signs ",
\w+ - sets of word characters (A-Za-z_0-9),

I think you can use a regex like this:
/("[^"]*")|(\S+)/g
And you can use substitution $2

Related

Regex for matching phrase between '' php

I want to match the content between two single quotes. For example: 'for example' should return for and example. It's only a part of the sentence I have to analyze. I used preg_split on \s for the whole sentence, so 'for example' is split into the two tokens 'for and example'.
Right now I've tried /^'(.*)|(.*)'$/ and it only returns for but not the example, if I put it like /^(.*)'|'(.*)$/, it only returns example but not for. How should I fix this?
You can avoid double handling of the string by leveraging the \G metacharacter to continue matching an unlimited number of space-delimited strings inside of single quotes.
Code:
$string = "text 'for an example of the \"continue\" metacharacter' text";
var_export(preg_match_all("~(?|'|\G(?!^) )\K[^ ']+~", $string, $out) ? $out[0] : []);
Output:
array (
0 => 'for',
1 => 'an',
2 => 'example',
3 => 'of',
4 => 'the',
5 => '"continue"',
6 => 'metacharacter',
)
To get the quoted phrases (which you then want to split), you can use preg_match_all() to capture everything between two single quotes.
preg_match_all("~'([^']+)'~", $text, $matches);
$string = $matches[1][0];
$matches[1] holds every captured phrase; $string now contains the first one, something like "example string with words".
Now if you want to split a string according to a specific sequence / character, you can make use of explode():
$string = "example string with words";
$result = explode(" ", $string);
print_r($result);
gives you:
Array
(
[0] => example
[1] => string
[2] => with
[3] => words
)
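The two steps can also be combined so that every quoted phrase is handled, not just one. This is only a sketch; the sample $text and the variable name $wordsPerPhrase are made up for illustration.

```php
<?php
// Hypothetical input with two single-quoted phrases.
$text = "text 'for example' and 'one more phrase' text";

// Group 1 captures the content between each pair of single quotes.
preg_match_all("~'([^']+)'~", $text, $matches);

// Split every captured phrase on spaces.
$wordsPerPhrase = [];
foreach ($matches[1] as $phrase) {
    $wordsPerPhrase[] = explode(' ', $phrase);
}

print_r($wordsPerPhrase);
```

This assumes the quotes are balanced and that an apostrophe never appears inside a phrase.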

How can I split a string into an array of substrings with regular expressions in PHP?

I am trying to break a string of binary ones and zeros into groups of four. After reading the manual and the posts I am missing something:
$subject = "101010101";
$pattern = "/.{1,4}/";
$blocks = preg_split ($pattern, $subject);
print_r($blocks);
The result is an array of empty strings.
Array
(
[0] =>
[1] =>
[2] =>
[3] =>
)
php >
You could just use str_split() which will split your string into an array of strings of size n.
$subject = "101010101";
$split = str_split($subject, 4);
print_r($split);
Output:
Array
(
[0] => 1010
[1] => 1010
[2] => 1
)
You get that result because you are matching 1 - 4 characters to split on. It will match all the characters in the string leaving nothing to display.
If you want to use a regex to break it up into groups of 4 (plus a shorter last group, because there are 9 characters), you could use preg_match_all and match only 0 or 1 with a character class, instead of a dot, which matches any character except a newline.
[01]{1,4}
$subject = "101010101";
$pattern = "/[01]{1,4}/";
$blocks = preg_match_all ($pattern, $subject, $matches);
print_r($matches[0]);
Result
Array
(
[0] => 1010
[1] => 1010
[2] => 1
)
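Every character in the string matches your pattern; in other words, the string consists only of delimiters, so the result contains nothing but empty strings.
To get the expected result you need to capture the delimiters themselves (the pattern has to be wrapped in a capturing group for this) and drop the empty pieces. You can do that by adding two flags: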
$blocks = preg_split ($pattern, $subject, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY );
You can set PREG_SPLIT_DELIM_CAPTURE flag to get captured pattern in output
If this flag is set, parenthesized expression in the delimiter pattern
will be captured and returned as well. PHP reference
Note: you need to put the pattern into a capturing group () to get it in the output
$subject = "101010101";
$pattern = "/(.{1,4})/";
$blocks = preg_split ($pattern, $subject, -1, PREG_SPLIT_DELIM_CAPTURE);
print_r($blocks);
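To make the difference visible, here is a small side-by-side sketch using the subject from the question. The variable names $plain and $blocks are only illustrative.

```php
<?php
$subject = "101010101";

// Without a capturing group every character is consumed as a delimiter,
// so only the (empty) pieces between the delimiters are returned.
$plain = preg_split('/.{1,4}/', $subject);

// With a capturing group and PREG_SPLIT_DELIM_CAPTURE the delimiters are
// returned too; PREG_SPLIT_NO_EMPTY removes the empty pieces around them.
$blocks = preg_split(
    '/(.{1,4})/',
    $subject,
    -1,
    PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
);

print_r($plain);
print_r($blocks);
```

The first call returns four empty strings, the second the three groups 1010, 1010 and 1.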
preg_split() returns an array containing substrings of subject split along boundaries matched by pattern, or FALSE on failure. But you are trying to grab groups of 1 to 4 characters from that string, so preg_match_all() is the better fit. Example:
$subject = "101010101";
$pattern = "/[01]{1,4}/";
preg_match_all($pattern, $subject, $match);
echo '<pre>', print_r($match[0], true), '</pre>';

Regular expression to extract a numeric value on a changing position within a variable string

How can I extract the numeric part of a string when most of the string can change? /data/ is always present and is followed by the relevant, variable numeric part (in this case 123456).
differentcontentLocationhttps://example.com/api/result/13548/data/123456differentstuffincludingwhitespacesandnewlines8484
$str = "differentcontentLocationhttps://example.com/api/result/13548/data/123456differentstuffincludingwhitespacesandnewlines8484";
$str2 = "differentcontentLocationhttps://example.com/api/result/13548/data/123456";
In this example I need 123456. The only constant parts in the string are /data/ and maybe the first part of the URL, like https://.
preg_match("#/data/([0-9]+)([^0-9]+)#siU", $str, $matches);
Results in Array ( [0] => /data/123456d [1] => 123456 [2] => d ), which would be acceptable. But if nothing follows the relevant numeric part, as in $str2, this expression fails. I've tried to make the trailing part optional with preg_match("#/ads/([0-9]+)(([^0-9]+)?)#siU", $x, $matches);, but it fails too, returning only the first digit of the numeric part.
The U greediness-swapping modifier makes all greedy subpatterns lazy here; you should remove it together with ([^0-9]+). You also do not need the DOTALL modifier, because there is no . in your pattern whose behavior the s flag could change.
preg_match("#/data/([0-9]+)#i", $str, $matches);
Now, the pattern will match:
/data/ - a sequence of literal chars
([0-9]+) - Group 1 capturing 1+ digits (same as (\d+))
See the PHP demo.
$str = "differentcontentLocationhttps://e...content-available-to-author-only...e.com/api/result/13548/data/123456differentstuffincludingwhitespacesandnewlines8484";
$str2 = "differentcontentLocationhttps://e...content-available-to-author-only...e.com/api/result/13548/data/123456";
preg_match("#/data/([0-9]+)#i", $str, $matches);
print_r($matches); // Array ( [0] => /data/123456 [1] => 123456 )
preg_match("#/data/([0-9]+)#i", $str2, $matches2);
print_r($matches2); // Array ( [0] => /data/123456 [1] => 123456 )
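If the value is needed in several places, the pattern can be wrapped in a small helper. This is just a sketch; the function name extractDataId and the shortened sample strings are hypothetical and not part of the original answer.

```php
<?php
// Returns the digits that follow "/data/", or null when there is no match.
function extractDataId(string $text): ?string
{
    // preg_match() returns 1 on a match, 0 on no match, false on error.
    if (preg_match('#/data/([0-9]+)#i', $text, $m) === 1) {
        return $m[1];
    }
    return null;
}

$withTail    = extractDataId("Locationhttps://example.com/api/result/13548/data/123456 more\nstuff 8484");
$withoutTail = extractDataId("Locationhttps://example.com/api/result/13548/data/123456");
$noMatch     = extractDataId("no identifier in here");

var_dump($withTail, $withoutTail, $noMatch);
```

Returning null for a missing match keeps the caller from reading an undefined $matches[1].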

How to split a string into an array using a given regex expression

I am trying to explode / preg_split a string so that I get an array of all the values that are enclosed in ( ). I've tried the following code but I always get an empty array. I have tried many things but I can't seem to get it right.
Could anyone spot what I am missing to get my desired output?
$pattern = "/^\(.*\)$/";
$string = "(y3,x3),(r4,t4)";
$output = preg_split($pattern, $string);
print_r($output);
Current output Array ( [0] => [1] => )
Desired output Array ( [0] => "(y3,x3)," [1] => "(r4,t4)" )
With preg_split() your regex should be matching the delimiters within the string to split the string into an array. Your regex is currently matching the values, and for that, you can use preg_match_all(), like so:
$pattern = "/\(.*?\)/";
$string = "(y3,x3),(r4,t4)";
preg_match_all($pattern, $string, $output);
print_r($output[0]);
This outputs:
Array
(
[0] => (y3,x3)
[1] => (r4,t4)
)
If you want to use preg_split(), you would want to match the , between ),(, but without consuming the parentheses, like so:
$pattern = "/(?<=\)),(?=\()/";
$string = "(y3,x3),(r4,t4)";
$output = preg_split($pattern, $string);
print_r($output);
This uses a positive lookbehind and a positive lookahead to find the , between the two parenthesized groups and splits on it. It outputs the same as the above.
You can use a simple regex like \B,\B to split the string, which may improve performance by avoiding lookahead and lookbehind assertions.
\B is a non-word boundary, so with this input it matches only the , between ) and (
Here is a working example:
http://regex101.com/r/cV7bO7/1
$pattern = "/\B,\B/";
$string = "(y3,x3),(r4,t4),(r5,t5)";
$result = preg_split($pattern, $string);
$result will contain:
Array
(
[0] => (y3,x3)
[1] => (r4,t4)
[2] => (r5,t5)
)
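One caveat worth knowing: \B,\B relies on the characters next to the comma, not on the parentheses. The sketch below compares both patterns on a made-up input ($tricky is hypothetical) where the values inside the parentheses are not word characters.

```php
<?php
// Hypothetical input: the inner comma sits between two non-word characters.
$tricky = '(+,-),(a,b)';

// \B,\B matches any comma with no word boundary on either side, so it
// also splits between "+" and "-".
$byBoundary = preg_split('/\B,\B/', $tricky);

// The lookaround version only splits on a comma between ")" and "(".
$byLookaround = preg_split('/(?<=\)),(?=\()/', $tricky);

print_r($byBoundary);
print_r($byLookaround);
```

For input like (y3,x3),(r4,t4), where the values are alphanumeric, both patterns give the same result.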

Intelligent split of string into an array

This code will split the string into an array that contains test and string:
$str = 'test string';
$arr = preg_split('/\s+/', $str);
But I also want to detect quotes and ignore the text between them when splitting, for example:
$str = 'test "Two words"';
This should also return an array with two elements, test and Two words.
And another form, if possible:
$str = 'test=Two Words';
So if the equal sign is present before any spaces, the string should be split by =, otherwise the other rules from above should apply.
So how can I do this with preg_split?
Try str_getcsv:
print_r(str_getcsv('test string'," "));
print_r(str_getcsv('test "Two words"'," "));
print_r(str_getcsv('test=Two Words',"="));
Outputs
Array
(
[0] => test
[1] => string
)
Array
(
[0] => test
[1] => Two words
)
Array
(
[0] => test
[1] => Two Words
)
You can use something like preg_match to check whether an equals sign appears before the first space and then decide which delimiter to use.
Works only in PHP>=5.3 though.
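The delimiter check could look roughly like this. It is a sketch only: the function name splitSmart is hypothetical, and the enclosure and escape arguments are passed explicitly just to avoid relying on defaults.

```php
<?php
// Splits on "=" when an equals sign appears before any whitespace,
// otherwise on spaces; double-quoted parts are kept together either way.
function splitSmart(string $str): array
{
    // ^[^\s=]*= : from the start, only non-space, non-"=" characters,
    // followed by "=" (so the "=" comes before the first whitespace).
    $delimiter = preg_match('/^[^\s=]*=/', $str) === 1 ? '=' : ' ';

    return str_getcsv($str, $delimiter, '"', '\\');
}

$a = splitSmart('test string');
$b = splitSmart('test "Two words"');
$c = splitSmart('test=Two Words');

print_r($a);
print_r($b);
print_r($c);
```

Note that str_getcsv drops the surrounding quotes from the quoted element.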
I'm sure this could be done with regex, but how about just splitting the string by quotation marks, then by spaces, using explode?
Given the string 'I am a string "with an embedded" string', you could first split by quotation marks, giving you ['I am a string', 'with an embedded', 'string'], then you go over every other element in the array and split by spaces, resulting in ['I', 'am', 'a', 'string', 'with an embedded', 'string'].
The exact code to do this you can probably write yourself. If not, let me know and I'll help you.
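For reference, a minimal sketch of that idea. The function name splitOutsideQuotes is made up, and it assumes the double quotes in the input are balanced.

```php
<?php
function splitOutsideQuotes(string $str): array
{
    $result = [];

    // After exploding on ", chunks at odd indexes were inside quotes.
    foreach (explode('"', $str) as $i => $chunk) {
        if ($i % 2 === 1) {
            // Quoted chunk: keep it as one element.
            $result[] = $chunk;
            continue;
        }
        // Unquoted chunk: split on spaces and skip empty pieces.
        foreach (explode(' ', $chunk) as $word) {
            if ($word !== '') {
                $result[] = $word;
            }
        }
    }

    return $result;
}

$parts = splitOutsideQuotes('I am a string "with an embedded" string');
print_r($parts);
```

With an unbalanced quote, everything after the last " would be treated as quoted, so the input should be validated first if that can happen.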
In your last example, just split by the equals symbol:
$str = 'test=Two Words';
print_r(explode('=', $str));
