This code will split the string into an array that contains test and string:
$str = 'test string';
$arr = preg_split('/\s+/', $str);
But I also want to detect quotes and ignore the text between them when splitting, for example:
$str = 'test "Two words"';
This should also return an array with two elements, test and Two words.
And another form, if possible:
$str = 'test=Two Words';
So if the equal sign is present before any spaces, the string should be split by =, otherwise the other rules from above should apply.
So how can I do this with preg_split?
Try str_getcsv:
print_r(str_getcsv('test string'," "));
print_r(str_getcsv('test "Two words"'," "));
print_r(str_getcsv('test=Two Words',"="));
Outputs
Array
(
[0] => test
[1] => string
)
Array
(
[0] => test
[1] => Two words
)
Array
(
[0] => test
[1] => Two Words
)
You can use something like preg_match to check whether an equals sign appears before the first space, and then decide which delimiter to use.
Note that str_getcsv is only available in PHP >= 5.3.
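A minimal sketch of how those two steps might fit together. The helper name splitArgs and the exact check (an equals sign before the first whitespace character) are assumptions for illustration, not part of the original answer. The enclosure and escape arguments are passed explicitly so the call does not rely on defaults.

```php
<?php
// Hypothetical helper: the name and the delimiter rule are illustrative only.
function splitArgs(string $str): array
{
    // If the text before the first whitespace contains "=", split once on "=".
    if (preg_match('/^[^\s=]+=/', $str)) {
        return explode('=', $str, 2);
    }
    // Otherwise split on spaces, keeping double-quoted parts together.
    return str_getcsv($str, ' ', '"', '\\');
}

print_r(splitArgs('test string'));
print_r(splitArgs('test "Two words"'));
print_r(splitArgs('test=Two Words'));
```

Keep in mind that str_getcsv treats every single space as a delimiter, so runs of spaces yield empty fields that you may want to filter out.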
I'm sure this could be done with regex, but how about just splitting the string by quotation marks, then by spaces, using explode?
Given the string 'I am a string "with an embedded" string', you could first split by quotation marks, giving you ['I am a string', 'with an embedded', 'string'], then you go over every other element in the array and split by spaces, resulting in ['I', 'am', 'a', 'string', 'with an embedded', 'string'].
The exact code to do this you can probably write yourself. If not, let me know and I'll help you.
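For reference, a rough sketch of the quote-then-space idea described above. The function name splitOutsideQuotes is made up for this example, it assumes the double quotes are balanced, and it uses preg_split with PREG_SPLIT_NO_EMPTY instead of explode for the space step so that leading and trailing spaces do not produce empty elements.

```php
<?php
// Hypothetical helper: assumes every opening double quote has a closing one.
function splitOutsideQuotes(string $str): array
{
    $result = [];
    foreach (explode('"', $str) as $i => $chunk) {
        if ($i % 2 === 1) {
            // Odd-numbered chunks were between quotes: keep them whole.
            $result[] = $chunk;
            continue;
        }
        // Even-numbered chunks are outside quotes: split them on whitespace.
        foreach (preg_split('/\s+/', $chunk, -1, PREG_SPLIT_NO_EMPTY) as $word) {
            $result[] = $word;
        }
    }
    return $result;
}

print_r(splitOutsideQuotes('I am a string "with an embedded" string'));
```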
In your last example, just split by the equals symbol:
$str = 'test=Two Words';
print_r(explode('=', $str));
I want to match the content between single quotes. For example, 'for example' should return for and example. It's only part of the sentence I have to analyze; I used preg_split(\s) for the whole sentence, so 'for example' becomes 'for and example' (the quotes stay attached to the words).
Right now I've tried /^'(.*)|(.*)'$/, which only returns for but not example. If I write it as /^(.*)'|'(.*)$/, it only returns example but not for. How should I fix this?
You can avoid double handling of the string by leveraging the \G metacharacter to continue matching an unlimited number of space-delimited strings inside of single quotes.
Code:
$string = "text 'for an example of the \"continue\" metacharacter' text";
var_export(preg_match_all("~(?|'|\G(?!^) )\K[^ ']+~", $string, $out) ? $out[0] : []);
Output:
array (
0 => 'for',
1 => 'an',
2 => 'example',
3 => 'of',
4 => 'the',
5 => '"continue"',
6 => 'metacharacter',
)
To get the single sentences (which you then want to split) you can use preg_match_all() to capture anything between two single quotes.
preg_match_all("~'([^']+)'~", $text, $matches);
$string = $matches[1][0]; // first quoted substring; $matches[1] holds all of them
$string now contains something like "example string with words".
Now if you want to split a string according to a specific sequence / character, you can make use of explode():
$string = "example string with words";
$result = explode(" ", $string);
print_r($result);
gives you:
Array
(
[0] => example
[1] => string
[2] => with
[3] => words
)
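Putting the two steps together might look like the sketch below. The function name wordsInsideSingleQuotes is invented for this example, and it assumes the text contains no apostrophes other than the quoting ones.

```php
<?php
// Hypothetical helper: collects the words of every single-quoted substring.
function wordsInsideSingleQuotes(string $text): array
{
    $words = [];
    // $matches[1] receives the text of each quoted substring, without the quotes.
    preg_match_all("~'([^']+)'~", $text, $matches);
    foreach ($matches[1] as $quoted) {
        // Split each quoted substring on single spaces.
        $words = array_merge($words, explode(' ', $quoted));
    }
    return $words;
}

print_r(wordsInsideSingleQuotes("text 'for example' more text"));
```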
Currently I am trying to split on �, the special character that represents %A0 in the URL. However, when I use another URL, it doesn't recognize %A0, so I need to use %20, which is the standard space.
My question is: is there a way to explode() on the special character �? Whenever I try, it always returns an array with a single element.
// Tried str_replace() to replace %A0 with a space. Didn't work
$a = str_replace("%A0"," ", $_GET['view']);
// Tried to explode() but it still returns a single element
$b = explode("�", $a);
// Returns Array[0] => "Hello World" instead of
// Array[2] => [0] => "Hello", [1] => "World"
echo $b[0];
Take a look at mb_split:
array mb_split ( string $pattern , string $string [, int $limit = -1 ] )
Split a multibyte string using regular expression pattern and returns
the result as an array.
Like this:
$string = "a�b�k�e";
$chunks = mb_split("�", $string);
print_r($chunks);
Outputs:
Array
(
[0] => a
[1] => b
[2] => k
[3] => e
)
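Another option is to split on the raw byte value instead of pasting the character into the source. This is only a sketch and rests on an assumption: that the � seen in the question is a non-breaking space, which is the single byte 0xA0 in Latin-1 and the two bytes 0xC2 0xA0 in UTF-8. PHP has already URL-decoded $_GET, so the literal text "%A0" is no longer in the value. The function name splitOnNbsp is made up for this example.

```php
<?php
// Hypothetical helper: assumes the separator is a non-breaking space.
function splitOnNbsp(string $s): array
{
    // preg_match with the /u modifier returns 1 only if $s is valid UTF-8.
    $isUtf8 = preg_match('//u', $s) === 1;
    // UTF-8 encodes NBSP as C2 A0; single-byte Latin-1 uses A0 alone.
    $nbsp = $isUtf8 ? "\xC2\xA0" : "\xA0";
    return explode($nbsp, $s);
}

print_r(splitOnNbsp("Hello\xA0World"));     // Latin-1 style input
print_r(splitOnNbsp("Hello\xC2\xA0World")); // UTF-8 input
```

Checking the encoding first avoids cutting through UTF-8 characters that merely contain the byte 0xA0.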
I'm trying to split a string of sentences by "." to get each sentence in an array. Like below:
$Text = "Hello, Mr. James. How are you today.";
$split = explode(".", $Text);
As you can see, $Text contains 2 sentences, so I should only have 2 elements in the array. The issue I'm having is that $Text can sometimes contain words like "Mr." or other words with a "." in the middle of a sentence. This results in sentences being split in the middle and placed separately in the array, like below:
Array ( [0] => Hello, Mr [1] => James [2] => How are you today [3] => )
You can avoid a lot of exception handling and general misery if you can ensure that every English sentence is followed by 2 consecutive spaces. This can be difficult when dealing with some digitized strings because multi-spacing sometimes gets condensed to a single space.
This is what I mean:
$Text = "Hello, Mr. James.  How are you today.";
$split = explode("  ", $Text);
var_export($split);
// array ( 0 => 'Hello, Mr. James.', 1 => 'How are you today.', )
Exploding on each space-space will give you a reliable result.
If you want good output, you'll need to use good input.
If you want to blacklist a few predictable substrings that should not be used to split the string, then you can use (*SKIP)(*FAIL) for that.
Code:
$text = "Hello, Mr. James. How are you today.";
var_export(
preg_split('~(?:Mrs?|Miss|Ms|Prof|Rev|Col|Dr)[.?!:](*SKIP)(*F)|[.?!:]+\K\s+~', $text, 0, PREG_SPLIT_NO_EMPTY)
);
Output:
array (
0 => 'Hello, Mr. James.',
1 => 'How are you today.',
)
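If the list of abbreviations needs to vary, the same (*SKIP)(*F) idea can be wrapped in a small function. This is a sketch only: the name splitSentences, the default abbreviation list and the added \b word boundary are assumptions for illustration, not part of the answer above.

```php
<?php
// Hypothetical helper: builds the blacklist alternation from an array.
function splitSentences(string $text, array $abbreviations = ['Mr', 'Mrs', 'Ms', 'Dr', 'Prof']): array
{
    // Escape each abbreviation so it is safe inside a "~"-delimited pattern.
    $quoted = array_map(function ($abbr) {
        return preg_quote($abbr, '~');
    }, $abbreviations);
    // Abbreviation + dot is matched and discarded; otherwise split on the
    // whitespace that follows sentence-ending punctuation.
    $pattern = '~\b(?:' . implode('|', $quoted) . ')\.(*SKIP)(*F)|[.?!]+\K\s+~';
    return preg_split($pattern, $text, -1, PREG_SPLIT_NO_EMPTY);
}

var_export(splitSentences('Hello, Mr. James. How are you today.'));
```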
I would like to remove substrings from a string that have delimiters.
Example:
$string = "Hi, I want to buy an [apple] and a [banana].";
How do I get "apple" and "banana" out of this string and in an array? And the other parts of the string "Hi, I want to buy an" and "and a" in another array.
I apologize if this question has already been answered. I searched this site and couldn't find anything that would help me. Every situation was just a little different.
You could use preg_split() thus:
<?php
$pattern = '/[\[\]]/'; // Split on either [ or ]
$string = "Hi, I want to buy an [apple] and a [banana].";
echo print_r(preg_split($pattern, $string), true);
which outputs:
Array
(
[0] => Hi, I want to buy an
[1] => apple
[2] => and a
[3] => banana
[4] => .
)
You can trim the whitespace if you like and/or ignore the final full stop.
preg_match_all('/(?<=\[)[a-z]+(?=\])/', $string, $matches);
Should do what you want. $matches[0] will be an array with each match.
I assume you want words as values in the array:
$words = explode(' ', $string);
$result = preg_grep('/\[[^\]]+\]/', $words);
$others = array_diff($words, $result);
Create an array of words using explode() on a space
Use a regex to find [somethings] using preg_grep()
Find the difference of all words and [somethings] using array_diff(), which will be the "other" parts of the string
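To get both arrays the question asks for in one go, something along these lines might work. It is a sketch under the assumption that brackets are never nested; the function name splitBracketed and the 'inside'/'outside' keys are invented for this example.

```php
<?php
// Hypothetical helper: assumes brackets are balanced and not nested.
function splitBracketed(string $string): array
{
    // Capture the text between each [ and ].
    preg_match_all('/\[([^\]]+)\]/', $string, $matches);
    // Split on the bracketed parts to get everything else, then trim spaces.
    $outside = array_map('trim', preg_split('/\[[^\]]+\]/', $string));
    return ['inside' => $matches[1], 'outside' => $outside];
}

print_r(splitBracketed('Hi, I want to buy an [apple] and a [banana].'));
```

The trailing "." ends up as its own element in the second array; filter it out if it is not wanted.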
I have a string like
$string = 'Some of "this string is" in quotes';
I want to get an array of all the words in the string which I can get by doing
$words = explode(' ', $string);
However I don't want to split up the words in quotes so ideally the end array will be
array ('Some', 'of', '"this string is"', 'in', 'quotes');
Does anyone know how I can do this?
You can use:
$string = 'Some of "this string is" in quotes';
$arr = preg_split('/("[^"]*")|\h+/', $string, -1,
PREG_SPLIT_NO_EMPTY|PREG_SPLIT_DELIM_CAPTURE);
print_r ( $arr );
Output:
Array
(
[0] => Some
[1] => of
[2] => "this string is"
[3] => in
[4] => quotes
)
Regex breakdown
("[^"]*") # match quoted text and capture it so that it appears in the output
# via the PREG_SPLIT_DELIM_CAPTURE option
| # regex alternation
\h+ # match 1 or more horizontal whitespace characters
Instead of splitting, you can approach this by matching, which is a lot easier here.
So use the regex /".*?"|[^\s]+/ in conjunction with preg_match_all. The quoted alternative has to come first; otherwise [^\s]+ would consume the opening quote and the quoted branch would never match.
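A minimal sketch of the matching approach, with the quoted alternative listed first so that it wins whenever the match starts at a quote. The function name matchWords is made up for this example.

```php
<?php
// Hypothetical helper: returns words, keeping double-quoted phrases intact.
function matchWords(string $string): array
{
    // Either a complete "quoted phrase" or a run of non-whitespace characters.
    preg_match_all('/"[^"]*"|\S+/', $string, $matches);
    return $matches[0];
}

print_r(matchWords('Some of "this string is" in quotes'));
```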
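You can get the values by matching rather than splitting, with the regex below (in PHP, drop the g flag and use preg_match_all, which already finds every match):
/"[^"]+"|\w+/g
which will match: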
"[^"]+" - characters between quote signs ",
\w+ - sets of word characters (A-Za-z_0-9),
I think you can use a regex like this:
/("[^"]*")|(\S+)/g
And you can use substitution $2