Split sentence into words - php

For example, I have sentences like this:
$text = "word, word w.d. word!..";
I need an array like this:
Array
(
[0] => word
[1] => word
[2] => w.d
[3] => word
)
I am very new to regular expressions.
Here is what I tried:
function divide_a_sentence_into_words($text){
return preg_split('/(?<=[\s])(?<!f\s)\s+/ix', $text, -1, PREG_SPLIT_NO_EMPTY);
}
This
$text = "word word, w.d. word!..";
$split = preg_split("/[^\w]*([\s]+[^\w]*|$)/", $text, -1, PREG_SPLIT_NO_EMPTY);
print_r($split);
works, but I have a second question: I want to put a list of exceptions into my regular expression.
"w.d" is a special case. For example, the words in my list are "w.d", "mr." and "dr.".
If I take this text:
$text = "word, dr. word w.d. word!..";
I need this array:
Array (
[0] => word
[1] => dr.
[2] => word
[3] => w.d
[4] => word
)
Sorry for my bad English.

Using preg_split with a regex of /[^\w]*([\s]+[^\w]*|$)/ should work fine:
<?php
$text = "word word w.d. word!..";
$split = preg_split("/[^\w]*([\s]+[^\w]*|$)/", $text, -1, PREG_SPLIT_NO_EMPTY);
print_r($split);
?>
Output:
Array
(
[0] => word
[1] => word
[2] => w.d
[3] => word
)
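For the follow-up about a list of special cases, here is a minimal sketch rather than a definitive solution. It matches words instead of splitting on separators, and tries the listed abbreviations before falling back to a plain run of word characters. The function name split_words_keep_abbreviations is hypothetical, the list is assumed to be non-empty, and the arrow functions need PHP 7.4 or later.

```php
<?php
// Hypothetical helper: return the words of $text, keeping listed abbreviations intact.
// Assumes $abbreviations is a non-empty list of strings.
function split_words_keep_abbreviations(string $text, array $abbreviations): array
{
    // Try longer abbreviations first so they are not shadowed by shorter ones.
    usort($abbreviations, fn($a, $b) => strlen($b) - strlen($a));
    // Escape regex metacharacters such as "." in every list entry.
    $alternatives = array_map(fn($a) => preg_quote($a, '/'), $abbreviations);
    // Either an abbreviation standing on its own, or a run of word characters.
    $pattern = '/(?<!\w)(?:' . implode('|', $alternatives) . ')(?!\w)|\w+/i';
    preg_match_all($pattern, $text, $matches);
    return $matches[0];
}

$words = split_words_keep_abbreviations('word, dr. word w.d. word!..', ['w.d', 'mr.', 'dr.']);
print_r($words);
```

With the text from the question this should give word, dr., word, w.d, word.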

Use the explode function, which splits the string into an array:
$words = explode(" ", $text);
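Note that explode on its own leaves punctuation attached to the words ("word," and "word!.."). A rough sketch of one way around that, assuming the only punctuation to strip is ".", ",", "!" and "?" at the edges of each piece:

```php
<?php
$text = "word, word w.d. word!..";

// Split on single spaces, then strip punctuation from both ends of each piece.
$words = array_map(
    function ($word) {
        return trim($word, '.,!?');
    },
    explode(' ', $text)
);

// Drop pieces that ended up empty (punctuation only, or consecutive spaces).
$words = array_values(array_filter($words, 'strlen'));

print_r($words);
```

Dots inside a word, as in "w.d", are kept because trim only touches the ends.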

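Use
str_word_count ( string $string [, int $format = 0 [, string $charlist ]] )
See http://php.net/manual/en/function.str-word-count.php
It does almost what you want: with $format = 1 it returns the words as an array, but by default "." is not a word character, so "w.d" comes back as "w" and "d". In your case: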
$myarray = str_word_count ($text,1);

Related

Can I split a single string with explode?

I searched Google for how to split a string into an array and found that str_split works. explode does not work in the case below. How can I split the string with explode()?
<?php
$string = "EEEE";
print_r(str_split($string));//Array ( [0] => E [1] => E [2] => E [3] => E )
print_r(explode("",$string));//Empty delimiter error
?>
As the error indicates, explode requires a non-empty delimiter to split the string.
You could try:
$str = "EEEE";
$answer = preg_split('//', $str, -1, PREG_SPLIT_NO_EMPTY);
preg_split is an alternative to str_split here.
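One case where the two differ, sketched under the assumption that the string is UTF-8: str_split cuts by bytes, while preg_split with the u modifier cuts by characters. The sample string below is made up for illustration, and the "\u{...}" escape needs PHP 7.0 or later.

```php
<?php
$string = "h\u{e9}llo"; // "héllo"; the é takes two bytes in UTF-8

// str_split works on bytes, so the é is cut into two one-byte strings.
$bytes = str_split($string);

// With the u modifier the empty pattern matches between characters instead.
$chars = preg_split('//u', $string, -1, PREG_SPLIT_NO_EMPTY);

echo count($bytes), " bytes, ", count($chars), " characters\n";
```

For plain ASCII input such as "EEEE" both calls give the same four elements.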

Regexp lookahead and lookbehind and match between certain characters

Currently I have this regexp to detect strings between double curly brackets and it works wonderfully.
$str = "{{test}} and {{test2}}";
preg_match_all('/(?<={{)[^}]*(?=}})/', $str, $matches);
print_r($matches);
Returns:
Array
(
[0] => Array
(
[0] => test
[1] => test2
)
)
Now I need to expand it to only match stuff between ]] and [[
$str = "{{dont match}}]]{{test}} and {{test2}}[[{{dont match}}";
I've been trying to modify the regex, but the lookahead and lookbehind are making it too difficult for me. How can I get it to match only what is inside ]] and [[?
I would also like to match the whole string between ]] and [[, and then each individual string between {{ }} inside it.
For example:
$str = "{{dont match}}]]{{test}} and {{test2}}[[{{dont match}}";
Would return:
Array
(
[0] => Array
(
[0] => {{test}} and {{test2}}
[1] => test
[2] => test2
)
)
Piggyback using preg_replace_callback:
$str = "{{dont match}}]]{{test}} and {{test2}}[[{{dont match}}";
$arr = array();
preg_replace_callback('/\]\](.*?)\[\[/', function($m) use (&$arr) {
preg_match_all('/(?<={{)[^}]*(?=}})/', $m[1], $arr); return true; }, $str);
print_r($arr[0]);
Output:
Array
(
[0] => test
[1] => test2
)
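If the whole string between ]] and [[ is wanted as well, a two-step variant may be easier to read. This is only a sketch: it assumes a single ]]...[[ region, uses capture groups instead of lookarounds, and returns a flat list rather than the nested array shown in the question.

```php
<?php
$str = "{{dont match}}]]{{test}} and {{test2}}[[{{dont match}}";
$result = [];

// Step 1: capture everything between the first "]]" and the next "[[".
if (preg_match('/\]\](.*?)\[\[/s', $str, $outer)) {
    $result[] = $outer[1];

    // Step 2: collect the content of each {{...}} inside that region.
    preg_match_all('/\{\{([^}]*)\}\}/', $outer[1], $inner);
    $result = array_merge($result, $inner[1]);
}

print_r($result);
```

The first element is the whole region, followed by test and test2.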

php regex split string by [%%%]

Hi I need a preg_split regex that will split a string at substrings in square brackets.
This example input:
$string = 'I have a string containing [substrings] in [brackets].';
should provide this array output:
[0]= 'I have a string containing '
[1]= '[substrings]'
[2]= ' in '
[3]= '[brackets]'
[4]= '.'
After reading your revised question:
This might be what you want:
$string = 'I have a string containing [substrings] in [brackets].';
preg_split('/(\[.*?\])/', $string, -1, PREG_SPLIT_DELIM_CAPTURE);
You should get:
Array
(
[0] => I have a string containing
[1] => [substrings]
[2] => in
[3] => [brackets]
[4] => .
)
Original answer:
preg_split('/%+/i', 'ot limited to 3 %%% so it can be %%%% or % or %%%%%, etc Tha');
You should get:
Array
(
[0] => ot limited to 3
[1] => so it can be
[2] => or
[3] => or
[4] => , etc Tha
)
Or if you want a minimum of 3 then try:
preg_split('/%%%+/i', 'Not limited to 3 %%% so it can be %%%% or % or %%%%%, etc Tha');
Have a go at http://regex.larsolavtorvik.com/
I think this is what you are looking for:
$array = preg_split('/(\[.*?\])/', $string, -1, PREG_SPLIT_DELIM_CAPTURE);
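One thing to watch for: when the string starts or ends with a bracketed part, PREG_SPLIT_DELIM_CAPTURE alone leaves an empty string at that end of the result. A small sketch, with a made-up input string, that combines it with PREG_SPLIT_NO_EMPTY:

```php
<?php
$string = '[Brackets] at the start, and [more] later';

// DELIM_CAPTURE keeps the bracketed parts; NO_EMPTY drops the empty
// piece that would otherwise appear before the leading "[Brackets]".
$parts = preg_split(
    '/(\[[^\]]*\])/',
    $string,
    -1,
    PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
);

print_r($parts);
```

The pattern here uses [^\]]* instead of .*? so a match cannot run past a closing bracket; for the inputs in this thread both behave the same.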

preg_split() problem with strings containing '&'

I am using preg_split() to get an array of sentences from a string.
$sentences = preg_split("/([.?!\r\n]+)/", $text, 0, PREG_SPLIT_NO_EMPTY|PREG_SPLIT_DELIM_CAPTURE);
But when $text contains '&', for example:
$text = 'this is test. we are testing this & we are over.';
then it stops matching after the '&'.
Your preg_split handles sentences with ampersands correctly, for example:
$text = 'Sample sentence. Another sentence! Sentence with the special character & (ampersand). Last sentence.';
$sentences = preg_split("/([.?!\r\n]+)/", $text, 0, PREG_SPLIT_NO_EMPTY|PREG_SPLIT_DELIM_CAPTURE);
print_r($sentences);
Output:
Array
(
[0] => Sample sentence
[1] => .
[2] => Another sentence
[3] => !
[4] => Sentence with the special character & (ampersand)
[5] => .
[6] => Last sentence
[7] => .
)
Your Script:
$text = 'this is test. we are testing this & we are over.';
$sentences = preg_split("/([.?!\r\n]+)/", $text, 0, PREG_SPLIT_NO_EMPTY|PREG_SPLIT_DELIM_CAPTURE);
echo '<pre>'.print_r($sentences, true).'</pre>';
My Output:
Array
(
[0] => this is test
[1] => .
[2] => we are testing this & we are over
[3] => .
)
I don't understand your problem.
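As a side note, if the goal is one array element per sentence with its punctuation still attached, a lookbehind avoids the separate delimiter entries. A minimal sketch, assuming sentences are separated by whitespace after ".", "?" or "!":

```php
<?php
$text = 'this is test. we are testing this & we are over.';

// Split on whitespace that follows sentence-ending punctuation;
// the lookbehind keeps the punctuation with its sentence.
$sentences = preg_split('/(?<=[.?!])\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);

print_r($sentences);
```

The ampersand is an ordinary character to the regex, so it stays inside the second sentence.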

How can I convert a sentence to an array of words?

From this string:
$input = "Some terms with spaces between";
how can I produce this array?
$output = ['Some', 'terms', 'with', 'spaces', 'between'];
You could use explode or preg_split. (The old split function also took a regular expression, but it was deprecated in PHP 5.3 and removed in PHP 7.0.)
explode uses a fixed string:
$parts = explode(' ', $string);
while preg_split uses a regular expression:
$parts = preg_split('/ +/', $string);
An example where the regular expression based splitting is useful:
$string = 'foo    bar'; // multiple spaces
var_dump(explode(' ', $string)); // contains empty strings for the extra spaces
var_dump(preg_split('/ +/', $string)); // just 'foo' and 'bar'
$parts = explode(" ", $str);
print_r(str_word_count("this is a sentence", 1));
Results in:
Array ( [0] => this [1] => is [2] => a [3] => sentence )
Just thought it'd be worth mentioning that the regular expression Gumbo posted, although it will more than likely suffice for most, matches only literal spaces, so it may not catch all cases of whitespace (tabs and newlines, for example). An example, using the regular expression from the accepted answer on the string below:
$sentence = "Hello my name is peter string splitter";
Provided me with the following output through print_r:
Array
(
[0] => Hello
[1] => my
[2] => name
[3] => is
[4] => peter
[5] => string
[6] => splitter
)
Whereas the following regular expression:
preg_split('/\s+/', $sentence);
provided me with the following (desired) output:
Array
(
[0] => Hello
[1] => my
[2] => name
[3] => is
[4] => peter
[5] => string
[6] => splitter
)
Hope it helps anyone stuck at a similar hurdle and is confused as to why.
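To make the difference visible, here is a small sketch with a made-up string that contains a double space, a tab and a newline:

```php
<?php
$sentence = "Hello  my\tname is\npeter";

// ' +' only matches space characters, so the tab and newline stay inside words.
$bySpaces = preg_split('/ +/', $sentence);

// '\s+' matches any run of whitespace: spaces, tabs and newlines.
$byWhitespace = preg_split('/\s+/', $sentence, -1, PREG_SPLIT_NO_EMPTY);

var_dump($bySpaces);     // 3 elements: "Hello", "my\tname", "is\npeter"
var_dump($byWhitespace); // 5 elements: "Hello", "my", "name", "is", "peter"
```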
Just a question, but are you trying to make json out of the data? If so, then you might consider something like this:
return json_encode(explode(' ', $inputString));
