preg_split() problem with strings containing '&' - php

I am using preg_split() to get an array of sentences from a string.
$sentences = preg_split("/([.?!\r\n]+)/", $text, 0, PREG_SPLIT_NO_EMPTY|PREG_SPLIT_DELIM_CAPTURE);
But when $text contains '&', for example:
$text = 'this is test. we are testing this & we are over.';
then it stops matching after the '&'.

Your preg_split handles sentences with ampersands correctly, for example:
$text = 'Sample sentence. Another sentence! Sentence with the special character & (ampersand). Last sentence.';
$sentences = preg_split("/([.?!\r\n]+)/", $text, 0, PREG_SPLIT_NO_EMPTY|PREG_SPLIT_DELIM_CAPTURE);
print_r($sentences);
Output:
Array
(
[0] => Sample sentence
[1] => .
[2] => Another sentence
[3] => !
[4] => Sentence with the special character & (ampersand)
[5] => .
[6] => Last sentence
[7] => .
)

Your Script:
$text = 'this is test. we are testing this & we are over.';
$sentences = preg_split("/([.?!\r\n]+)/", $text, 0, PREG_SPLIT_NO_EMPTY|PREG_SPLIT_DELIM_CAPTURE);
echo '<pre>'.print_r($sentences, true).'</pre>';
My Output:
Array
(
[0] => this is test
[1] => .
[2] => we are testing this & we are over
[3] => .
)
I don't understand your problem.
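Since preg_split itself copes with '&', the text may be getting cut off before it ever reaches preg_split. One possible cause, which the question does not confirm, is that $text arrives through a query string without being URL-encoded, so the '&' is read as a parameter separator. A minimal sketch of that assumption (the parameter name text is hypothetical):

```php
<?php
$text = 'this is test. we are testing this & we are over.';

// Assumption: the text is placed in a query string without encoding.
// The raw '&' is then treated as a parameter separator and the value is cut short.
parse_str('text=' . $text, $raw);
var_dump($raw['text']);

// Encoding the value first (http_build_query does this) keeps it intact.
parse_str(http_build_query(array('text' => $text)), $encoded);
var_dump($encoded['text'] === $text);

// With the full text available, preg_split sees both sentences.
$sentences = preg_split("/([.?!\r\n]+)/", $encoded['text'], 0, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
print_r($sentences);
```

If that is what is happening, encoding the value with urlencode() or http_build_query() on the sending side should be enough.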

Related

Whitespace delimiter not being captured in preg split

<?php
$text = "Testing text splitting\nWith a newline!";
$textArray = preg_split('/\s+/', $text, 0, PREG_SPLIT_DELIM_CAPTURE);
print_r($textArray);
The above code will output the following:
Array
(
[0] => Testing
[1] => text
[2] => splitting
[3] => With
[4] => a
[5] => newline!
)
However to my knowledge the PREG_SPLIT_DELIM_CAPTURE flag should be capturing the whitespace delimiters in the array. Am I missing something?
edit: Ok, after rereading the documentation I now understand PREG_SPLIT_DELIM_CAPTURE is not meant for this case. My desired output would be something like:
Array
(
[0] => Testing
[1] => ' '
[2] => text
[3] => ' '
[4] => splitting
[5] => '\n'
[6] => With
[7] => ' '
[8] => a
[9] => ' '
[10] => newline!
)
The manual entry for PREG_SPLIT_DELIM_CAPTURE says:
If this flag is set, parenthesized expression in the delimiter pattern will be captured and returned as well.
In other words, the delimiter (in your case \s+) is captured, i.e. added to the result, only when it is wrapped in parentheses. So you can write:
$text = "Testing text splitting\nWith a newline!";
$textArray = preg_split('/(\s+)/', $text, 0, PREG_SPLIT_DELIM_CAPTURE);
// parentheses!
print_r($textArray);
You can also use T-Regx library:
$textArray = pattern('(\s+)')->split("Testing text splitting\nWith a newline!")->inc();
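To see the effect of the parentheses side by side, here is a small sketch that runs both patterns on the string from the question. The variable names are only for illustration, and var_export is used instead of print_r so the captured whitespace is visible:

```php
<?php
$text = "Testing text splitting\nWith a newline!";

// No capture group: the whitespace delimiters are discarded,
// even though PREG_SPLIT_DELIM_CAPTURE is set.
$withoutGroup = preg_split('/\s+/', $text, -1, PREG_SPLIT_DELIM_CAPTURE);

// Capture group around the delimiter: each whitespace run is returned too.
$withGroup = preg_split('/(\s+)/', $text, -1, PREG_SPLIT_DELIM_CAPTURE);

var_export($withoutGroup);
echo "\n";
var_export($withGroup);
```

The first call returns 6 elements and the second 11, with the newline as element 5.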

Format Phone Area Code with PHP

We have lots of office locations on our website, each with a main phone number - many have been entered with the area code looking like this: (800) 555-5555
This is how I need them all to look, regardless of how they were entered: 800- 555-5555
Here's where I'm at right now
str_replace(array( '(', ')' ), '', $this->data['location_phone']);
While this removes both parentheses, I really just need to remove the opening one and replace the closing one with a dash.
Use an array for your replacements.
str_replace(array( '(', ')' ), array('', '-'), $this->data['location_phone']);
You can read more on the str_replace documentation page.
You could do something similar to what you're already doing. Instead of replacing both ( and ) with '', replace ( with '' and then, separately, replace ) with -. Note that str_replace returns the new string, so the result of the first call has to be passed to the second:
$phone = str_replace('(', '', $this->data['location_phone']);
$phone = str_replace(')', '-', $phone);
Or even better, combine into a single line (as indicated in other answers):
str_replace(array( '(', ')' ), array('', '-'), $this->data['location_phone']);
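As a quick check of the array form, here is a sketch using the sample number from the question. The $compact variant is an assumption on my part, for the case where the space after the dash is not actually wanted:

```php
<?php
$phone = '(800) 555-5555';

// Pairwise replacement: '(' becomes '' and ')' becomes '-'.
$formatted = str_replace(array('(', ')'), array('', '-'), $phone);
echo $formatted, "\n";

// Optional variant: replace ') ' (parenthesis plus space) first,
// so no space is left behind after the dash.
$compact = str_replace(array('(', ') ', ')'), array('', '-', '-'), $phone);
echo $compact, "\n";
```

str_replace works through the search array in order, which is why ') ' is listed before the bare ')'.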
This answer seems to address the issue with preg_match & data reconstruction. But the regex posted in that answer is not that great for the kind of data cleanup described here.
So try this variation of that answer I put together which uses some great regex from a post in the official PHP documentation:
// Set test data.
$test_data = array();
$test_data[] = '1 800 555-5555';
$test_data[] = '1-800-555-5555';
$test_data[] = '800-555-5555';
$test_data[] = '(800) 555-5555';
// Set the regex.
$regex = '/^(?:1(?:[. -])?)?(?:\((?=\d{3}\)))?([2-9]\d{2})(?:(?<=\(\d{3})\))? ?(?:(?<=\d{3})[.-])?([2-9]\d{2})[. -]?(\d{4})(?: (?i:ext)\.? ?(\d{1,5}))?$/';
// Roll through the test data & process.
foreach ($test_data as $data) {
if (preg_match($regex, $data, $matches)) {
// Reconstruct the number based on the captured data.
echo "New number is: " . $matches[1] . '-' . $matches[2] . '-' . $matches[3] . '<br />';
// Dump the matches to check what is being captured.
echo '<pre>';
print_r($matches);
echo '</pre>';
}
}
And the cleaned results—including the preg_match matches—would be:
New number is: 800-555-5555
Array
(
[0] => 1 800 555-5555
[1] => 800
[2] => 555
[3] => 5555
)
New number is: 800-555-5555
Array
(
[0] => 1-800-555-5555
[1] => 800
[2] => 555
[3] => 5555
)
New number is: 800-555-5555
Array
(
[0] => 800-555-5555
[1] => 800
[2] => 555
[3] => 5555
)
New number is: 800-555-5555
Array
(
[0] => (800) 555-5555
[1] => 800
[2] => 555
[3] => 5555
)
Thanks. I used this for phone numbers to strip out all the spaces, parentheses and dashes, so the link becomes tel://1234567890:
echo "<a href=tel://" . str_replace(array('(', ')', ' ', '-'), array('', '', '', ''), $row["ContactPhone"]) . ">" . $row["ContactPhone"] . "</a>";
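If the goal is only to keep the digits, a shorter alternative (my suggestion, not what the commenter used) is to delete every non-digit with preg_replace. A minimal sketch, with a hypothetical $contactPhone standing in for $row["ContactPhone"]:

```php
<?php
$contactPhone = '(800) 555-5555'; // stand-in for $row["ContactPhone"]

// \D matches any character that is not a digit.
$digits = preg_replace('/\D+/', '', $contactPhone);

// Escape the visible label before printing it into HTML.
$link = '<a href="tel://' . $digits . '">' . htmlspecialchars($contactPhone) . '</a>';
echo $link, "\n";
```

This also covers dots or other separators that the fixed str_replace list would miss.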

Split sentence into words

For example, I have sentences like this:
$text = "word, word w.d. word!..";
I need an array like this:
Array
(
[0] => word
[1] => word
[2] => w.d
[3] => word
)
I am very new to regular expressions.
Here is what I tried:
function divide_a_sentence_into_words($text){
return preg_split('/(?<=[\s])(?<!f\s)\s+/ix', $text, -1, PREG_SPLIT_NO_EMPTY);
}
This:
$text = "word word, w.d. word!..";
$split = preg_split("/[^\w]*([\s]+[^\w]*|$)/", $text, -1, PREG_SPLIT_NO_EMPTY);
print_r($split);
works, but I have a second question: I want to add a list of special cases to my regular expression.
"w.d" is one special case. For example, my list contains "w.d", "mr." and "dr.".
If I take this text:
$text = "word, dr. word w.d. word!..";
I need this array:
Array (
[0] => word
[1] => dr.
[2] => word
[3] => w.d
[4] => word
)
Sorry for my bad English.
Using preg_split with a regex of /[^\w]*([\s]+[^\w]*|$)/ should work fine:
<?php
$text = "word word w.d. word!..";
$split = preg_split("/[^\w]*([\s]+[^\w]*|$)/", $text, -1, PREG_SPLIT_NO_EMPTY);
print_r($split);
?>
Output:
Array
(
[0] => word
[1] => word
[2] => w.d
[3] => word
)
Use the explode function, which splits the string into an array:
$words = explode(" ", $text);
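Use str_word_count:
str_word_count ( string $string [, int $format = 0 [, string $charlist ]] )
See http://php.net/manual/en/function.str-word-count.php
It does nearly what you want, although with its default settings it treats "." as a separator, so "w.d" comes back as "w" and "d". In your case: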
$myarray = str_word_count ($text,1);
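None of the answers covers the second question, the list of special cases such as "w.d", "mr." and "dr.". One way to approach it, as a sketch only, is to match words with preg_match_all instead of splitting, and to try the listed entries before the generic \w+. The function name split_words_keeping is hypothetical, and the sketch assumes a non-empty list whose entries start with a letter or digit:

```php
<?php
// Hypothetical helper: returns the words in $text, keeping the entries
// listed in $special (e.g. "dr.") in one piece.
function split_words_keeping(string $text, array $special): array
{
    // Try longer entries first so a short entry cannot shadow a longer one.
    usort($special, function ($a, $b) {
        return strlen($b) - strlen($a);
    });

    // Escape each entry so "." is matched literally.
    $alternatives = array();
    foreach ($special as $entry) {
        $alternatives[] = preg_quote($entry, '/');
    }

    // Either one of the special entries (starting at a word boundary),
    // or an ordinary run of word characters.
    $pattern = '/\b(?:' . implode('|', $alternatives) . ')|\w+/i';
    preg_match_all($pattern, $text, $matches);

    return $matches[0];
}

$words = split_words_keeping('word, dr. word w.d. word!..', array('w.d', 'mr.', 'dr.'));
print_r($words);
```

For the text in the question this gives word, dr., word, w.d, word. Because the list entry is "w.d" without a trailing dot, the final dot of "w.d." is dropped, which matches the array asked for.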

php regex split string by [%%%]

Hi I need a preg_split regex that will split a string at substrings in square brackets.
This example input:
$string = 'I have a string containing [substrings] in [brackets].';
should provide this array output:
[0]= 'I have a string containing '
[1]= '[substrings]'
[2]= ' in '
[3]= '[brackets]'
[4]= '.'
After reading your revised question:
This might be what you want:
$string = 'I have a string containing [substrings] in [brackets].';
preg_split('/(\[.*?\])/', $string, -1, PREG_SPLIT_DELIM_CAPTURE);
You should get:
Array
(
[0] => I have a string containing
[1] => [substrings]
[2] => in
[3] => [brackets]
[4] => .
)
Original answer:
preg_split('/%+/i', 'Not limited to 3 %%% so it can be %%%% or % or %%%%%, etc Tha');
You should get:
Array
(
[0] => Not limited to 3
[1] => so it can be
[2] => or
[3] => or
[4] => , etc Tha
)
Or if you want a minimum of 3 then try:
preg_split('/%%%+/i', 'Not limited to 3 %%% so it can be %%%% or % or %%%%%, etc Tha');
Have a go at http://regex.larsolavtorvik.com/
I think this is what you are looking for:
$array = preg_split('/(\[.*?\])/', $string, -1, PREG_SPLIT_DELIM_CAPTURE);
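The print_r listing above hides the leading and trailing spaces of each piece. A short sketch that prints the same split with var_export, so the preserved spaces are visible (the variable name $parts is mine):

```php
<?php
$string = 'I have a string containing [substrings] in [brackets].';

// The capture group makes preg_split return the bracketed parts as well.
// .*? is lazy, so each match stops at the first closing bracket.
$parts = preg_split('/(\[.*?\])/', $string, -1, PREG_SPLIT_DELIM_CAPTURE);

var_export($parts);
```

The pieces keep their surrounding spaces, e.g. ' in ' rather than 'in', which is what the question asked for.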

How can I convert a sentence to an array of words?

From this string:
$input = "Some terms with spaces between";
how can I produce this array?
$output = ['Some', 'terms', 'with', 'spaces', 'between'];
You could use explode or preg_split. (The older split function also worked here, but it was deprecated in PHP 5.3 and removed in PHP 7.0.)
explode uses a fixed string:
$parts = explode(' ', $string);
while preg_split uses a regular expression:
$parts = preg_split('/ +/', $string);
An example where the regular expression based splitting is useful:
$string = 'foo   bar'; // multiple spaces
var_dump(explode(' ', $string));
var_dump(preg_split('/ +/', $string));
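To make the difference concrete, here is a small sketch with three spaces between the words (the exact number of spaces is my choice for the example):

```php
<?php
$string = 'foo   bar'; // three spaces

// explode splits at every single space, so the extra spaces
// produce empty strings in the result.
$byExplode = explode(' ', $string);

// preg_split treats the whole run of spaces as one delimiter.
$byRegex = preg_split('/ +/', $string);

var_dump($byExplode);
var_dump($byRegex);
```

explode returns four elements here, two of them empty, while preg_split returns just foo and bar.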
$parts = explode(" ", $str);
print_r(str_word_count("this is a sentence", 1));
Results in:
Array ( [0] => this [1] => is [2] => a [3] => sentence )
Just thought that it'd be worth mentioning that the regular expression Gumbo posted—although it will more than likely suffice for most—may not catch all cases of white-space. An example: Using the regular expression in the approved answer on the string below:
$sentence = "Hello my name is peter string splitter";
Provided me with the following output through print_r:
Array
(
[0] => Hello
[1] => my
[2] => name
[3] => is
[4] => peter
[5] => string
[6] => splitter
)
Whereas using the following regular expression:
preg_split('/\s+/', $sentence);
provided me with the following (desired) output:
Array
(
[0] => Hello
[1] => my
[2] => name
[3] => is
[4] => peter
[5] => string
[6] => splitter
)
Hope it helps anyone stuck at a similar hurdle and is confused as to why.
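The whitespace in the example string above appears to have been collapsed, so here is a sketch of the same point with an assumed string that contains a tab and a newline as well as spaces:

```php
<?php
// Assumed input: a tab, a double space and a newline between words.
$sentence = "Hello\tmy  name\nis peter";

// / +/ only splits on spaces, so the tab and newline stay inside the pieces.
$spacesOnly = preg_split('/ +/', $sentence);

// /\s+/ splits on any run of whitespace characters.
$anyWhitespace = preg_split('/\s+/', $sentence);

var_export($spacesOnly);
echo "\n";
var_export($anyWhitespace);
```

With / +/ the result has three pieces, two of which still contain whitespace; with /\s+/ it has the five separate words.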
Just a question, but are you trying to make json out of the data? If so, then you might consider something like this:
return json_encode(explode(' ', $inputString));
