How can I split a sentence into words and punctuation marks?


For example, I want to split this sentence:
I am a sentence.
Into an array with 5 parts: I, am, a, sentence, and the period (.).
I'm currently using preg_split after trying explode, but I can't find a pattern that works.
This is what I've tried:
$sentence = explode(" ", $sentence);
/*
returns array(4) {
[0]=>
string(1) "I"
[1]=>
string(2) "am"
[2]=>
string(1) "a"
[3]=>
string(8) "sentence."
}
*/
And also this:
$sentence = preg_split("/[.?!\s]/", $sentence);
/*
returns array(5) {
[0]=>
string(1) "I"
[1]=>
string(2) "am"
[2]=>
string(1) "a"
[3]=>
string(8) "sentence"
[4]=>
string(0) ""
}
*/
How can this be done?

You can split on word boundaries:
$sentence = preg_split("/(?<=\w)\b\s*/", 'I am a sentence.');
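Roughly: the lookbehind (?<=\w) requires a word character immediately before the split point, then the pattern matches the word boundary \b at that position plus any whitespace that follows. Each split therefore falls right after a word and swallows the trailing space.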
Output:
array(5) {
[0]=>
string(1) "I"
[1]=>
string(2) "am"
[2]=>
string(1) "a"
[3]=>
string(8) "sentence"
[4]=>
string(1) "."
}
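A matching approach is an alternative to splitting. The sketch below is not part of the answer above; it assumes plain ASCII input and collects each run of word characters, or each single punctuation character, with preg_match_all:

```php
<?php
// Sketch: collect tokens instead of splitting on boundaries.
//   \w+      one or more word characters (a word)
//   [^\w\s]  one character that is neither word nor whitespace (punctuation)
$sentence = 'I am a sentence.';

preg_match_all('/\w+|[^\w\s]/', $sentence, $matches);
$tokens = $matches[0];

print_r($tokens); // I, am, a, sentence, .
```

Because whitespace is simply never matched, no empty strings can appear in the result.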

I was looking for the same solution and landed here. The accepted solution breaks on non-word characters such as apostrophes and accent marks. Below is the solution that worked for me.
Here is my test sentence:
Claire’s favorite sonata for piano is Mozart’s Sonata no. 15 in C Major.
The accepted answer gave me the following results:
Array
(
[0] => Claire
[1] => ’s
[2] => favorite
[3] => sonata
[4] => for
[5] => piano
[6] => is
[7] => Mozart
[8] => ’s
[9] => Sonata
[10] => no
[11] => . 15
[12] => in
[13] => C
[14] => Major
[15] => .
)
The solution I came up with follows:
$parts = preg_split("/\s+|\b(?=[!\?\.])(?!\.\s+)/", $sentence);
It gives the following results:
Array
(
[0] => Claire’s
[1] => favorite
[2] => sonata
[3] => for
[4] => piano
[5] => is
[6] => Mozart’s
[7] => Sonata
[8] => no.
[9] => 15
[10] => in
[11] => C
[12] => Major
[13] => .
)
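To see what the two lookaheads do, here is a small sketch that runs the same pattern on two short inputs of my own (the inputs are illustrative assumptions, not from the test sentence above). A period followed by whitespace stays attached to its word, so only punctuation at the very end of the string, or a ! or ?, is split off:

```php
<?php
// Same pattern as above, written in single quotes.
$pattern = '/\s+|\b(?=[!\?\.])(?!\.\s+)/';

// "no." keeps its period (whitespace follows); the final "?" is split off.
$question = preg_split($pattern, 'Is this no. 15?');

// "Hello." keeps its period for the same reason; the last "." is split off.
$twoSentences = preg_split($pattern, 'Hello. World.');

print_r($question);     // Is, this, no., 15, ?
print_r($twoSentences); // Hello., World, .
```

If every sentence-ending period needs to be its own element, this pattern alone will not do it for periods in the middle of the text.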

If anyone is interested in a simple solution that ignores punctuation:
preg_split( '/[^a-zA-Z0-9]+/', 'I am a sentence' );
would split into
array(4) {
[0]=>
string(1) "I"
[1]=>
string(2) "am"
[2]=>
string(1) "a"
[3]=>
string(8) "sentence"
}
Or an alternative solution where the punctuation is included in the adjacent word
preg_split( '/\b[^a-zA-Z0-9]+\b/', 'I am a sentence.' );
would split into
array(4) {
[0]=>
string(1) "I"
[1]=>
string(2) "am"
[2]=>
string(1) "a"
[3]=>
string(8) "sentence."
}
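One thing to watch with the first pattern: when the input ends in punctuation, the split leaves an empty string at the end. A minimal sketch, assuming you want that empty element dropped, using the PREG_SPLIT_NO_EMPTY flag:

```php
<?php
$sentence = 'I am a sentence.';

// Without the flag, the trailing "." produces a final empty string.
$withEmpty = preg_split('/[^a-zA-Z0-9]+/', $sentence);

// With PREG_SPLIT_NO_EMPTY (limit -1 means "no limit"), empty pieces are dropped.
$withoutEmpty = preg_split('/[^a-zA-Z0-9]+/', $sentence, -1, PREG_SPLIT_NO_EMPTY);

var_dump($withEmpty, $withoutEmpty);
```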

Related

Split lines of space-delimited values to get flat array of 2nd and 3rd column values

I have the following data which is displaying as this
{123456 123456 123456}
{654321 654321 654321}
{123456 123456 123456}
My PHP Code:
$myarray = preg_split("/(\s|{\s)/", $data);
print_r($myarray);
The output of my array is like this:
[0] => {123456
[1] => 123456
[2] => 123456}
[3] => {654321
[4] => 654321
[5] => 654321}
[6] => {123456
[7] => 123456
[8] => 123456}
My question is: how do I exclude [0], [3] and [6] from the output? As you can see, they start with a {.
I'm not sure whether I made a mistake in the preg_split call.
Desired behavior:
if the data is like this
{1 2 3}
{4 5 6}
{7 8 9}
the desired output should be like this:
[0] => 2
[1] => 3
[2] => 5
[3] => 6
[4] => 8
[5] => 9

Not everything needs a regular expression.
$input = <<<_E_
{123456 123456 123456}
{654321 654321 654321}
{123456 123456 123456}
_E_;
$lines = explode("\n", $input);
$lines = array_map(function($a){return trim($a, '{}');}, $lines);
$lines = array_map(function($a){return explode(' ', $a);}, $lines);
$lines = array_map('array_filter', $lines);
$items = array_merge(...$lines);
var_dump($lines, $items);
Output:
array(3) {
[0]=>
array(3) {
[0]=>
string(6) "123456"
[2]=>
string(6) "123456"
[4]=>
string(6) "123456"
}
[1]=>
array(3) {
[0]=>
string(6) "654321"
[2]=>
string(6) "654321"
[4]=>
string(6) "654321"
}
[2]=>
array(3) {
[0]=>
string(6) "123456"
[2]=>
string(6) "123456"
[4]=>
string(6) "123456"
}
}
array(9) {
[0]=>
string(6) "123456"
[1]=>
string(6) "123456"
[2]=>
string(6) "123456"
[3]=>
string(6) "654321"
[4]=>
string(6) "654321"
[5]=>
string(6) "654321"
[6]=>
string(6) "123456"
[7]=>
string(6) "123456"
[8]=>
string(6) "123456"
}
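The snippet above keeps all three columns. To get only the 2nd and 3rd column of each line, as the question asks, the same regex-free idea can be extended with array_slice. This is a sketch under the assumption that lines are separated by "\n" and columns by one or more spaces; the variable names are my own:

```php
<?php
$input = "{1 2 3}\n{4 5 6}\n{7 8 9}";

$items = [];
foreach (explode("\n", $input) as $line) {
    // Strip the braces (and a stray "\r" or space), then split on spaces.
    $columns = explode(' ', trim($line, "{} \r"));
    // Drop empty pieces caused by repeated spaces and reindex from 0.
    $columns = array_values(array_filter($columns, 'strlen'));
    // Keep everything after the first column.
    foreach (array_slice($columns, 1) as $value) {
        $items[] = $value;
    }
}

print_r($items); // 2, 3, 5, 6, 8, 9 (as strings)
```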

Change {\s to {\d+\s so that {123456 will be a delimiter and not be included in the result.
You don't need the capture group around the regular expression.
$data = '{123456 123456 123456}
{654321 654321 654321}
{123456 123456 123456}';
$myarray = preg_split("/\s|{\d+\s/", $data);
print_r($myarray);
Output:
Array
(
[0] =>
[1] =>
[2] => 123456
[3] =>
[4] => 123456}
[5] =>
[6] =>
[7] => 654321
[8] =>
[9] => 654321}
[10] =>
[11] =>
[12] => 123456
[13] =>
[14] => 123456}
)
If you also don't want } in the results, that needs to be in the regexp as well.
$myarray = preg_split("/\s+|\s*\{\d+\s*|\s*\}\s*/", $data);
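You could also use a regular expression that matches a number unless it's preceded by {, using a negative lookbehind. Including \d in the lookbehind keeps the match from starting partway through a multi-digit number such as {123456.
$data = '{1 2 3} {4 5 6} {7 8 9}';
preg_match_all('/(?<![{\d])\d+/', $data, $match);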
$myarray = $match[0];
print_r($myarray);
Output:
Array
(
[0] => 2
[1] => 3
[2] => 5
[3] => 6
[4] => 8
[5] => 9
)

Not everything needs a regular expression, but sometimes it is the most direct tool for the job.
Split the string on all characters that you don't want to keep.
Code:
var_export(
preg_split(
'/ +|}\R?|{\d+/',
$text,
0,
PREG_SPLIT_NO_EMPTY
)
);
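The call above leaves $text undefined. A self-contained sketch, assuming the sample data from the question's "desired behavior" section and "\n" line endings:

```php
<?php
$text = "{1 2 3}\n{4 5 6}\n{7 8 9}";

// Delimiters: runs of spaces, a closing brace with an optional newline,
// and an opening brace together with the first number on the line.
$result = preg_split(
    '/ +|}\R?|{\d+/',
    $text,
    0,
    PREG_SPLIT_NO_EMPTY
);

var_export($result);
```

Treating the opening brace and the first number as one delimiter is what removes the first column.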

Not sure if preg_split() is required, but you could do this with a match all and then gather what's found:
$foo = trim('
{555555 123456 123456}
{666666 654321 654321}
{777777 123456 123456}
');
$items = [];
if (preg_match_all('/^{[\d]{1,} ([\d]{1,}) ([\d]{1,})}$/m', $foo, $matches)) {
$items = array_reduce(
array_slice($matches, 1),
fn(array $found, array $match) => array_merge($found, $match),
[]
);
}
var_dump($items);
Gives:
array(6) {
[0]=>
string(6) "123456"
[1]=>
string(6) "654321"
[2]=>
string(6) "123456"
[3]=>
string(6) "123456"
[4]=>
string(6) "654321"
[5]=>
string(6) "123456"
}
https://3v4l.org/n9tRk

Can preg_split be used to split a string into an array while also removing any whitespace?

Is there some regular expression that will ignore all spaces while splitting on all other characters?
$phrase = 'asdf asdf';
$result = preg_split('//', $phrase, -1, PREG_SPLIT_NO_EMPTY);
array(9) {
[0]=>
string(1) "a"
[1]=>
string(1) "s"
[2]=>
string(1) "d"
[3]=>
string(1) "f"
[4]=>
string(1) " " // this should be excluded
[5]=>
string(1) "a"
[6]=>
string(1) "s"
[7]=>
string(1) "d"
[8]=>
string(1) "f"
}

If you plan to split a string into characters with a regex while keeping whitespace out of the result, it is safer to use a matching approach:
if (preg_match_all('~\X(?<!\s)~u', $s, $m)) {
print_r($m[0]);
}
The ~\X(?<!\s)~u expression matches any Unicode grapheme cluster (\X), and the lookbehind (?<!\s) rejects the match when the character just matched is whitespace.
See PHP demo:
$s = "प्रमुख समाचार";
if (preg_match_all('~\X(?<!\s)~u', $s, $m)) {
print_r($m[0]);
} // => Array ( [0] => प् [1] => र [2] => मु [3] => ख [4] => स [5] => मा [6] => चा [7] => र )
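If grapheme clusters do not matter for your input (for example plain ASCII like the string in the question), a simpler sketch is to match every single non-whitespace character with \S. This is an alternative I am suggesting, not the pattern from the answer above:

```php
<?php
$phrase = 'asdf asdf';

// \S matches one non-whitespace character; the u flag makes it work
// per UTF-8 code point rather than per byte.
preg_match_all('/\S/u', $phrase, $m);
$chars = $m[0];

print_r($chars); // a, s, d, f, a, s, d, f
```

Unlike \X, this splits combining marks away from their base character, so it is not suitable for text such as the Devanagari example.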

Accessing php variable

I'd like to ask for help.
When I print out the PHP array content:
var_dump($rowData);
I get this output (excerpt):
Array ( [0] => )
Array ( )
Array ( [0] => 162 [1] => 238 [2] => 331 [3] => 102 [4] => 103 [5] => 101 [6] => 99 [7] => 102 [8] => 103 [9] => 46 )
Array ( [0] => 53 [1] => 63 [2] => 48 [3] => 70 [4] => 30 [5] => 63 [6] => 63 [7] => 50 [8] => [9] => 33 )
array(4) { [0]=> array(1) { [0]=> string(6) " " } [1]=> array(0) { } [2]=> array(10) { [0]=> string(3) "162" [1]=> string(3) "238" [2]=> string(3) "331" [3]=> string(3) "102" [4]=> string(3) "103" [5]=> string(3) "101" [6]=> string(2) "99" [7]=> string(3) "102" [8]=> string(3) "103" [9]=> string(2) "46" } [3]=> array(10) { [0]=> string(2) "53" [1]=> string(2) "63" [2]=> string(2) "48" [3]=> string(2) "70" [4]=> string(2) "30" [5]=> string(2) "63" [6]=> string(2) "63" [7]=> string(2) "50" [8]=> string(0) "" [9]=> string(2) "33" } }
How can I get the values 162 and 53 into a variable?

I'm not sure why you need the first two rows. If you don't need them, validate the input so that empty values never get into the array.
But if you really want to use that array as it is, access the values by index. The snippet below builds an array similar to yours and shows how to read the two values you asked about.
<?php
//This is just to emulate the array you presented
$arr[][0] = null;
$arr[] = null;
$arr[][0] = 162;
$arr[][0] = 53;
var_dump($arr);
echo "first value you want: ".$arr[2][0];
echo "\nsecond value you want: ".$arr[3][0];
?>
(If you want to see how to do something like this in a foreach loop, or you have a specific goal in mind, comment on my answer and I'll edit it.)
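A foreach version might look like the sketch below. It assumes the goal is "the first value of every row that actually starts with a number"; the sample $rowData is a shortened, hypothetical stand-in for the dump in the question:

```php
<?php
// Shortened stand-in for the array in the question.
$rowData = [
    [' '],                 // row with a blank first cell
    [],                    // empty row
    ['162', '238', '331'],
    ['53', '63', '48'],
];

$firstValues = [];
foreach ($rowData as $row) {
    // Skip empty rows and rows whose first cell is not numeric.
    if (isset($row[0]) && is_numeric($row[0])) {
        $firstValues[] = $row[0];
    }
}

print_r($firstValues); // 162, 53
```

This avoids hard-coding the indexes 2 and 3, which would break if the number of leading empty rows changes.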

need preg_match_all links

I have a string like this one:
$string = "some text
http://dvz.local/index/index/regionId/28
http://stuff.kiev.ua/roadmap_page.php http://192.168.3.192/roadmap_page.php
http://192.168.3.192/roadmap_page.php#qwe";
I need to get all the links.
I tried this pattern: /http:\/\/(.*)[|\s]?/
It returns:
array(2) {
[0] =>
array(3) {
[0] =>
string(42) "http://dvz.local/index/index/regionId/28\r\n"
[1] =>
string(77) "http://stuff.kiev.ua/roadmap_page.php http://192.168.3.192/roadmap_page.php\r\n"
[2] =>
string(41) "http://192.168.3.192/roadmap_page.php#qwe"
}
[1] =>
array(3) {
[0] =>
string(34) "dvz.local/index/index/regionId/28\r"
[1] =>
string(69) "stuff.kiev.ua/roadmap_page.php http://192.168.3.192/roadmap_page.php\r"
[2] =>
string(34) "192.168.3.192/roadmap_page.php#qwe"
}
}
EDIT 1:
expect:
array(2) {
[0] =>
array(3) {
[0] =>
string(42) "http://dvz.local/index/index/regionId/28"
[1] =>
string(77) "http://stuff.kiev.ua/roadmap_page.php"
[2] =>
string(77) "http://192.168.3.192/roadmap_page.php"
[3] =>
string(41) "http://192.168.3.192/roadmap_page.php#qwe"
}
[1] =>
array(3) {
[0] =>
string(34) "dvz.local/index/index/regionId/28"
[1] =>
string(69) "stuff.kiev.ua/roadmap_page.php"
[2] =>
string(69) "192.168.3.192/roadmap_page.php"
[3] =>
string(34) "192.168.3.192/roadmap_page.php#qwe"
}
}

Try this one:
/http:\/\/([^\s]+)/

Try this:
preg_match_all('|http://([^\s]*)|', $string, $matches);
var_dump($matches);

To match all links in the text (http or https):
http[s]?[^\s]*
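Putting the suggestions above together, here is a minimal runnable sketch. It assumes "\n" line endings and uses ~ as the delimiter so the slashes need no escaping; the capture group holds the link without its scheme, as in the expected output:

```php
<?php
$string = "some text\n"
    . "http://dvz.local/index/index/regionId/28\n"
    . "http://stuff.kiev.ua/roadmap_page.php http://192.168.3.192/roadmap_page.php\n"
    . "http://192.168.3.192/roadmap_page.php#qwe";

// \S+ stops at the first whitespace character, so two links on one
// line are matched separately and line breaks are never included.
preg_match_all('~https?://(\S+)~', $string, $matches);

$links = $matches[0];     // full URLs
$withoutScheme = $matches[1]; // URLs without "http://"

print_r($links);
```

The original attempt failed because .* is greedy and runs to the end of the line, taking the second link and the trailing "\r" with it.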

Many pages contain only links relative to the main document (so there is no http(s):// prefix to match). For those, the following works fine, matching on the href attribute:
preg_match_all('|href="([^\s]*)"><\/a>|', $html, $output_array);
Or even simpler:
preg_match_all('|href="(.*?)"><\/a>|', $html, $output_array);
Example output:
[0]=>
string(56) "/broadcast/bla/xZr300"
[1]=>
string(50) "/broadcast/lol/fMoott"
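Note that both patterns only match anchors written exactly as href="..."></a>. If the markup varies, an HTML parser is usually more robust than a regex. The sketch below swaps in PHP's DOMDocument (a different technique from the answer above, and it needs the dom extension); the $html sample is a hypothetical fragment:

```php
<?php
// Hypothetical fragment: one anchor with text, one empty anchor.
$html = '<p><a href="/broadcast/bla/xZr300">first</a> '
      . '<a href="/broadcast/lol/fMoott"></a></p>';

// Collect parser warnings internally instead of printing them.
$previous = libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($previous);

$links = [];
foreach ($dom->getElementsByTagName('a') as $anchor) {
    if ($anchor->hasAttribute('href')) {
        $links[] = $anchor->getAttribute('href');
    }
}

print_r($links); // /broadcast/bla/xZr300, /broadcast/lol/fMoott
```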

PHP Count number of occurrences in numeric array

I have a PHP array and I have dumped it below using Zend_Debug:
$ids = array(13) {
[0] => string(1) "7"
[1] => string(1) "8"
[2] => string(1) "2"
[3] => string(1) "7"
[4] => string(1) "8"
[5] => string(1) "4"
[6] => string(1) "7"
[7] => string(1) "3"
[8] => string(1) "7"
[9] => string(1) "8"
[10] => string(1) "3"
[11] => string(1) "7"
[12] => string(1) "4"
}
I am trying to get how many times each number occurs in the array and output it into an array.
I have tried using array_count_values($ids), but it outputs in order of most occurrences, and I can't get the total number of times each number occurs. It gives me the output below:
array(5) {
[7] => int(5)
[8] => int(3)
[2] => int(1)
[4] => int(2)
[3] => int(2)
}
I can see from the above array that 7 occurs 5 times, but I can't access it when I loop through the array!
Any thoughts?
Cheers
J.

You can access the data you want like this:
$ids = array( ...);
$array = array_count_values( $ids);
foreach( $array as $number => $times_number_occurred) {
echo $number . ' occurred ' . $times_number_occurred . ' times!' . "\n";
}
Output:
7 occurred 5 times!
8 occurred 3 times!
2 occurred 1 times!
4 occurred 2 times!
3 occurred 2 times!

Use a foreach construct to loop through the resulting array:
$res = array_count_values($ids);
foreach( $res as $value => $count ) {
// your code here
echo "The value ".$value." appeared ".$count." times in the array";
}
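The counts can also be read directly, without a loop. A small sketch using the $ids values from the question; array_count_values turns the numeric strings into integer keys, so $counts[7] works:

```php
<?php
$ids = ['7', '8', '2', '7', '8', '4', '7', '3', '7', '8', '3', '7', '4'];

$counts = array_count_values($ids);

// Look up a single value; fall back to 0 if it never occurs.
$sevens = $counts[7] ?? 0;

// Total number of elements counted.
$total = array_sum($counts);

// Sort by count, highest first, keeping the value => count association.
arsort($counts);
$mostFrequent = array_key_first($counts);

echo "7 occurs $sevens times out of $total; most frequent: $mostFrequent\n";
```

array_key_first needs PHP 7.3 or later; on older versions, reset($counts) followed by key($counts) gives the same key.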

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.
