PHP preg_split is returning empty strings - php

I am trying to split a string of concatenated lowercase words into separate words, with the first letter of each word capitalized. I am trying to use PHP's preg_split(), but I'm not sure I'm using it correctly, because the words aren't delimiters. The possible words are:
1. Burger
2. Fries
3. Chicken
4. Pizza
5. Sandwich
6. Onionrings
7. Milkshake
8. Coke
The code below returns an array of empty strings:
<?php
$input = 'milkshakepizzachickenfriescokeburgerpizzasandwichmilkshakepizza';
$split = preg_split("/(burger|fries|chicken|pizza|sandwich|onionrings|milkshake|coke)/", $input);
var_dump($split);
All the var_dumps and echos are for debugging purposes only. The expected output is one long string of space-separated menu items, for example:
Burger Coke Fries

preg_split() splits the string on whatever the pattern matches, just like most split()-style functions. Since every part of your string is a menu item, every part is a delimiter, so of course you get an array of blanks. If you split the string "-----" on the character -, for instance, every character is a delimiter and gets removed, leaving nothing but empty strings.
What you want is preg_match_all().
preg_match_all — Perform a global regular expression match
Store the matches in some $matches variable as I do below...
$input = 'milkshakepizzachickenfriescokeburgerpizzasandwichmilkshakepizza';
$count = preg_match_all("/(burger|fries|chicken|pizza|sandwich|onionrings|milkshake|coke)/", $input, $matches); // returns the number of matches
print_r($matches);
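Results (abridged; because the pattern has a capturing group, $matches also has a [1] entry holding the same list):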
[0] => Array
(
[0] => milkshake
[1] => pizza
[2] => chicken
[3] => fries
[4] => coke
[5] => burger
[6] => pizza
[7] => sandwich
[8] => milkshake
[9] => pizza
)
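To get the single space-separated string the question asks for, the matches can be capitalized with ucfirst() and joined with implode(). A minimal sketch, reusing the same input and menu words (the capturing group is dropped because only $matches[0] is needed):

```php
<?php
$input = 'milkshakepizzachickenfriescokeburgerpizzasandwichmilkshakepizza';
$pattern = '/burger|fries|chicken|pizza|sandwich|onionrings|milkshake|coke/';

// $matches[0] holds every full match, in order of appearance
preg_match_all($pattern, $input, $matches);

// Capitalize the first letter of each item, then join with single spaces
$result = implode(' ', array_map('ucfirst', $matches[0]));
echo $result, "\n";
// Milkshake Pizza Chicken Fries Coke Burger Pizza Sandwich Milkshake Pizza
```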

Try this:
<?php
$input ="burger|fries|chicken|pizza|sandwich|onionrings|milkshake|coke";
$pattern = "/[|\s:]/";
$split = preg_split($pattern,$input);
print_r ($split);

You can capture your delimiters with PREG_SPLIT_DELIM_CAPTURE. The pieces between them are empty, but PREG_SPLIT_NO_EMPTY discards those.
<?php
$input = 'milkshakepizzachickenfriescokeburgerpizzasandwichmilkshakepizza';
$split = preg_split("/(burger|fries|chicken|pizza|sandwich|onionrings|milkshake|coke)/", $input, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
print ucwords(implode(' ', $split));
Output:
Milkshake Pizza Chicken Fries Coke Burger Pizza Sandwich Milkshake Pizza
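The delimiter-capture approach has a useful side effect: text that matches no menu item is not thrown away, it ends up as its own element. The sketch below (assuming PHP 7.4+ for the arrow function; $menu, $parts and $unknown are illustrative names, and "salad" is a deliberately unknown item) builds the alternation from an array with preg_quote() and then uses array_diff() to flag leftovers.

```php
<?php
$menu = ['burger', 'fries', 'chicken', 'pizza', 'sandwich', 'onionrings', 'milkshake', 'coke'];

// Build "(burger|fries|...)" from the array; preg_quote escapes any regex metacharacters
$pattern = '/(' . implode('|', array_map(fn($w) => preg_quote($w, '/'), $menu)) . ')/';

$input = 'milkshakesaladpizza'; // "salad" is not on the menu
$parts = preg_split($pattern, $input, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
print_r($parts); // milkshake, salad, pizza

// Anything that is not a menu word is left over text; array_values reindexes from 0
$unknown = array_values(array_diff($parts, $menu));
print_r($unknown); // salad
```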

Related

Finding sentences between characters

I am trying to find sentences between a pipe (|) and a dot (.), e.g.
| This is one. This is two.
The regex pattern I use:
preg_match_all('/(:\s|\|+)(.*?)(\.|!|\?)/s', $file0, $matches);
So far I have not managed to capture both sentences. The regex I use captures only the first sentence.
How can I solve this problem?
EDIT: as can be seen from the regex, I am trying to find the sentences BETWEEN (: or |) AND (. or ! or ?).
A colon or a pipe marks the starting point of the sentences.
The sentences might be:
: Sentence one. Sentence two. Sentence three.
| Sentence one. Sentence two?
| Sentence one. Sentence two! Sentence three?
I would keep it simple and just match on:
\s*[^.|]+\s*
This matches any run of characters that is neither a pipe nor a full stop. The optional whitespace before and after each sentence is consumed as part of the match (not removed from it), so trim() the results if you need clean strings.
$input = "| This is one. This is two.";
preg_match_all('/\s*[^.|]+\s*/s', $input, $matches);
print_r($matches[0]);
This prints:
Array
(
[0] => This is one
[1] => This is two
)
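Because the \s* parts are included in the match rather than removed, the results above keep their surrounding spaces. A minimal sketch that trims them afterwards, which also lets the pattern shrink to [^.|]+ (it still ignores the :, ! and ? cases from the question's edit; assumes PHP 7.4+ for the arrow function):

```php
<?php
$input = "| This is one. This is two.";

// Grab every run of characters that is neither a dot nor a pipe
preg_match_all('/[^.|]+/', $input, $matches);

// The runs still carry their surrounding spaces, so trim them,
// then drop any run that was whitespace only
$sentences = array_values(array_filter(
    array_map('trim', $matches[0]),
    fn($s) => $s !== ''
));
print_r($sentences);
```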
This does the job:
$str = '| This is one. This is two.';
preg_match_all('/(?:\s|\|)+(.*?)(?=[.!?])/', $str, $m);
print_r($m);
Output:
Array
(
[0] => Array
(
[0] => | This is one
[1] => This is two
)
[1] => Array
(
[0] => This is one
[1] => This is two
)
)
Another option is to use \G to get iterative, contiguous matches: \G asserts the position at the end of the previous match. Each sentence is captured in group 1, and the dot plus any horizontal whitespace after it is matched outside the group.
(?:\|\h*|\G(?!^))([^.\r\n]+)\.\h*
In parts
(?: Non capturing group
\|\h* Match | and 0+ horizontal whitespace chars
| Or
\G(?!^) Assert position at the end of previous match
) Close group
( Capture group 1
[^.\r\n]+ Match 1+ times any char other than . or a newline
) Close group
\.\h* Match 1 . and 0+ horizontal whitespace chars
For example
$re = '/(?:\|\h*|\G(?!^))([^.\r\n]+)\.\h*/';
$str = '| This is one. This is two.
John loves Mary.| This is one. This is two.';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
print_r($matches);
Output
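Array
(
[0] => Array
(
[0] => | This is one.
[1] => This is one
)
[1] => Array
(
[0] => This is two.
[1] => This is two
)
[2] => Array
(
[0] => | This is one.
[1] => This is one
)
[3] => Array
(
[0] => This is two.
[1] => This is two
)
)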
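The pattern above only handles | as the starting point and . as the terminator. Extending it to the colon start and the ! and ? terminators from the question's edit is a small change to two character classes. A sketch, assuming a colon never appears inside a sentence (a time like 10:30 would wrongly start a match):

```php
<?php
// Assumed extension: ':' may also start a run, and '!' or '?' may end a sentence
$re = '/(?:[|:]\h*|\G(?!^))([^.!?\r\n]+)[.!?]\h*/';
$str = ": Sentence one. Sentence two. Sentence three.\n"
     . "| Sentence one. Sentence two?\n"
     . "| Sentence one. Sentence two! Sentence three?";

preg_match_all($re, $str, $matches);

// Group 1 holds the sentence text without its terminator
print_r($matches[1]);
```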
To keep it simple, find everything between | and . and then split:
$input = "John loves Mary. | This is one. This is two. | Sentence 1. Sentence 2.";
// preg_match_all returns the number of matches, so 0 skips the loop
if (preg_match_all('/\|\s*([^|]+)\./', $input, $matches)) {
foreach ($matches[1] as $match) {
print_r(preg_split('/\.\s*/', $match));
}
}
Prints:
Array
(
[0] => This is one
[1] => This is two
)
Array
(
[0] => Sentence 1
[1] => Sentence 2
)

Split string after each number

I have a database full of strings that I'd like to split into an array. Each string contains a list of directions that begin with a letter (U, D, L, R for Up, Down, Left, Right) and a number to tell how far to go in that direction.
Here is an example of one string.
$string = "U29R45U2L5D2L16";
My desired result:
['U29', 'R45', 'U2', 'L5', 'D2', 'L16']
I thought I could just loop through the string, but I don't know how to tell whether the number is one or more digits long.
You can use preg_split to break up the string, splitting on something that looks like a U, L, D or R followed by digits. PREG_SPLIT_DELIM_CAPTURE keeps the split text, and PREG_SPLIT_NO_EMPTY drops the empty strings between the delimiters:
$string = "U29R45U2L5D2L16";
print_r(preg_split('/([UDLR]\d+)/', $string, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY));
Output:
Array (
[0] => U29
[1] => R45
[2] => U2
[3] => L5
[4] => D2
[5] => L16
)
A regular expression should help you:
<?php
$string = "U29R45U2L5D2L16";
preg_match_all("/[A-Z]\d+/", $string, $matches);
var_dump($matches);
Because this task is about text extraction and not about text validation, you can merely split on the zero-width position after one or more digits. In other words, match one or more digits, then forget them with \K so that they are not consumed while splitting.
Code:
$string = "U29R45U2L5D2L16";
var_export(
preg_split(
'/\d+\K/',
$string,
0,
PREG_SPLIT_NO_EMPTY
)
);
Output:
array (
0 => 'U29',
1 => 'R45',
2 => 'U2',
3 => 'L5',
4 => 'D2',
5 => 'L16',
)
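If each step is going to be acted on, it may be more convenient to capture the letter and the distance separately. A sketch, assuming the strings contain only U/D/L/R steps; the validation regex, the exception, and the $moves structure are illustrative choices, not part of any answer above:

```php
<?php
$string = "U29R45U2L5D2L16";

// Optional sanity check: the whole string must be letter+digits pairs
if (!preg_match('/\A(?:[UDLR]\d+)+\z/', $string)) {
    throw new InvalidArgumentException("Unexpected input: $string");
}

// Group 1 = direction letter, group 2 = distance digits; one set per step
preg_match_all('/([UDLR])(\d+)/', $string, $steps, PREG_SET_ORDER);

$moves = [];
foreach ($steps as $step) {
    $moves[] = [$step[1], (int) $step[2]]; // e.g. ['U', 29]
}
print_r($moves);
```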

REGEX Pattern for Validation that check all string is integer and split into single integers

I have tried many times to write a pattern that validates that a given string is a natural number and splits it into single digits.
Given my limited understanding of regex, the closest I could get is:
^([1-9])([0-9])*$ or ^([1-9])([0-9])([0-9])*$ or something like that...
It only captures the first and last digits (or the first, second and last digits), not every digit.
What do I need to know to solve this problem? Thanks.
You may use a two-step solution like this:
if (preg_match('~\A\d+\z~', $s)) { // if a string is all digits
print_r(str_split($s)); // Split it into chars
}
A one-step regex solution:
(?:\G(?!\A)|\A(?=\d+\z))\d
Details
(?:\G(?!\A)|\A(?=\d+\z)) - either the end of the previous match (\G(?!\A)) or (|) the start of the string (\A) that is followed by 1 or more digits up to the end of the string ((?=\d+\z))
\d - a digit.
PHP demo:
$re = '/(?:\G(?!\A)|\A(?=\d+\z))\d/';
$str = '1234567890';
if (preg_match_all($re, $str, $matches)) {
print_r($matches[0]);
}
Output:
Array
(
[0] => 1
[1] => 2
[2] => 3
[3] => 4
[4] => 5
[5] => 6
[6] => 7
[7] => 8
[8] => 9
[9] => 0
)
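Both solutions above accept strings such as 0123 or 0, while the question's own ^[1-9] suggests leading zeros should be rejected. If that is the requirement (an assumption), here is a sketch; digitsOfNatural() is a hypothetical helper name, and the digits are returned as integers rather than strings:

```php
<?php
// Hypothetical helper: returns the digits as ints, or null if $s is not a natural number
function digitsOfNatural(string $s): ?array
{
    // [1-9] first forbids leading zeros (and "0" itself); \z is the true end of string
    if (!preg_match('~\A[1-9]\d*\z~', $s)) {
        return null;
    }
    // Split into one-character strings, then convert each to int
    return array_map('intval', str_split($s));
}

print_r(digitsOfNatural('1203'));   // 1, 2, 0, 3
var_dump(digitsOfNatural('0123')); // NULL
```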

PHP regex to extract special string

I am trying to use regex to extract a certain syntax, in my case something like "10.100" or "20.111", in which two numbers are separated by a dot (.). So if I provide "a 10.100", it should extract 10.100 from the string. If I provide "a 10.100 20.101", it should extract 10.100 and 20.101.
Until now I have tried to use
preg_match('/^.*([0-9]{1,2})[^\.]([0-9]{1,4}).*$/', $message, $array);
but still no luck. Please provide any suggestion because I don't have strong regex knowledge. Thanks.
You may use
\b[0-9]{1,2}\.[0-9]{1,4}\b
Details:
\b - a leading word boundary
[0-9]{1,2} - 1 or 2 digits
\. - a dot
[0-9]{1,4} - 1 to 4 digits
\b - a trailing word boundary.
If you do not care about the whole word option, just remove \b. Also, to match just 1 or more digits, you may use + instead of the limiting quantifiers. So, perhaps
[0-9]+\.[0-9]+
will also work for you.
See a PHP demo:
$re = '/[0-9]+\.[0-9]+/';
$str = 'I am trying to use regex to extract a certain syntax, in my case something like "10.100" or "20.111", in which 2 numbers are separated by dot(.) . So if I provide "a 10.100", it will extract 10.100 from the string. If I provide "a 10.100 20.101", it will extract 10.100 and 20.101.';
preg_match_all($re, $str, $matches);
print_r($matches[0]);
Output:
Array
(
[0] => 10.100
[1] => 20.111
[2] => 10.100
[3] => 10.100
[4] => 10.100
[5] => 20.101
[6] => 10.100
[7] => 20.101
)
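The original attempt fails for two reasons: [^\.] means "any character except a dot", the opposite of what is wanted, and preg_match() stops after the first match. If the two numbers are also needed separately, a sketch with capture groups and PREG_SET_ORDER ($m is an illustrative name):

```php
<?php
$message = 'a 10.100 20.101';

// Group 1 = digits before the dot, group 2 = digits after it
preg_match_all('/\b(\d{1,2})\.(\d{1,4})\b/', $message, $m, PREG_SET_ORDER);

foreach ($m as [$whole, $left, $right]) {
    echo "$whole: $left and $right\n"; // e.g. "10.100: 10 and 100"
}
```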
Regex: /\d+(?:\.\d+)/
1. \d+ matches one or more digits.
2. (?:\.\d+) matches a dot followed by one or more digits, like .1234
Try this code snippet:
<?php
ini_set('display_errors', 1);
$string='a 10.100 20.101';
preg_match_all('/\d+(?:\.\d+)/', $string, $array);
print_r($array);
Output:
Array
(
[0] => Array
(
[0] => 10.100
[1] => 20.101
)
)
$decimals = "10.5 100.50 10.250";
preg_match_all('/\b\d{2}\.\d+\b/', $decimals, $output);
print_r($output[0]);
Output:
Array
(
[0] => 10.5
[1] => 10.250
)

ignoring upper case words with explode() in PHP

I'm new to PHP and I'm trying to explode data in a text file and put it into an array, then a table. The data in the text file looks like this:
THE MAN IN THE HIGH CASTLE by Philip K. Dick published 1965 born 1922
Assume that you cannot alter the original data. If I write:
$dataArray = explode(" ",$book);
that works for most of the data, but it splits every word of the book title into a separate element. Is there a way I can tell it not to split upper case words?
Instead of explode, you may want to try using preg_split for this. It splits strings using a regular expression:
$book = 'THE MAN IN THE HIGH CASTLE by Philip K. Dick published 1965 born 1922';
// Split on all-lowercase words
print_r(preg_split('/\b\s*[a-z]+\s*\b/', $book));
Output:
Array
(
[0] => THE MAN IN THE HIGH CASTLE
[1] => Philip K. Dick
[2] => 1965
[3] => 1922
)
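If every line contains " by " between the title and the author, a simpler option is to explode on it, limited to two pieces:
$input = explode(' by ', $book, 2);
$title = $input[0];
$stuff = $input[1];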
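If every line follows the same "TITLE by AUTHOR published YEAR born YEAR" layout (an assumption based on the single sample line), one preg_match() with named groups can pull out all four fields at once. A sketch; $row and its keys are illustrative:

```php
<?php
$book = 'THE MAN IN THE HIGH CASTLE by Philip K. Dick published 1965 born 1922';

// Lazy .+? stops at the first " by " and the first " published "
$pattern = '/\A(?<title>.+?) by (?<author>.+?) published (?<published>\d{4}) born (?<born>\d{4})\z/';

if (preg_match($pattern, $book, $m)) {
    $row = [
        'title'     => $m['title'],
        'author'    => $m['author'],
        'published' => (int) $m['published'],
        'born'      => (int) $m['born'],
    ];
    print_r($row);
}
```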
