I'm looking for help writing a regular expression with PHP. Coming in I have the data as follows:
3 1/2 cups peeled and diced potatoes
1/3 cup diced celery
1/3 cup finely chopped onion
2 tablespoons chicken bouillon granules
I have this all in a single variable. I now want to parse it so that each line is stored as 3 separate, usable data items.
I've not ever written a regular expression before, and I found this guide here - http://www.noupe.com/php/php-regular-expressions.html but I'm still struggling to take that and apply it to my situation. I also do not know how many rows will be coming in, it could be 1 or it could be 100.
This is what I have so far. I have tested the code around the preg_match statement and it's working.
preg_match_all("",
$post_meta,
$out, PREG_PATTERN_ORDER);
What should I put between the "" in the preg_match_all statement to achieve the desired parsing? Thanks upfront for any help you can give!
EDIT
the desired output for the example input would be:
$var1 = 3 1/2
$var2 = cups
$var3 = peeled and diced potatoes
so then I can run functions to store the data:
update_database($var1);
update_database($var2);
update_database($var3);
repeat for each row. It doesn't have to be 3 different variables, an array would be fine too.
You can break it apart with an expression like this:
$string = '3 1/2 cups peeled and diced potatoes
1/3 cup diced celery
1/3 cup finely chopped onion
2 tablespoons chicken bouillon granules';
preg_match_all('~([0-9 /]+)\s+(cup|tablespoon)s?\s+([-A-Z ]+)~i', $string, $matches);
That will give you this if you print $matches:
Array
(
[0] => Array
(
[0] => 3 1/2 cups peeled and diced potatoes
[1] => 1/3 cup diced celery
[2] => 1/3 cup finely chopped onion
[3] => 2 tablespoons chicken bouillon granules
)
[1] => Array
(
[0] => 3 1/2
[1] => 1/3
[2] => 1/3
[3] => 2
)
[2] => Array
(
[0] => cup
[1] => cup
[2] => cup
[3] => tablespoon
)
[3] => Array
(
[0] => peeled and diced potatoes
[1] => diced celery
[2] => finely chopped onion
[3] => chicken bouillon granules
)
)
Although this part isn't really necessary, you can restructure the array to put each item in the format you are asking for. (You can write to the db without putting them in this order, but I will demonstrate here how to put them into the order you are looking for.)
$info_array = array();
// outer loop: one pass per matched row; inner loop: capture groups 1..3
for ($i = 0; $i < count($matches[0]); $i++) {
    for ($j = 1; $j < count($matches); $j++) {
        $info_array[$i][] = $matches[$j][$i];
    }
}
If you printed $info_array, you'd see this:
Array
(
[0] => Array
(
[0] => 3 1/2
[1] => cup
[2] => peeled and diced potatoes
)
[1] => Array
(
[0] => 1/3
[1] => cup
[2] => diced celery
)
[2] => Array
(
[0] => 1/3
[1] => cup
[2] => finely chopped onion
)
[3] => Array
(
[0] => 2
[1] => tablespoon
[2] => chicken bouillon granules
)
)
You can now loop through that array to put the items into the database:
for ($i = 0; $i < count($info_array); $i++) {
foreach ($info_array[$i] AS $ingredient) {
// INSERT INTO DATABASE HERE
print "<BR>update_database(".$ingredient.")";
}
}
So that would do what you are asking, but I'm assuming that you have some columns that you want to assign these to. You can do something like this if you want to put each piece into its own column:
$info_array = array();
// outer loop: one pass per matched row; inner loop: capture groups 1..3
for ($i = 0; $i < count($matches[0]); $i++) {
    for ($j = 1; $j < count($matches); $j++) {
        if ($j == 1) {$key = 'amount';}
        elseif ($j == 2) {$key = 'size';}
        elseif ($j == 3) {$key = 'ingredient';}
        $info_array[$i][$key] = $matches[$j][$i];
    }
}
print "<PRE><FONT COLOR=ORANGE>"; print_r($info_array); print "</FONT></PRE>";
for ($i = 0; $i < count($info_array); $i++) {
foreach ($info_array[$i] AS $ingredient) {
print "<BR>update_database(".$ingredient.")";
}
}
foreach ($info_array AS $ingredient_set) {
$sql = "INSERT INTO table SET Amount = '".$ingredient_set['amount']."', Size = '".$ingredient_set['size']."', Ingredient = '".$ingredient_set['ingredient']."'";
print "<BR>".$sql;
}
That would give you something like this:
INSERT INTO table SET Amount = '3 1/2', Size = 'cup', Ingredient = 'peeled and diced potatoes'
INSERT INTO table SET Amount = '1/3', Size = 'cup', Ingredient = 'diced celery'
INSERT INTO table SET Amount = '1/3', Size = 'cup', Ingredient = 'finely chopped onion'
INSERT INTO table SET Amount = '2', Size = 'tablespoon', Ingredient = 'chicken bouillon granules'
EDIT: Explanation of the REGEX
([0-9 /]+) \s+ (cup|tablespoon)s? \s+ ([-A-Z ]+)
    ^       ^          ^           ^      ^
    1       2          3           4      5
1. ([0-9 /]+) Looking for a digit here to capture the amount of whatever measurement you will need. The [0-9] is a character class that means only grab numbers falling between 0 and 9. Also inside the character class, I added a space and a forward slash to accommodate measurements like 3 1/2. The + sign means that it has to have at least one of those to make the match. Finally, the parentheses around this part tell PHP to capture the value and store it as part of the $matches array so we can do something with it later.
2. \s+ Look for a whitespace character. Because of the +, we need it to contain at least one, but could be more than one space. I changed this from my initial code just in case there was more than one space.
3. (cup|tablespoon)s? This is basically an "OR" statement. It's looking for either cup or tablespoon. It can also have an s after it as in cups or tablespoons, but the ? means that it doesn't have to be there. (The s can be there, but doesn't have to be.) In this "OR" statement, you would probably want to add other things like teaspoon|pint|quart|gallon|ounce|oz|box, etc. Each item separated by a | is just another thing that it could match. The parentheses here will capture whatever it matched and store it so we can use it later.
4. \s+ Same as number 2.
5. ([-A-Z ]+) The character class [A-Z] looks for any letters. Actually any UPPERCASE letters, but you'll notice that after the expression, I use the case-insensitive i flag. This makes it so that it will match uppercase or lowercase letters. Also to this class, I have added a few other characters: the - and a space. If you run into any other characters like that that make the match fail, you can just add those characters into the class. (For instance, you may have an apostrophe, as in 1 Box Sara Lee's Cake Mix. Just add the apostrophe into that class after the space.) The + sign means to find at least one of those characters in that class, and the parentheses capture whatever it found and save it so that we can work with it later.
Hopefully that helps!
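As a rough sketch of the two suggestions above (a longer unit list and an apostrophe in the last class), the pattern can also be run with PREG_SET_ORDER so that each row comes back as its own array and no restructuring loop is needed. The unit list, the `$ingredients` name, and the added `^`/`$` anchors with the `m` flag are assumptions for illustration, not part of the original answer.

```php
<?php
// Sample input copied from the question.
$string = "3 1/2 cups peeled and diced potatoes\n"
        . "1/3 cup diced celery\n"
        . "1/3 cup finely chopped onion\n"
        . "2 tablespoons chicken bouillon granules";

// Assumed unit list; extend it to cover whatever your recipes use.
$units = 'cup|tablespoon|teaspoon|pint|quart|gallon|ounce|oz|box';

// With the m flag, ^ and $ anchor every line; the apostrophe is added to the last class.
$pattern = '~^([0-9 /]+?)\s+(' . $units . ')s?\s+([-A-Z\' ]+)$~im';

// PREG_SET_ORDER returns one array per matched row instead of one array per group.
preg_match_all($pattern, $string, $rows, PREG_SET_ORDER);

$ingredients = array();
foreach ($rows as $row) {
    $ingredients[] = array(
        'amount'     => $row[1],
        'size'       => $row[2],
        'ingredient' => $row[3],
    );
}
print_r($ingredients);
```

Each element of `$ingredients` then maps directly onto one INSERT, whatever the number of input rows.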
How about:
// the m flag makes ^ and $ match at the start and end of every line
preg_match_all("~^([\d/ ]+?)\s+(\w+)\s+(.+)$~m",
$post_meta,
$out, PREG_PATTERN_ORDER);
You can try this:
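// m makes $ match at the end of each line; ^ and a plain space in the first class keep a match from starting on the previous line's newline
preg_match_all('/^([\d \/]+)\s+(\w+)\s+(.*)$/m',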
$post_meta,
$out, PREG_PATTERN_ORDER);
$var1 = $out[1][0];
$var2 = $out[2][0];
$var3 = $out[3][0];
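This is what you need to pass as the pattern (note the escaped slash inside the class, the closing delimiter, and the m flag for multi-line input):
/^([\d \/]+)\s+(\w+)\s+(.*)$/m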
Related
I am trying to split a string of combined lowercase letters into separate words with each first letter of the word being capitalized. I am trying to use PHP's preg_split(), but I'm not sure that I'm using it correctly, because the words aren't delimiters. The options for words are:
1. Burger
2. Fries
3. Chicken
4. Pizza
5. Sandwich
6. Onionrings
7. Milkshake
8. Coke
The below code returns blank array elements:
<?php
$input = 'milkshakepizzachickenfriescokeburgerpizzasandwichmilkshakepizza';
$split = preg_split("/(burger|fries|chicken|pizza|sandwich|onionrings|milkshake|coke)/", $input);
var_dump($split);
All the var_dumps and the echos are for debugging purposes only. The expected output is to have one long string with space-separated menu items. For example:
Burger Coke Fries
preg_split() splits the string on the pattern you give it, just like most split()-style functions, so whatever the pattern matches is treated as a delimiter and removed. Since your whole string consists of those words, of course you get an array of blanks. If you split the string "-----" by the character -, for instance, then every character is counted as a delimiter and gets scooped out of the string.
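A minimal sketch of that "-----" case, for illustration only (the variable names are made up here):

```php
<?php
// Every character is a delimiter, so only the empty gaps between them remain.
$parts = preg_split('/-/', '-----');
var_dump($parts);

// PREG_SPLIT_NO_EMPTY drops the empty pieces, which leaves nothing at all.
$nonEmpty = preg_split('/-/', '-----', -1, PREG_SPLIT_NO_EMPTY);
var_dump($nonEmpty);
```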
What you want is preg_match_all().
preg_match_all — Perform a global regular expression match
Store the matches in some $matches variable as I do below...
$input = 'milkshakepizzachickenfriescokeburgerpizzasandwichmilkshakepizza';
$split = preg_match_all("/(burger|fries|chicken|pizza|sandwich|onionrings|milkshake|coke)/", $input, $matches);
print_r($matches);
Working Demo.
Results:
[0] => Array
(
[0] => milkshake
[1] => pizza
[2] => chicken
[3] => fries
[4] => coke
[5] => burger
[6] => pizza
[7] => sandwich
[8] => milkshake
[9] => pizza
)
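To get from those matches to the space-separated, capitalized string the question asks for, one possible follow-up is to join `$matches[0]` and capitalize it. This is a sketch; the `$menu` variable is a name chosen here.

```php
<?php
$input = 'milkshakepizzachickenfriescokeburgerpizzasandwichmilkshakepizza';

// No capture group is needed; the full matches end up in $matches[0].
preg_match_all('/burger|fries|chicken|pizza|sandwich|onionrings|milkshake|coke/', $input, $matches);

// Join the words with spaces, then uppercase the first letter of each word.
$menu = ucwords(implode(' ', $matches[0]));
echo $menu;
```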
try this
<?php
$input ="burger|fries|chicken|pizza|sandwich|onionrings|milkshake|coke";
$pattern = "/[|\s:]/";
$split = preg_split($pattern,$input);
print_r ($split);
You can capture your splitters, but the bits between the splits are empty, though it's possible to discard them.
<?php
$input = 'milkshakepizzachickenfriescokeburgerpizzasandwichmilkshakepizza';
$split = preg_split("/(burger|fries|chicken|pizza|sandwich|onionrings|milkshake|coke)/", $input, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
print ucwords(implode(' ', $split));
Output:
Milkshake Pizza Chicken Fries Coke Burger Pizza Sandwich Milkshake Pizza
I have this text:
A man’s jacket is of green color. He – the biggest star in modern history – rides bikes very fast (230 km per hour). How is it possible?! What kind of bike is he using? The semi-automatic gear of his bike, which is quite expensive, significantly helps to reach that speed. Some (or maybe many) claim that he is the fastest in the world! “I saw him ride the bike!” Mr. John Deer speaks. “The speed he sets is 133.78 kilometers per hour,” which sounds incredible; sounds deceiving.
I want to have the following resulting array:
words[1] = "A"
words[2] = "man's"
words[3] = "jacket"
...
words[n+1] = "color"
words[n+2] = "."
words[n+3] = "He"
words[n+4] = "-"
words[n+5] = "the"
...
This array should include all words and punctuation marks separately. Can that be performed using regexp? Can anyone help to compose it?
Thanks!
EDIT: based on request to show my work.
I'm processing the text using the following function, but I want to do the same in regex:
$text = explode(' ', $this->rawText);
$marks = Array('.', ',', ' ?', '!', ':', ';', '-', '--', '...');
for ($i = 0, $j = 0; $i < sizeof($text); $i++, $j++) {
$skip = false;
//check if the word contains punctuation mark
foreach ($marks as $value) {
$markPosition = strpos($text[$i], $value);
//if contains separate punctation mark from the word
if ($markPosition !== FALSE) {
//check position of punctation mark - if it's 0 then probably it's punctuation mark by itself like for example dash
if ($markPosition === 0) {
//add separate mark to array
$words[$j] = new Word($j, $text[$i], 2, $this->phpMorphy);
} else {
$words[$j] = new Word($j, substr($text[$i], 0, strlen($text[$i]) - 1), 0, $this->phpMorphy);
//add separate mark to array
$punctMark = substr($text[$i], -1);
$j += 1;
$words[$j] = new Word($j, $punctMark, 1, $this->phpMorphy);
}
$skip = true;
break;
}
}
if (!$skip) {
$words[$j] = new Word($j, $text[$i], 0, $this->phpMorphy);
}
}
The following will split on your specific text.
$words = preg_split('/(?<=\s)|(?<=\w)(?=[.,:;!?()-])|(?<=[.,!()?\x{201C}])(?=[^ ])/u', $text);
See working demo
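An alternative sketch, not the method used in the answer above, is to match the tokens with preg_match_all instead of splitting between them. The pattern below is an assumption about what should count as a word: it keeps man’s, semi-automatic and 133.78 together and emits every other non-space character on its own. The sample text is shortened from the question, and the resulting array is indexed from 0.

```php
<?php
// Shortened sample based on the question's text (the file must be saved as UTF-8).
$text = 'A man’s jacket is of green color. He – the biggest star (230 km per hour). How?!';

// Alternative 1: a run of letters/digits, optionally joined by ' ’ . or - to further runs.
// Alternative 2: any single character that is not whitespace, a letter, or a digit.
$pattern = '/[\p{L}\p{N}]+(?:[\'’.\-][\p{L}\p{N}]+)*|[^\s\p{L}\p{N}]/u';

preg_match_all($pattern, $text, $m);
$words = $m[0];
print_r($words);
```

Abbreviations such as "Mr." come out as two tokens ("Mr" and ".") with this pattern; whether that is acceptable depends on how the words are processed afterwards.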
Try making use of preg_split. Pass your punctuations(of your choice) inside the square brackets [ and ]
<?php
$str="A man’s jacket is of green color. He – the biggest star in modern history – rides bikes very fast (230 km per hour). How is it possible?! What kind of bike is he using? The semi-automatic gear of his bike, which is quite expensive, significantly helps to reach that speed. Some (or maybe many) claim that he is the fastest in the world! “I saw him ride the bike!” Mr. John Deer speaks. “The speed he sets is 133.78 kilometers per hour,” which sounds incredible; sounds deceiving.";
$keywords=preg_split("/[-,. ]/", $str);
print_r($keywords);
OUTPUT:
Array (
[0] => A
[1] => man’s
[2] => jacket
[3] => is
[4] => of
[5] => green
[6] => color
[7] =>
[8] => He
[9] => –
[10] => the
[11] => biggest
[12] => star
[13] => in
[14] => modern
[15] => history
[16] => –
Message truncated to prevent abuse of resources ... Shankar ;)
I'm working on a project that needs everything shown with barcodes, so I've generated 7 digits for the EAN-8 algorithm and now have to get these 7 digits separately. Right now I'm using this for the generation:
$codeint = mt_rand(1000000, 9999999);
and I need to get each of these 7 digits separately so I can calculate the checksum for EAN-8. How can I split this integer into 7 parts? For example,
1234567 to
arr[0]=1
arr[1]=2
arr[2]=3
arr[3]=4
arr[4]=5
arr[5]=6
arr[6]=7
any help would be appreciated..
also I think that I'm becoming crazy :D because I already tried most of the solutions you gave me here before and something is not working like it should work, for example:
$codeint = mt_rand(1000000, 9999999);
echo $codeint."c</br>";
echo $codeint[1];
echo $codeint[2];
echo $codeint[3];
gives me :
9082573c
empty row
empty row
empty row
solved! $codeint = (string)(mt_rand(1000000, 9999999));
Try to use str_split() function:
$var = 1234567;
print_r(str_split($var));
Result:
Array
(
[0] => 1
[1] => 2
[2] => 3
[3] => 4
[4] => 5
[5] => 6
[6] => 7
)
There are two ways to do this, one of which is reasonably unique to PHP:
1) In PHP, you can convert an integer value to a string and then index into the individual digits:
$digits = "$codeint";
// access a digit using intval($digits[3])
2) However, the much more elegant way is to use actual integer division and a little knowledge about mathematical identities of digits, namely in a number 123, each place value is composed of ascending powers of 10, i.e.: 1 * 10^2 + 2 * 10^1 + 3 * 10^0.
Consequently, dividing by powers of 10 will permit you to access each digit in turn.
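A minimal sketch of the division approach follows. The helper names `digitsOf` and `ean8CheckDigit` are made up for this example, and the 3,1,3,1,3,1,3 weighting is the commonly described EAN-8 scheme, assumed here rather than taken from the question, so check it against your own specification.

```php
<?php
// Hypothetical helper: decimal digits of a non-negative integer, most significant first.
function digitsOf(int $n): array
{
    $digits = array();
    do {
        array_unshift($digits, $n % 10);  // peel off the lowest digit
        $n = intdiv($n, 10);              // drop it (integer division, PHP 7+)
    } while ($n > 0);
    return $digits;
}

// Hypothetical helper: check digit for 7 data digits, assuming weights 3,1,3,1,3,1,3 from the left.
function ean8CheckDigit(array $digits): int
{
    $sum = 0;
    foreach ($digits as $i => $d) {
        $sum += $d * ($i % 2 === 0 ? 3 : 1);
    }
    return (10 - $sum % 10) % 10;
}

$digits = digitsOf(1234567);
print_r($digits);
echo ean8CheckDigit($digits), "\n";
```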
It's basic math: you can divide the number by 10 in a loop, taking the remainder each time.
12345678 is 8*10^0 + 7*10^1 + 6*10^2...
The other option is to cast it to a string and read each digit as a character.
Edit
After #HamZa DzCyberDeV suggestion
$string = '12345678';
echo "<pre>"; print_r (str_split($string));
What first came to my mind was the approach below, but your suggestion is the better one.
If you're getting a string from your function, then you can use this:
$string = '12345678';
$arr = explode(",", chunk_split($string, 1, ','));
$len = count($arr);
unset($arr[$len-1]);
echo "<pre>";
print_r($arr);
and output is
Array
(
[0] => 1
[1] => 2
[2] => 3
[3] => 4
[4] => 5
[5] => 6
[6] => 7
[7] => 8
)
Okay, what you can do is cast the integer to a string, padding it with leading zeros if you need a fixed number of digits.
This is how it works:
$sinteger = (string)$integer;
$arr = array();
// walk the string from the last character to the first
for ($i = strlen($sinteger) - 1; $i >= 0; $i--)
{
    $arr[] = $sinteger[$i];
}
Any positions that are left over you can pad with zeros.
This collects the digits in reverse order; I am sure you can manage to flip the order if you need to. It is a simple approach.
I have a list of courses and the hours they require for students to take them. The courses are as follows:
CON8101 Residential Building/Estimating 16 hrs/w
CON8411 Construction Materials I 4 hrs/w
CON8430 Computers and You 4 hrs/w
MAT8050 Geometry and Trigonometry 4 hrs/w
I have used this RegEx to extract the name of course and the hours each course takes each week. There are more than 4 courses, the 4 are examples above. There can be as many as 50 courses.
$courseHoursRegEx = "/\s[0-9]{1,2}\shrs/w/";
$courseNameRegEx = "/[a-zA-Z]{3}[0-9]{4}[A-Z]{0,1}\s?/[a-zA-Z]{3,40}/";
I then applied the following functions (not sure if they are 100% right) to extract the matching strings. $courseLine is the variable in which I saved each line from a text document that I opened earlier with fopen. $totalHours keeps track of the total hours extracted from the strings.
$courses is an array of check boxes that the user enters in the html section
$totalHours += GetCourseHours($courseLine);
function GetCourseHours($couseLine)
{
if(!preg_match($courseHoursRegEx, $courseLine))
{
return $courseLine;
}
}
function GetCourseName($courseLine)
{
if(!preg_match($courseNameRegEx, $courseLine))
{
return $courseLine;
}
}
I used a foreach loop to output all the selected courses to be sorted out in a table.
foreach($courses as $course)
{
$theCourse = GetCourseName($course);
$theHours = GetCourseHours($course)
}
Edit: output code
for($i = 1; $i <= $courses; ++$i)
{
printf("<tr><td>\$%.2f</td><td>\$%.2f</td></tr>", $theCourse, $theHours);
}
I am not sure how to output what I have into a dynamic table organized by the course name, and hours for each course. I cannot get my page to run, I cannot find any syntax errors, I was afraid it was my logic.
First of all, (after fixing a few minor things within the regexes) you can do all of that in one preg_ call. Here is how:
preg_match_all("~([a-zA-Z]{3}\d{4}[A-Z]{0,1}\s.+)\s(\d{1,2})\shrs/w~", $str, $matches);
$str can either be a multiline string with all rows at once, or you can pass in a single line at a time. If you pass in all lines at once, $matches will afterward look like this:
Array
(
[0] => Array
(
[0] => CON8101 Residential Building/Estimating 16 hrs/w
[1] => CON8411 Construction Materials I 4 hrs/w
[2] => CON8430 Computers and You 4 hrs/w
[3] => MAT8050 Geometry and Trigonometry 4 hrs/w
)
[1] => Array
(
[0] => CON8101 Residential Building/Estimating
[1] => CON8411 Construction Materials I
[2] => CON8430 Computers and You
[3] => MAT8050 Geometry and Trigonometry
)
[2] => Array
(
[0] => 16
[1] => 4
[2] => 4
[3] => 4
)
)
Now you can simply iterate over all names in $matches[1] and sum up the hours in $matches[2]. Notice that those two inner arrays correspond to what's inside the round brackets I used in the regex. These are so-called subpatterns, and they capture additional (sub-)matches. Also, $matches[0] will always contain the full match of the whole pattern, but you don't need that in this case.
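That iteration could look roughly like the sketch below, which also builds the table rows the question asks about. The `$rows` and `$totalHours` names and the table markup are assumptions for illustration.

```php
<?php
// Sample input copied from the question.
$str = "CON8101 Residential Building/Estimating 16 hrs/w\n"
     . "CON8411 Construction Materials I 4 hrs/w\n"
     . "CON8430 Computers and You 4 hrs/w\n"
     . "MAT8050 Geometry and Trigonometry 4 hrs/w";

preg_match_all('~([a-zA-Z]{3}\d{4}[A-Z]{0,1}\s.+)\s(\d{1,2})\shrs/w~', $str, $matches);

$totalHours = 0;
$rows = '';
foreach ($matches[1] as $i => $name) {
    $hours = (int) $matches[2][$i];   // same index in both inner arrays
    $totalHours += $hours;
    // %s for the course name and %d for the hours (not a currency format)
    $rows .= sprintf("<tr><td>%s</td><td>%d</td></tr>\n", htmlspecialchars($name), $hours);
}

echo "<table>\n", $rows, "</table>\n";
echo "Total hours: ", $totalHours, "\n";
```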
I am attempting to use RegEx to strip down the following data:
mlb_s_left1=Baltimore 3 ^NY Yankees 12 (FINAL)&mlb_s_right1_1=W: Hughes L: Britton&mlb_s_right1_count=1&mlb_s_url1=http://sports.espn.go.com/mlb/boxscore?gameId=320801110&mlb_s_left2=^Chicago Sox 3 Minnesota 2 (FINAL)&mlb_s_right2_1=W: Peavy L: Diamond S: Reed&mlb_s_right2_count=1&mlb_s_url2=http://sports.espn.go.com/mlb/boxscore?gameId=320801109
I am hoping to split it apart by home team (first city), home score (first digit), away team (second city), away score (second digit), and where in the game it is (in parentheses). This is the RegEx I have currently, but I feel it is very wrong.
preg_match_all('/mlb_s_left[0-9]=(?P<hometeam>.*?) (?P<homescore>.*?) (?P<awayteam>.*?) (?P<awayscore>.*?)\((?P<time>.*?)\)/', $content, $matches);
I would appreciate any and all help in getting this working.
I have tested following code snippet in php 5.4.5:
<?php
$foo = 'mlb_s_left1=Baltimore 3 ^NY Yankees 12 (FINAL)&mlb_s_right1_1=W: Hughes L: Britton&mlb_s_right1_count=1&mlb_s_url1=http://sports.espn.go.com/mlb/boxscore?gameId=320801110&mlb_s_left2=^Chicago Sox 3 Minnesota 2 (FINAL)&mlb_s_right2_1=W: Peavy L: Diamond S: Reed&mlb_s_right2_count=1&mlb_s_url2=http://sports.espn.go.com/mlb/boxscore?gameId=320801109';
preg_match_all('/mlb_s_left\d=\^?(?P<hometeam>[a-zA-Z]+(?:\s+[a-zA-Z]+)*)\s+(?P<homescore>\d+)\s+\^?(?P<awayteam>[a-zA-Z]+(?:\s+[a-zA-Z]+)*)\s+(?P<awayscore>\d+)\s+\((?P<time>\w+)\)/', $foo, $matches, PREG_SET_ORDER);
print_r($matches);
?>
output:
Array
(
[0] => Array
(
[0] => mlb_s_left1=Baltimore 3 ^NY Yankees 12 (FINAL)
[hometeam] => Baltimore
[1] => Baltimore
[homescore] => 3
[2] => 3
[awayteam] => NY Yankees
[3] => NY Yankees
[awayscore] => 12
[4] => 12
[time] => FINAL
[5] => FINAL
)
[1] => Array
(
[0] => mlb_s_left2=^Chicago Sox 3 Minnesota 2 (FINAL)
[hometeam] => Chicago Sox
[1] => Chicago Sox
[homescore] => 3
[2] => 3
[awayteam] => Minnesota
[3] => Minnesota
[awayscore] => 2
[4] => 2
[time] => FINAL
[5] => FINAL
)
)
Something like this should get you close.
preg_match_all('/mlb_s_left\d+=(?P<hometeam>\D+)\s+(?P<homescore>\d+)\s+(?P<awayteam>\D+)\s+(?P<awayscore>\d+)\s*\((?P<time>[^)]+)\)/',
$content, $matches);
Note that \d matches any digit, and \D matches anything that is not a digit.
[^)]+ matches one or more non-close parens characters; \s+ matches one or more whitespace chars, and \s* matches zero or more whitespace characters.
This wouldn't work very well if you have a city name with a number in it, and if you have a huge string, it's possible it could get hung up somewhere; you might consider splitting it up and matching a bit more piecemeal.
Generally speaking I would avoid .*? as a pattern match, as it basically matches almost anything. It's best for your regular expression to be as specific as possible, based on what you know about the data.
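One way to sketch the "split it up and match piecemeal" idea is to break the string on & first and then run a fully anchored, specific pattern on each field. The `parseScoreField` helper and the shortened sample string are assumptions for illustration, and the pattern assumes team names contain only letters and spaces.

```php
<?php
// Hypothetical helper: parse one "mlb_s_leftN=..." field; returns null if it does not match.
function parseScoreField(string $field): ?array
{
    $pattern = '/^mlb_s_left\d+=\^?(?P<hometeam>[A-Za-z ]+?)\s+(?P<homescore>\d+)\s+'
             . '\^?(?P<awayteam>[A-Za-z ]+?)\s+(?P<awayscore>\d+)\s*\((?P<time>[^)]+)\)$/';
    if (!preg_match($pattern, $field, $m)) {
        return null;
    }
    return array(
        'hometeam'  => $m['hometeam'],
        'homescore' => (int) $m['homescore'],
        'awayteam'  => $m['awayteam'],
        'awayscore' => (int) $m['awayscore'],
        'time'      => $m['time'],
    );
}

// Shortened version of the question's data.
$content = 'mlb_s_left1=Baltimore 3 ^NY Yankees 12 (FINAL)&mlb_s_right1_1=W: Hughes L: Britton'
         . '&mlb_s_left2=^Chicago Sox 3 Minnesota 2 (FINAL)&mlb_s_right2_count=1';

$games = array();
foreach (explode('&', $content) as $field) {
    $game = parseScoreField($field);
    if ($game !== null) {   // skip fields that are not score lines
        $games[] = $game;
    }
}
print_r($games);
```

Because each field is matched on its own with ^ and $, a field that does not fit the pattern simply fails instead of letting the match run on into the next field.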