RegEx or Similar - Grab string preceding matched value - php

Here's the deal: I am handling an OCR text document and grabbing UPC information from it with RegEx. That part I've figured out. Then I query a database, and if I don't have a record of that UPC I need to go back to the text document and get the description of the product.
The format on the receipt is:
NAME OF ITEM 123456789012
OTHER NAME 987654321098
NAME 567890123456
So, when I go back the second time to find the name of the item I am at a complete loss. I know how to get to the line where the UPC is, but how can I use something like regex to get the name that precedes the UPC? Or some other method. I was thinking of somehow storing the entire line and then parsing it with PHP, but not sure how to get the line either.
Using PHP.

Get all of the names of the items indexed by their UPCs with a regex and preg_match_all():
$str = 'NAME OF ITEM 123456789012
OTHER NAME 987654321098
NAME 567890123456';
preg_match_all( '/^(.*?)\s+(\d+)/m', $str, $matches);
$items = array();
foreach( $matches[2] as $k => $upc) {
if( !isset( $items[$upc])) {
$items[$upc] = array( 'name' => $matches[1][$k], 'count' => 0);
}
$items[$upc]['count']++;
}
This forms $items so it looks like:
Array (
[123456789012] => Array ( [name] => NAME OF ITEM [count] => 1 )
[987654321098] => Array ( [name] => OTHER NAME [count] => 1 )
[567890123456] => Array ( [name] => NAME [count] => 1 )
)
Now you can look up any item name you want in O(1) time:
echo $items['987654321098']['name']; // OTHER NAME

You can find the string preceding a value you know with the following regex:
$receipt = "NAME OF ITEM 123456789012\n" .
"OTHER NAME 987654321098\n" .
"NAME 567890123456";
$upc = '987654321098';
if (preg_match("/^(.*?) *{$upc}/m", $receipt, $matches)) {
$name = $matches[1];
var_dump($name);
}
The /m flag on the regex makes ^ match at the start of every line, not just at the start of the input.
The ? in (.*?) makes that part non-greedy, so it doesn't also grab the spaces that precede the UPC.

It would be simpler if you grabbed both the name and the number at the same time during the initial pass. Then, when you check the database to see if the number is present, you already have the name if you need to use it. Consider:
preg_match_all('/^([A-Za-z ]+) (\d+)$/m', $document, $matches, PREG_SET_ORDER);
foreach ($matches as $match) {
$name = $match[1];
$number = $match[2];
if (!order_number_in_database($number)) {
save_new_order($number, $name);
}
}

You can use a lookahead assertion to match the string preceding the UPC.
http://php.net/manual/en/regexp.reference.assertions.php
Something like this, used with the m modifier: ^.*?(?=\s*123456789012), substituting the UPC of the item you want to find. Note that .*? is needed rather than \S*, because the names can contain spaces.
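A minimal sketch of the lookahead approach in PHP, assuming the receipt layout from the question. The variable names are illustrative, and preg_quote() is added as a precaution in case the search value ever contains regex metacharacters:

```php
<?php
$receipt = "NAME OF ITEM 123456789012\n" .
           "OTHER NAME 987654321098\n" .
           "NAME 567890123456";
$upc = '987654321098';

// ^.*? lazily consumes the start of a line; the lookahead requires optional
// whitespace plus the UPC to follow, without making them part of the match.
$pattern = '/^.*?(?=\s*' . preg_quote($upc, '/') . '\b)/m';

$name = null;
if (preg_match($pattern, $receipt, $m)) {
    $name = $m[0]; // the whole match is the name, since the lookahead consumes nothing
}
var_dump($name);
```

Because the lookahead is zero-width, no capture group is needed; the name is the full match.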

I'm lazy, so I would just use one regex that gets both parts in one shot using matching groups. Then, I would call it every time and put each capture group into name and upc variables. For cases in which you need the name, just reference it.
Use this type of regex:
/([a-zA-Z ]+)\s*(\d+)/
Then you will have the name in the $1 matching group and the UPC in the $2 matching group. Sorry, it's been a while since I've used PHP, so I can't give you an exact code snippet.
Note: the suggested regex assumes you'll only have letters or spaces in your "names". If that's not the case, you'll have to expand the character class.
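For what it's worth, a rough PHP sketch of that one-shot idea could look like the following. The function name parse_receipt_line is made up for illustration, and the pattern is anchored and made lazy so that the name does not keep its trailing space; it still assumes names contain only letters and spaces:

```php
<?php
// Hypothetical helper: splits one receipt line into name and UPC.
// Returns null when the line does not look like "NAME 123456789012".
function parse_receipt_line(string $line): ?array
{
    if (preg_match('/^([a-zA-Z ]+?)\s*(\d+)$/', trim($line), $m)) {
        return ['name' => $m[1], 'upc' => $m[2]];
    }
    return null;
}

$item = parse_receipt_line('NAME OF ITEM 123456789012');
if ($item !== null) {
    echo $item['name'] . ' / ' . $item['upc'] . "\n";
}
```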

Related

How to properly replace strings when you have repeated substrings?

I want to add hyperlinks to URLs in a text, but the URLs can come in different formats, and some URLs are substrings of others. Let me explain with an example:
Here I have one insidelinkhttp://google.com But I can have more formats like the following: https://google.com google.com
Right now I have the following links extracted from the above example: ["http://google.com", "https://google.com", "google.com"], and I want to replace each match with the corresponding hyperlink.
If I iterate over the array and replace each element in turn, the result is wrong: once I have added the hyperlink for "http://google.com", the "google.com" substring inside it gets replaced again with another hyperlink.
Does anyone have an idea how to solve this?
Thanks
Based on your sample string, I have defined three patterns for URL matching and replace each match as you require; you can add more patterns to the $regEX variable.
// string
$str = "Here I have one insidelinkhttp://google.com But I can have more formats like the followings: https://google.com google.com";
/**
* Replace with the match pattern
*/
function urls_matches($url1)
{
if (isset($url1[0])) {
return '<a href="' . $url1[0] . '">' . $url1[0] . '</a>';
}
}
// regular expression for multiple patterns
$regEX = "/(http:\/\/[a-zA-Z0-9]+\.+[A-Za-z]{2,6}+)|(https:\/\/[a-zA-Z0-9]+\.+[A-Za-z]{2,6}+)|([a-zA-Z0-9]+\.+[A-Za-z]{2,6}+)/";
// replacing string based on defined patterns
$replacedString = preg_replace_callback(
$regEX,
"urls_matches",
$str
);
// print the replaced string
echo $replacedString;
You could first search for the URLs and replace them with template strings (placeholders),
e.g.: STRINGA, STRINGB, STRINGC
Then loop over the array, where item 0 replaces STRINGA, item 1 replaces STRINGB, and so on.
Just make sure no template name is a prefix of another, like STRING1 and STRING10.
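The placeholder idea might be sketched like this. The function name link_urls and the token format are assumptions made for the example, and using a bare "google.com" as the href is only for illustration (a real page would probably want a scheme added):

```php
<?php
// Hypothetical helper: wraps each URL in an anchor tag via unique placeholders.
function link_urls(string $text, array $urls): string
{
    // Longest URLs first, so "google.com" cannot match inside "http://google.com".
    usort($urls, function ($a, $b) {
        return strlen($b) - strlen($a);
    });

    $placeholders = [];
    foreach ($urls as $i => $url) {
        // NUL-delimited token, so token 1 is never a prefix of token 10.
        $token = "\x00URL{$i}\x00";
        $text = str_replace($url, $token, $text);
        $placeholders[$token] = '<a href="' . $url . '">' . $url . '</a>';
    }

    // Second pass: swap every token for its final markup.
    // strtr() does not rescan text it has already replaced.
    return strtr($text, $placeholders);
}

$linked = link_urls(
    'a http://google.com b https://google.com c google.com',
    ['http://google.com', 'https://google.com', 'google.com']
);
echo $linked . "\n";
```

Sorting by length keeps the first pass from touching a short URL that is part of a longer one, and the second pass never revisits the inserted markup.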

Split a variable into other variables

Hello awesome people on the internet! I need some help :)
I have a php rcon script, this script saves the result of the rcon to a variable named results, this is an example.
results = Showing 2 tracked objective(s) for lluiscab:- rcon: 4 (rcon)- test: 5555 (test)
I want to set a variable like rcon to 4 and test to 5555.
I used explode and other things that I found on the web, but I can't make it work. Does someone know how to do it?
Edit: This variable changes, so, sometimes I can have rcon, test and coins and sometime only rcon
You can use a regular expression for this.
preg_match('/rcon:\s*(\d+).*test:\s*(\d+)/', $line, $match);
$rcon = $match[1];
$test = $match[2];
\d+ matches a sequence of numbers, and putting () around it makes it a capture group. $match contains the parts of the input line that were matched by the regular expression, and $match[N] contains the Nth capture group.
If you need to capture anything that looks like word: number, you can use preg_match_all and an associative array.
preg_match_all('/(\w+):\s*(\d+)/', $line, $matches, PREG_SET_ORDER);
$results = array();
foreach ($matches as $match) {
$results[$match[1]] = intval($match[2]);
}
For the example input, this will create
$results = array(
'rcon' => 4,
'test' => 5555
);
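Since the set of objectives can change between calls, it may help to read the optional keys defensively. A small sketch, assuming the sample line from the question; parse_objectives is a made-up name and the ?? operator needs PHP 7 or later:

```php
<?php
// Hypothetical helper: collects every "word: number" pair into an array.
function parse_objectives(string $line): array
{
    preg_match_all('/(\w+):\s*(\d+)/', $line, $matches, PREG_SET_ORDER);
    $results = [];
    foreach ($matches as $match) {
        $results[$match[1]] = (int) $match[2];
    }
    return $results;
}

$line = 'Showing 2 tracked objective(s) for lluiscab:- rcon: 4 (rcon)- test: 5555 (test)';
$results = parse_objectives($line);

// Keys that are absent from this particular response fall back to null.
$rcon  = $results['rcon']  ?? null;
$coins = $results['coins'] ?? null;
var_dump($rcon, $coins);
```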

str_replace for multiple items then store results in new variable

I have a string like this :
$content = 'Hi #steve, have you seen what #dave wrote?';
I essentially want to locate only the words directly following the # symbol (like twitter mentions), and then loop through each result performing a task with them.
I know how to strip the # char from the entire string :
$strip_at = str_replace( '#', '', $content );
Which would result in $strip_at being :
Hi steve, have you seen what dave wrote?
But how would I use str_replace to locate each "mention", remove the # symbol just leaving the word (a name in this case), and then store the results of each "mention" in an array to a new variable?
Desired result :
$mentions = array('steve','dave');
So I could then loop through $mentions and do stuff with the results, eg :
foreach ($mentions as $mention) {
echo 'This persons name is '.$mention.'<br />';
}
You can accomplish this by means of a regular expression:
preg_match_all('/(?<=#)\w{1,15}/', $content, $results);
which will store this array in the variable $results:
[
[
"steve",
"dave",
],
]
and you can enumerate the matches by looping over $results[0]:
foreach($results[0] as $name) {
echo $name . '<br>';
}
prints:
steve
dave
If you're curious about what /(?<=#)\w{1,15}/ means:
(?<=#) - "lookbehind" - this means we need a # to precede what we're actually interested in matching
\w{1,15} means match between 1 and 15 word characters (15 being the maximum length of a Twitter username)
so together we're matching the Twitter username that follows a # sign.
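One way to end up with the $mentions array the question asks for is a capturing group instead of a lookbehind. A minimal sketch, keeping the 15-character limit as an assumption carried over from the answer:

```php
<?php
$content = 'Hi #steve, have you seen what #dave wrote?';

// Group 1 captures only the name; the # is matched but not captured.
preg_match_all('/#(\w{1,15})/', $content, $results);
$mentions = $results[1];

foreach ($mentions as $mention) {
    echo 'This mention is ' . $mention . "\n";
}
```

Here $results[0] holds the full matches including the #, while $results[1] holds just the names.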

PHP trouble with preg_match

I thought I had this working; however after further evaluation it seems it's not working as I would have hoped it was.
I have a query pulling back a string. The string is a comma separated list just as you see here:
(1,145,154,155,158,304)
Nothing has been added or removed.
I have a function that I thought I could use preg_match to determine if the user's id was contained within the string. However, it appears that my code is looking for any part.
preg_match('/'.$_SESSION['MyUserID'].'/',$datafs['OptFilter_1']))
using the same it would look like such
preg_match('/1/',(1,145,154,155,158,304)) I would think. After testing if my user id is 4 the current code returns true and it shouldn't. What am I doing wrong? As you can see the id length can change.
It's better to put all your IDs in an array and then check whether the desired ID exists in it:
<?php
$str = "(1,145,154,155,158,304)";
$str = str_replace(array("(", ")"), "", $str);
$arr = explode(',', $str);
if(in_array($_SESSION['MyUserID'], $arr))
{
// ID exists
}
If you do want to use a regular expression on your string (although it's not recommended here), the regex below will match your ID if it's there:
preg_match("#[,(]$ID[,)]#", $str)
Explanations:
[,(] # a comma , or opening-parenthesis ( character
$ID # your ID
[,)] # a comma , or closing-parenthesis ) character
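To see why the delimiters matter, here is a small sketch comparing IDs against the list from the question. The function name id_in_list is invented for the example, and preg_quote() is added on the assumption that the ID might not always be purely numeric:

```php
<?php
// Hypothetical helper: true only when $id appears as a whole list entry.
function id_in_list(string $id, string $list): bool
{
    // The ID must sit between "(" or "," on the left and "," or ")" on the right.
    $pattern = '#[,(]' . preg_quote($id, '#') . '[,)]#';
    return preg_match($pattern, $list) === 1;
}

$list = '(1,145,154,155,158,304)';
var_dump(id_in_list('4', $list));   // bool(false): 4 only occurs inside other IDs
var_dump(id_in_list('145', $list)); // bool(true)
```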

PHP Regular Expression to Match Function Name and Parameters with string like Needle(needle|needle)

I am filtering database results with a query string that looks like this:
attribute=operator(value|optional value)
I'll use
$_GET['attribute'];
to get the value.
I believe the right approach is using regex to get matches on the rest.
The preferred output would be
print_r($matches);
array(
1 => operator
2 => value
3 => optional value
)
The operator will always be one word and consist of letters: like(), between(), in().
The values can be many different things including letters, numbers, spaces commas, quotation marks, etc...
I was asked where my code was failing and didn't include much code because of how poorly it worked. Based on the accepted answer, I was able to whip up a regex that almost works.
EDIT 1
$pattern = "^([^\|(]+)\(([^\|()]+)(\|*)([^\|()]*)";
Edit 2
$pattern = "^([^\|(]+)\(([^\|()]+)(\|*)([^\|()]*)"; // I thought this would work.
Edit 3
$pattern = "^([^\|(]+)\(([^\|()]+)(\|+)?([^\|()]+)?"; // this does work!
Edit 4
$pattern = "^([^\|(]+)\(([^\|()]+)(?:\|)?([^\|()]+)?"; // this gets rid of the middle matching group.
The only remaining problem is when the 2nd optional parameter does not exist, there is still an empty $matches array.
This script, with the input "operator(value|optional value)", returns the array you expect:
<?php
$attribute = $_GET['attribute'];
$result = preg_match("/^([\w ]+)\(([\w ]+)\|([\w ]*)\)$/", $attribute, $matches);
print($matches[1] . "\n");
print($matches[2] . "\n");
print($matches[3] . "\n");
?>
This assumes your "values" match [\w ] regexp (all word characters plus space), and that the | you specify is a literal |...
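If the second value really is optional, one possible variation wraps the pipe and the second value in an optional non-capturing group. This is only a sketch: parse_filter is a made-up name, the operator is assumed to be letters only (as the question states), and values are assumed not to contain |, ( or ):

```php
<?php
// Hypothetical helper: parses "operator(value|optional value)".
// Returns null when the string does not have that shape.
function parse_filter(string $filter): ?array
{
    $pattern = '/^([a-z]+)\(([^|()]+)(?:\|([^|()]+))?\)$/i';
    if (!preg_match($pattern, $filter, $m)) {
        return null;
    }
    return [
        'operator' => $m[1],
        'value'    => $m[2],
        // Trailing groups that did not take part in the match are absent from $m.
        'optional' => $m[3] ?? null,
    ];
}

print_r(parse_filter('between(1|10)'));
print_r(parse_filter('like(foo bar)'));
```

Reading the third group with ?? avoids an undefined-index notice when only one value is supplied.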
