Matching words from an array causes some vulnerabilities - php

Suppose the array contains the word "pan" and the user types a sentence that contains the word "pants". The program then matches "pants" as "pan". Please help me out. For further information, the array also contains multi-word entries such as "my valet" and "the nest" that need to be matched. Thanks in advance. :)
if($val['items']!=null){
$items = explode(',',$val['items']);
foreach($items as $k=>$item){
if($item!='' && preg_match("/".preg_quote($item,"/")."/", $opText)){
if(!in_array($item,$parameters[$val['name']],true)){
$parameters[$val['name']][]=$item;
}
}
}
}

Use a delimiter such as a space. Instead of searching for the word "pan", search for " pan "; for phrases like "my valet", search for " my valet ". You also need to handle the case where the word is the first or last word of the sentence, where the leading or trailing space is missing (padding the sentence with a space on both ends is one way to do this). Does this resolve your issue?
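A regex-based variant of the same idea, as a minimal sketch: instead of literal spaces, lookarounds require that no word character sits directly before or after the item, which also covers the start and end of the sentence and punctuation. The function name findListedWords and the sample data are made up for illustration; the question's $val and $parameters structures are left out.

```php
<?php
// Hypothetical helper: returns the items that occur in $text as whole words or phrases.
function findListedWords(array $items, string $text): array
{
    $found = [];
    foreach ($items as $item) {
        $item = trim($item);
        if ($item === '') {
            continue;
        }
        // (?<!\w) and (?!\w): no word character directly before or after the item.
        $pattern = '/(?<!\w)' . preg_quote($item, '/') . '(?!\w)/';
        if (preg_match($pattern, $text) === 1 && !in_array($item, $found, true)) {
            $found[] = $item;
        }
    }
    return $found;
}

// "pan" is not reported for "pants"; "my valet" is found before the full stop.
print_r(findListedWords(['pan', 'my valet', 'the nest'], 'I left my pants with my valet.'));
```

Unlike the space-padding approach, this needs no special case for the first or last word, because the lookarounds also succeed at the string boundaries.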

Related

Compare two sentences sequence wise

The scenario: there is a given sentence. The user has to type that sentence into an input field, and the typed sentence is saved to the database.
Now I want to compare both sentences and find out how many words the user typed correctly and incorrectly (missed a word, or added new words through a typing mistake).
So far I have done something like this:
$Str1 = "i am suraj roy i am having my lunch";
$Str2 = "i suraj roy i am having my dinner";
$st1 = (explode(" ", $Str1));
$st2 = (explode(" ", $Str2));
$result1 = array_diff($st1, $st2);
$result2 = array_diff($st2, $st1);
$word_count1 = count($result1); // words not present in typed sentence from original sentence - lunch
$word_count2 = count($result2); // newly added words in typed sentence - dinner
$total_error_count = $word_count1 + $word_count2;
echo 'Total words not matching between two sentences - '.$total_error_count;
This works as I expected for "lunch" and "dinner".
The problem is the word "am": it occurs twice in the original sentence but only once in the typed sentence. Because array_diff() only checks whether a value exists in the other array, the single "am" in the typed sentence covers every occurrence of "am" in the original, so the missing "am" is not counted.
Any help would be appreciated. Thanks.
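One way to account for repeated words, as a minimal sketch: count how often each word occurs in both sentences with array_count_values() and compare the counts word by word. The function name countWordErrors is made up for illustration, and the comparison ignores word order, so it does not detect words typed in the wrong position.

```php
<?php
// Hypothetical helper: compares word frequencies instead of mere presence.
function countWordErrors(string $original, string $typed): array
{
    $origCounts  = array_count_values(preg_split('/\s+/', trim($original)));
    $typedCounts = array_count_values(preg_split('/\s+/', trim($typed)));

    $missing = 0; // words (or repetitions) present in the original but not typed
    $extra   = 0; // words (or repetitions) typed but not present in the original
    foreach (array_keys($origCounts + $typedCounts) as $word) {
        $diff = ($origCounts[$word] ?? 0) - ($typedCounts[$word] ?? 0);
        if ($diff > 0) {
            $missing += $diff;
        } else {
            $extra += -$diff;
        }
    }
    return ['missing' => $missing, 'extra' => $extra];
}

$result = countWordErrors(
    'i am suraj roy i am having my lunch',
    'i suraj roy i am having my dinner'
);
// One "am" and "lunch" are missing, "dinner" is extra.
echo 'Total words not matching between two sentences - ' . ($result['missing'] + $result['extra']);
```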

How to properly replace strings when you have repeated substrings?

I want to add hyperlinks to urls in a text, but the problem is that I can have different formats and the urls could have some substrings repeated in other strings. Let me explain it better with an example:
Here I have one insidelinkhttp://google.com But I can have more formats like the followings: https://google.com google.com
Right now I have the following links extracted from the example above: ["http://google.com", "https://google.com", "google.com"], and I want to replace each match with its hyperlinked version.
If I iterate over the array and replace each element in turn, there will be an error: once the hyperlink for "http://google.com" has been added properly, its "google.com" substring is replaced again with another hyperlink when "google.com" is processed.
Does anyone have an idea how to solve this problem?
Thanks
Based on your sample string, I have defined three patterns for URL matching and replace each match as required. You can add more patterns to the $regEX variable.
// string
$str = "Here I have one insidelinkhttp://google.com But I can have more formats like the followings: https://google.com google.com";
/**
* Replace with the match pattern
*/
function urls_matches($url1)
{
    if (isset($url1[0])) {
        // wrap the matched URL in an anchor tag
        return '<a href="' . $url1[0] . '">' . $url1[0] . '</a>';
    }
}
// regular expression for multiple patterns
$regEX = "/(http:\/\/[a-zA-Z0-9]+\.+[A-Za-z]{2,6}+)|(https:\/\/[a-zA-Z0-9]+\.+[A-Za-z]{2,6}+)|([a-zA-Z0-9]+\.+[A-Za-z]{2,6}+)/";
// replacing string based on defined patterns
$replacedString = preg_replace_callback(
$regEX,
"urls_matches",
$str
);
// print the replaced string
echo $replacedString;
You could first search and replace the URLs with placeholder strings.
e.g.: STRINGA, STRINGB, STRINGC
Then loop over the array, where item 0 replaces STRINGA, item 1 replaces STRINGB, and so on.
Just make sure no placeholder name is a prefix of another, like STRING1 and STRING10.
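The placeholder idea could look roughly like the following sketch. The function name linkify and the {{URLn}} token format are assumptions made for illustration; the sketch also assumes the text itself never contains such a token. Replacing the longest URLs first keeps "google.com" from clobbering "http://google.com".

```php
<?php
// Hypothetical helper: two-pass replacement via unique placeholder tokens.
function linkify(string $text, array $urls): string
{
    // Longest first, so a short URL cannot match inside a longer one.
    usort($urls, function ($a, $b) {
        return strlen($b) - strlen($a);
    });

    $placeholders = [];
    foreach ($urls as $i => $url) {
        // The closing braces ensure "{{URL1}}" is not a prefix of "{{URL10}}".
        $token = '{{URL' . $i . '}}';
        $text = str_replace($url, $token, $text);
        $safe = htmlspecialchars($url);
        $placeholders[$token] = '<a href="' . $safe . '">' . $safe . '</a>';
    }

    // Second pass: swap every token for its final hyperlink.
    return strtr($text, $placeholders);
}

echo linkify(
    'one http://google.com two https://google.com three google.com',
    ['http://google.com', 'https://google.com', 'google.com']
);
```

Note that a bare "google.com" in an href is treated by browsers as a relative link, so a real implementation would probably prepend a scheme for such entries.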

Regex, PHP - finding words that need correction

I have a long string of words. Some of the words contain special letters.
For example, the string "have now a rea$l problem with$ dolar inp$t"
with the special letter "$".
I need to find and return all the words that contain special letters in the quickest way possible.
What I did is write a function that splits the string on spaces, then loops over all the words with "for" and searches for the special character in each word. When it finds one, it saves the word in an array. But I have been told that with regexes I can get much better performance, and I don't know how to implement it with them.
What is the best approach for it?
I am new to regex, but I understand it can help me with this task?
My code (FORBIDDEN is a constant).
For now, the code only works for one forbidden character.
function findSpecialChar($x){
    $special = array();
    $exploded = explode(" ", $x);
    foreach ($exploded as $word){
        // FORBIDDEN is the constant holding the special character, e.g. "$"
        if (strpos($word, FORBIDDEN) !== false)
            $special[] = $word;
    }
    return $special;
}
You could use preg_match like this:
// Set your special word here.
$special_word = "café";
// Set your sentence here.
$string = "I like to eat food at a café and then read a magazine.";
// Run it through 'preg_match'.
preg_match("/(?:\W|^)(\Q$special_word\E)(?:\W|$)/i", $string, $matches);
// Dump the results to see it working.
echo '<pre>';
print_r($matches);
echo '</pre>';
The output would be:
Array
(
[0] => café
[1] => café
)
Then if you wanted to replace that, you could do this using preg_replace:
// Set your special word here.
$special_word = "café";
// Set your replacement here.
$special_word_replacement = " restaurant ";
// Set your sentence here.
$string = "I like to eat food at a café and then read a magazine.";
// Run it through 'preg_replace'.
$new_string = preg_replace("/(?:\W|^)(\Q$special_word\E)(?:\W|$)/i", $special_word_replacement, $string);
// Echo the results.
echo $new_string;
And the output for that would be:
I like to eat food at a restaurant and then read a magazine.
I am sure the regex could be refined to avoid having to add spaces before and after " restaurant " like I do in this example, but this is the basic concept I believe you are looking for.
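For the original task of collecting every word that contains a forbidden character, a single preg_match_all() call may be enough. This is a sketch under assumptions: the function name findWordsWithChars is hypothetical, words are taken to be runs of non-whitespace, and the forbidden characters are assumed to be a non-empty string of single-byte characters.

```php
<?php
// Hypothetical helper: returns all whitespace-delimited words that contain
// at least one of the characters in $forbiddenChars.
function findWordsWithChars(string $text, string $forbiddenChars): array
{
    // preg_quote() escapes characters such as $ ] ^ - so they are literal inside [...].
    $class = '[' . preg_quote($forbiddenChars, '/') . ']';
    preg_match_all('/\S*' . $class . '\S*/', $text, $matches);
    return $matches[0];
}

// Single quotes keep PHP from treating $l and $t as variables.
print_r(findWordsWithChars('have now a rea$l problem with$ dolar inp$t', '$'));
```

Passing several characters, for example '$#', extends the character class, so more than one forbidden character can be handled without changing the function.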

RegEx or Similar - Grab string preceding matched value

Here's the deal: I am handling an OCR text document and grabbing UPC information from it with RegEx. That part I've figured out. Then I query a database, and if I don't have a record of that UPC I need to go back to the text document and get the description of the product.
The format on the receipt is:
NAME OF ITEM 123456789012
OTHER NAME 987654321098
NAME 567890123456
So, when I go back the second time to find the name of the item I am at a complete loss. I know how to get to the line where the UPC is, but how can I use something like regex to get the name that precedes the UPC? Or some other method. I was thinking of somehow storing the entire line and then parsing it with PHP, but not sure how to get the line either.
Using PHP.
Get all of the names of the items indexed by their UPCs with a regex and preg_match_all():
$str = 'NAME OF ITEM 123456789012
OTHER NAME 987654321098
NAME 567890123456';
preg_match_all( '/^(.*?)\s+(\d+)/m', $str, $matches);
$items = array();
foreach( $matches[2] as $k => $upc) {
if( !isset( $items[$upc])) {
$items[$upc] = array( 'name' => $matches[1][$k], 'count' => 0);
}
$items[$upc]['count']++;
}
This forms $items so it looks like:
Array (
    [123456789012] => Array ( [name] => NAME OF ITEM [count] => 1 )
    [987654321098] => Array ( [name] => OTHER NAME [count] => 1 )
    [567890123456] => Array ( [name] => NAME [count] => 1 )
)
Now you can look up any item name in O(1) time:
echo $items['987654321098']['name']; // OTHER NAME
You can find the string preceding a value you know with the following regex:
$receipt = "NAME OF ITEM 123456789012\n" .
"OTHER NAME 987654321098\n" .
"NAME 567890123456";
$upc = '987654321098';
if (preg_match("/^(.*?) *{$upc}/m", $receipt, $matches)) {
$name = $matches[1];
var_dump($name);
}
The /m flag on the regex makes ^ match at the start of every line of multi-line input.
The ? in (.*?) makes that part non-greedy, so the captured name does not include the spaces before the UPC.
It would be simpler if you grabbed both the name and the number at the same time during the initial pass. Then, when you check the database to see if the number is present, you already have the name if you need to use it. Consider:
preg_match_all('/^([A-Za-z ]+) (\d+)$/m', $document, $matches, PREG_SET_ORDER);
foreach ($matches as $match) {
$name = $match[1];
$number = $match[2];
if (!order_number_in_database($number)) {
save_new_order($number, $name);
}
}
You can use a lookahead assertion to match the string preceding the UPC.
http://php.net/manual/en/regexp.reference.assertions.php
Something like this: /^.*?(?=\s*123456789012)/m, substituting the UPC of the item you want to find. Use .*? rather than \S* for the name part, since names such as "NAME OF ITEM" contain spaces.
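In PHP, the lookahead approach might be wrapped as follows. This is only a sketch: the function name nameBeforeUpc is hypothetical, and the receipt is assumed to have one item per line with the UPC at the end of the line.

```php
<?php
// Hypothetical helper: returns the text that precedes $upc on its line, or null.
function nameBeforeUpc(string $receipt, string $upc): ?string
{
    // ^ with /m anchors at each line start; the lookahead requires the UPC
    // (after spaces or tabs) at the end of that same line, without consuming it.
    $pattern = '/^.*?(?=[ \t]+' . preg_quote($upc, '/') . '\s*$)/m';
    return preg_match($pattern, $receipt, $m) === 1 ? $m[0] : null;
}

$receipt = "NAME OF ITEM 123456789012\n" .
           "OTHER NAME 987654321098\n" .
           "NAME 567890123456";

var_dump(nameBeforeUpc($receipt, '987654321098'));
```

Because the UPC sits inside the lookahead, the whole match ($m[0]) is already the name, so no capture group is needed.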
I'm lazy, so I would just use one regex that gets both parts in one shot using matching groups. Then, I would call it every time and put each capture group into name and upc variables. For cases in which you need the name, just reference it.
Use this type of regex:
/([a-zA-Z ]+)\s*(\d*)/
Then you will have the name in the $1 matching group and the UPC in the $2 matching group. Sorry, it's been a while since I've used PHP, so I can't give you an exact code snippet.
Note: the suggested regex assumes you'll only have letters or spaces in your "names" if that's not the case, you'll have to expand the character class.

How to write regex to return only certain parts of this string?

So I'm working on a project that will allow users to enter poker hand histories from sites like PokerStars and then display the hand to them.
It seems that regex would be a great tool for this, however I rank my regex knowledge at "slim to none".
So I'm using PHP and looping through this block of text line by line and on lines like this:
Seat 1: fabulous29 (835 in chips)
Seat 2: Nioreh_21 (6465 in chips)
Seat 3: Big Loads (3465 in chips)
Seat 4: Sauchie (2060 in chips)
I want to extract seat number, name, & chip count so the format is
Seat [number]: [letters&numbers&characters] ([number] in chips)
I have NO IDEA where to start or what commands I should even be using to optimize this.
Any advice is greatly appreciated - even if it is just a link to a tutorial on PHP regex or the name of the command(s) I should be using.
I'm not entirely sure what exactly to use for that without trying it, but a great tool I use all the time to validate my RegEx is RegExr which gives a great flash interface for trying out your regex, including real time matching and a library of predefined snippets to use. Definitely a great time saver :)
Something like this might do the trick:
/Seat (\d+): ([^\(]+) \((\d+) in chips\)/
And some basic explanation on how Regex works:
\d = digit.
\<character> = escapes character, if not part of any character class or subexpression. for example:
\t
would render a tab, while \\t would render "\t" (since the backslash is escaped).
+ = one or more of the preceding element.
* = zero or more of the preceding element.
[ ] = bracket expression. Matches any of the characters within the bracket. Also works with ranges (ex. A-Z).
[^ ] = Matches any character that is NOT within the bracket.
( ) = Marked subexpression. The data matched within this can be recalled later.
Anyway, I chose to use
([^\(]+)
since the example provides a name containing spaces (Seat 3 in the example). It matches any character up to the point where it encounters an opening parenthesis.
Used on its own, this leaves a blank space at the end of the subexpression (using the data provided in the example). However, this can easily be stripped away using the trim() function in PHP.
If you do not want to match spaces, only alphanumeric characters, you could do something like this:
([A-Za-z0-9_-]+)
Which would match any letter (within A-Z, both upper- and lower-case), any digit, as well as hyphens and underscores.
Or the same variant, with spaces:
([A-Za-z0-9_\s-]+)
Where "\s" matches any whitespace character, including a space.
Hope this helps :)
Look at the PCRE section in the PHP Manual. Also, http://www.regular-expressions.info/ is a great site for learning regex. Disclaimer: Regex is very addictive once you learn it.
I always use the preg_ set of functions for regex in PHP because the Perl-compatible expressions have much more capability. That extra capability doesn't necessarily come into play here, but they are also supposed to be faster, so why not use them anyway, right?
For an expression, try this:
/Seat (\d+): (.+?) \((\d+)/
You can use preg_match() on each line, storing the results in an array. You can then get at those results and manipulate them as you like.
EDIT:
Btw, you could also run preg_match_all on the entire block of text (instead of looping through line-by-line) and get the results that way, too.
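Running preg_match_all() over the whole block could look like this sketch. The function name parseSeats and the group names seat, name and chips are chosen for illustration; the name group uses a lazy .+? so that names containing spaces, such as "Big Loads", are captured as well.

```php
<?php
// Hypothetical helper: extracts seat number, player name and chip count
// from every "Seat N: name (C in chips)" line of a hand history.
function parseSeats(string $handHistory): array
{
    $pattern = '/^Seat (?<seat>\d+): (?<name>.+?) \((?<chips>\d+) in chips\)/m';
    // PREG_SET_ORDER yields one entry per matched line.
    preg_match_all($pattern, $handHistory, $matches, PREG_SET_ORDER);

    $seats = [];
    foreach ($matches as $m) {
        $seats[] = [
            'seat'  => (int) $m['seat'],
            'name'  => $m['name'],
            'chips' => (int) $m['chips'],
        ];
    }
    return $seats;
}

$history = "Seat 1: fabulous29 (835 in chips)\n" .
           "Seat 2: Nioreh_21 (6465 in chips)\n" .
           "Seat 3: Big Loads (3465 in chips)\n" .
           "Seat 4: Sauchie (2060 in chips)";

print_r(parseSeats($history));
```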
Check out preg_match.
Probably looking for something like...
<?php
$str = 'Seat 1: fabulous29 (835 in chips)';
preg_match('/Seat (?<seatNo>\d+): (?<name>\w+) \((?<chipCnt>\d+) in chips\)/', $str, $matches);
print_r($matches);
?>
*It's been a while since I did php, so this could be a little or a lot off.*
This may be a very late answer, but I am interested in answering:
Seat\s(\d):\s([\w\s]+)\s\((\d+).*\)
http://regex101.com/r/cU7yD7/1
Here's what I'm currently using:
preg_match("/(Seat \d+: [A-Za-z0-9 _-]+) \((\d+) in chips\)/",$line)
To process the whole input string at once, use preg_match_all():
preg_match_all('/Seat (\d+): \w+ \((\d+) in chips\)/', $input, $matches);
For your input string (held in $input here), var_dump of $matches will look like this. Seat 3 is missing because \w+ does not match the space in "Big Loads":
array
  0 =>
    array
      0 => string 'Seat 1: fabulous29 (835 in chips)' (length=33)
      1 => string 'Seat 2: Nioreh_21 (6465 in chips)' (length=33)
      2 => string 'Seat 4: Sauchie (2060 in chips)' (length=31)
  1 =>
    array
      0 => string '1' (length=1)
      1 => string '2' (length=1)
      2 => string '4' (length=1)
  2 =>
    array
      0 => string '835' (length=3)
      1 => string '6465' (length=4)
      2 => string '2060' (length=4)
On learning regex: get Mastering Regular Expressions, 3rd Edition. Nothing else comes close to this book if you really want to learn regex. Despite being the definitive guide to regex, the book is very beginner friendly.
Try this code. It works for me.
Let's say you have the lines below as strings:
$string1 = "Seat 1: fabulous29 (835 in chips)";
$string2 = "Seat 2: Nioreh_21 (6465 in chips)";
$string3 = "Seat 3: Big Loads (3465 in chips)";
$string4 = "Seat 4: Sauchie (2060 in chips)";
Add to array
$lines = array($string1,$string2,$string3,$string4);
foreach($lines as $line )
{
$seatArray = explode(":", $line);
$seat = explode(" ",$seatArray[0]);
$seatNumber = $seat[1];
$usernameArray = explode("(",$seatArray[1]);
$username = trim($usernameArray[0]);
$chipArray = explode(" ",$usernameArray[1]);
$chipNumber = $chipArray[0];
echo "<br>"."Seat [".$seatNumber."]: [". $username."] ([".$chipNumber."] in chips)";
}
You'll have to split the file by line breaks,
then loop through each line and apply the following logic:
// indexes of the capture groups in $matches (index 0 holds the whole match)
$seat = 1;
$name = 2;
$chips = 3;
// $file is assumed to be an array of lines
foreach ($file as $string) {
    // the space in the name class allows names such as "Big Loads"
    if (preg_match("/Seat ([0-9]+): ([A-Za-z_0-9 ]*) \(([0-9]*) in chips\)/", $string, $matches)) {
        echo "Seat: " . $matches[$seat] . "<br>";
        echo "Name: " . $matches[$name] . "<br>";
        echo "Chips: " . $matches[$chips] . "<br>";
    }
}
I haven't run this code, so you may have to fix some errors...
Seat [number]: [letters&numbers&characters] ([number] in chips)
Your Regex should look something like this
Seat (\d+): ([a-zA-Z0-9]+) \((\d+) in chips\)
The brackets will let you capture the seat number, name and number of chips in groups.
