Regular expressions for Google operators

Using PHP, I'm trying to improve the search on my site by supporting Google-like operators, e.g.:
keyword = natural/default
"keyword" or "search phrase" = exact match
keyword* = partial match
For this to work I need to split the string into two arrays: one for the exact words (without the double quotes) in $Array1, and one for everything else (natural and partial keywords) in $Array2.
What regular expressions would achieve this for the following string?
Example string ($string)
today i'm "trying" out a* "google search" "test"
Desired result
$Array1 = array(
[0]=>trying
[1]=>google search
[2]=>test
);
$Array2 = array(
[0]=>today
[1]=>i'm
[2]=>out
[3]=>a*
);
1) Exact: I've tried the following regexp for the exact matches, but it returns two arrays, one with and one without the double quotes. I could just use $result[1], but there may be a trick I'm missing here.
preg_match_all(
'/"([^"]+)"/iu',
'today i\'m "trying" \'out\' a* "google search" "test"',
$result
);
2) Natural/Partial: The following rule returns the correct keywords, but along with several blank values. Is this regexp sloppy, or should I just run the array through array_filter()?
preg_split(
'/"([^"]+)"|(\s)/iu',
'today i\'m "trying" \'out\' a* "google search" "test"'
);

You can use strtok() to tokenize the string.
For example, this tokenizeQuoted function is derived from the tokenizedQuoted function posted in the comments on the strtok manual page:
// split a string into an array of space-delimited tokens, taking double-quoted and single-quoted strings into account
function tokenizeQuoted($string, $quotationMarks='"\'') {
    $tokens = array(array(),array());
    for ($nextToken=strtok($string, ' '); $nextToken!==false; $nextToken=strtok(' ')) {
        if (strpos($quotationMarks, $nextToken[0]) !== false) {
            if (strpos($quotationMarks, $nextToken[strlen($nextToken)-1]) !== false) {
                $tokens[0][] = substr($nextToken, 1, -1);
            } else {
                $tokens[0][] = substr($nextToken, 1) . ' ' . strtok($nextToken[0]);
            }
        } else {
            $tokens[1][] = $nextToken;
        }
    }
    return $tokens;
}
Here’s an example of use:
$string = 'today i\'m "trying" out a* "google search" "test"';
var_dump(tokenizeQuoted($string));
The output:
array(2) {
[0]=>
array(3) {
[0]=>
string(6) "trying"
[1]=>
string(13) "google search"
[2]=>
string(4) "test"
}
[1]=>
array(4) {
[0]=>
string(5) "today"
[1]=>
string(3) "i'm"
[2]=>
string(3) "out"
[3]=>
string(2) "a*"
}
}
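For comparison, here is a rough regex-only sketch of the same split, closer to what the question attempted. The function name splitSearchTerms is made up for illustration, and it assumes only double quotes mark exact phrases. The PREG_SPLIT_NO_EMPTY flag is what gets rid of the blank values, so array_filter() isn't needed.

```php
<?php
// Hypothetical helper (not from the original answer): returns
// [exact phrases, all other keywords] for a search string.
function splitSearchTerms(string $search): array
{
    // Capture the text inside each pair of double quotes.
    preg_match_all('/"([^"]+)"/u', $search, $m);
    $exact = $m[1];

    // Blank out the quoted phrases, then split what is left on whitespace.
    // PREG_SPLIT_NO_EMPTY drops the empty strings that preg_split would
    // otherwise return for leading, trailing or repeated separators.
    $rest  = preg_replace('/"[^"]*"/u', ' ', $search);
    $other = preg_split('/\s+/u', $rest, -1, PREG_SPLIT_NO_EMPTY);

    return [$exact, $other];
}

[$exact, $other] = splitSearchTerms('today i\'m "trying" out a* "google search" "test"');
print_r($exact); // trying, google search, test
print_r($other); // today, i'm, out, a*
```

An unbalanced quote is simply left attached to its keyword in the second array, which may or may not be what you want.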

Related

numbers and next line only in the textarea field

I'm trying to accept input in the following format
65465465465654
6546465465465465
5646545646464
6545646456
6454646456
in a textarea.
Can anyone help me with a preg_match pattern to check this kind of input?
I want to accept mobile numbers separated by newline characters.

Try this:
$numbers = "65465465465654
6546465465465465
5646545646464
6545646456
6454646456";
preg_match_all("/[0-9]+/", $numbers, $resultArray);
var_dump($resultArray[0]);
Output:
array(5) {
[0]=>
string(14) "65465465465654"
[1]=>
string(16) "6546465465465465"
[2]=>
string(13) "5646545646464"
[3]=>
string(10) "6545646456"
[4]=>
string(10) "6454646456"
}
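If the goal is to validate the textarea rather than just extract digits, something along these lines might help. It's only a sketch: the sample input is invented, and in a real form the value would presumably come from $_POST under whatever field name you use. Browsers normally submit textarea line breaks as \r\n, which is why it splits on \R.

```php
<?php
// Hypothetical input; a real form value would come from $_POST (field name assumed).
$input = "65465465465654\r\n6546465465465465\r\n\r\n5646545646464\r\nabc123\r\n6454646456";

// \R matches any line-break sequence (\n, \r\n or \r).
$lines = preg_split('/\R/', $input, -1, PREG_SPLIT_NO_EMPTY);

// Trim each line and keep only those made up entirely of digits.
$numbers = array_values(array_filter(array_map('trim', $lines), 'ctype_digit'));

print_r($numbers);
```

Here the blank line and abc123 are dropped and the four digit-only lines are kept. To reject the whole submission instead, compare count($numbers) with count($lines).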

Using PHP explode to create an array of words from $_GET

I'm having trouble using explode() in php.
I want to make an array of strings from the $_GET super global array.
The url will be like:
example/myproject.php?keywords=this+is+an+example
I want an array of the keywords so it should be like this:
array(4) { [0]=> string(4) "this"
[1]=> string(2) "is"
[2]=> string(2) "an"
[3]=> string(7) "example" }
Here's my code:
$stringVals = explode("+",($_GET['keywords']));
var_dump($stringVals);
Here's the output:
array(1) { [0]=> string(30) "this is an example of a string" }
An example that works:
$pizza = "piece1 piece2 piece3 piece4 piece5 piece6";
$pieces = explode(" ", $pizza);
var_dump($pieces);
The output of this:
array(6) { [0]=> string(6) "piece1" [1]=> string(6) "piece2" [2]=>
string(6) "piece3" [3]=> string(6) "piece4" [4]=> string(6) "piece5"
[5]=> string(6) "piece6" }
I want the words from $_GET split up like that.

The "+" sign you see is actually just an encoded space. Therefore, you can split it normally using a space.
explode(' ', $_GET['keywords']);
Make sure you sanitize it if you're going to put it in a database.

Actually you can simply use:
explode(" ", $_GET['keywords'])
The + sign in the URL actually means a space, not a plus.
Spaces aren't allowed in URLs, so each space is encoded as a plus sign.

In a normal GET request, the + in the URL is converted back to a space when PHP populates $_GET, so you should be exploding on ' '.
$stringVars = explode(' ', $_GET['keywords']);
See https://stackoverflow.com/a/2678602/1331451 for an explanation of why that is the case.
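A quick way to see the decoding for yourself is parse_str(), which decodes a query string the same way PHP does when it fills $_GET. This is just a sketch using the URL from the question. The preg_split call is an optional extra that also copes with doubled or trailing spaces.

```php
<?php
// parse_str() decodes a query string the same way PHP fills $_GET.
parse_str('keywords=this+is+an+example', $query);

var_dump($query['keywords']); // string(18) "this is an example"

// Splitting on runs of whitespace avoids empty entries when the
// user types two spaces in a row or leaves a trailing space.
$words = preg_split('/\s+/', trim($query['keywords']), -1, PREG_SPLIT_NO_EMPTY);
print_r($words);
```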

$myArray = explode(" ", $_GET['keywords']);
var_dump($myArray);
How's that?

Don't use the plus symbol, because the "+" you see is actually just an encoded space. Use commas instead when passing values from one page to another in the URL (e.g. keywords=this,is,an,example), then split on the comma:
$myArray = explode(',', $_REQUEST['keywords']);
After this, you can get your data as follows:
$myArray[0] = 'this';
$myArray[1] = 'is';
$myArray[2] = 'an';
$myArray[3] = 'example';

$url = 'example/myproject.php?keywords=this+is+an+example';
$x = parse_url($url);
$y = str_replace("keywords=", "", $x["query"]);
var_dump(explode("+", $y));
First parse the url, second remove keywords=, next explode what's left by + sign.

Parse words in PHP

From the string of words, can I get only the words with a capitalized first letter? For example, I have this string:
Page and Brin originally nicknamed THEIR new search engine "BackRub",
because the system checked backlinks to estimate the importance of a
site.
I need to get: Page, Brin, THEIR, BackRub

A non-regex solution (based on Mark Baker's comment):
$result = array_filter(str_word_count($str, 1), function($item) {
return ctype_upper($item[0]);
});
print_r($result);
Output:
Array
(
[0] => Page
[2] => Brin
[5] => THEIR
[9] => BackRub
)

You can match that with
preg_match("/[A-Z][a-zA-Z]*/um", $searchText)
You can see on php.net how preg_match can be applied.
http://ca1.php.net/preg_match
EDIT, TO ADD EXAMPLE
Here's an example of how to get the array with full matches
$searchText = 'Page and Brin originally nicknamed THEIR new search engine "BackRub", because the system checked backlinks to estimate the importance of a site.';
preg_match_all("/[A-Z][a-zA-Z]*/um", $searchText, $matches );
var_dump( $matches );
The output is:
array(1) {
[0]=>
array(4) {
[0]=>
string(4) "Page"
[1]=>
string(4) "Brin"
[2]=>
string(5) "THEIR"
[3]=>
string(7) "BackRub"
}
}

The way I would do it is to explode by space, ucfirst each exploded string, and check it against the original.
Here is what I mean:
$str = 'Page and Brin originally nicknamed THEIR new search engine "BackRub", because the system checked backlinks to estimate the importance of a site.';
$strings = explode(' ', $str);
$i = 0;
$out = array();
foreach($strings as $s)
{
if($strings[$i] == ucfirst($s))
{
$out[] = $s;
}
++$i;
}
var_dump($out);
http://codepad.org/QwrS4HpE

I would use strtok function (http://pl1.php.net/strtok), which returns the words in the string, one by one. You can specify the delimiter between words:
$string = 'Page and Brin originally nicknamed THEIR new search engine "BackRub", because the system checked backlinks to estimate the importance of a site.';
$delimiter = ' ,."'; // specify valid delimiters here (add others as needed)
$capitalized_words = array(); // array to hold the found words
$tok = strtok($string,$delimiter); // get first token
while ($tok !== false) {
$first_char = substr($tok,0,1);
if (strtoupper($first_char)===$first_char) {
// this word ($tok) is capitalized, store it
$capitalized_words[] = $tok;
}
$tok = strtok($delimiter); // get next token
}
var_dump($capitalized_words); // print the capitalized words found
This prints:
array(4) {
[0]=>
string(4) "Page"
[1]=>
string(4) "Brin"
[2]=>
string(5) "THEIR"
[3]=>
string(7) "BackRub"
}
Good luck!
Only drawback I can see is that it doesn't handle multibyte. If you have only English characters, then you're ok. If you have international characters, a modified/different solution may be needed.
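For the multibyte case, one possible alternative is a Unicode-aware regex instead of strtok. This is a sketch rather than a drop-in replacement: it swaps the tokenizer for preg_match_all, the sample sentence is altered to include accented words, and it assumes the source file and the input are UTF-8.

```php
<?php
// Sample text adapted from the question, with accented words added for illustration.
$text = 'Émile and Brin originally nicknamed THEIR new search engine "BackRub" in Zürich.';

// \p{Lu} = an uppercase letter, \p{L} = any letter; the u flag enables UTF-8 mode.
// (?<!\p{L}) makes sure the match starts at the beginning of a word,
// so the "R" inside "BackRub" does not start a match of its own.
preg_match_all('/(?<!\p{L})\p{Lu}\p{L}*/u', $text, $matches);

print_r($matches[0]); // Émile, Brin, THEIR, BackRub, Zürich
```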

You can do this using explode and loop through with regex:
$string = 'Page and Brin originally nicknamed THEIR new search engine "BackRub", because the system checked backlinks to estimate the importance of a site.';
$list = explode(' ',$string);
$matches = array();
foreach($list as $str) {
if(preg_match('/[A-Z]+[a-zA-Z]*/um',$str)) $matches[] = $str;
}
print_r($matches);

Mysqli array failing?

I have a block of text and a preg_match_all sequence to create an array ($matches) from certain elements in the text.
I then look up a corresponding entry for each string in the first array using mysqli and receive a second array - ($replacement).
I want to replace the first array's position in the original text with the second array, re-finding the first array and naming it $arraytoreplace. This is the code I use:
$replacement = array();
$myq = "SELECT code,title FROM messages WHERE ID=?";
if ($stmt = $mysqli2->prepare($myq)) {
foreach($matches[1] as $value) {
$stmt->bind_param("s", $value);
$stmt->execute();
// bind result variables
$stmt->bind_result($d,$cc);
if($stmt->fetch()) {
$replacement[] = '' . $cc . '';
}
}
$stmt->close();
}
If I use var_dump on the arrays before the str_replace like so:
var_dump($arraytoreplace);
var_dump($replacement);
I get:
array(4) {
[0]=> string(3) "111"
[1]=> string(2) "12"
[2]=> string(4) "1234"
[3]=> string(1) "0"
}
array(4) {
[0]=> string(5) "hello"
[1]=> string(2) "hi"
[2]=> string(3) "foo"
[3]=> string(3) "bar"
}
I then use str_replace to drop the second array into the first array's place in the original text.
Usually this is fine, but everything breaks once the arrays reach about 10 strings.
Instead of Text hello text hi I'll get Text 11foo text foo1 or something equally bizarre.
Any ideas?
Edit: The code used for replacing the arrays as follows:
$messageprep = str_replace($arraytoreplace, $replacement, $messagebody);
$messagepostprep = str_replace('#', '', $messageprep);
echo '<div class="messagebody">' . $messagepostprep . '</div>';

It looks like you're getting partial replacements when one string of numbers is contained inside a longer one, e.g. 12 inside 1234.
You need to do your replacements with a regular expression anchored on the boundaries of the search string. Something like...
$text = preg_replace("/\b" . preg_quote($replace, "/") . "\b/", $value, $text);
Another possible solution would be to consider changing the values to replace so that they are padded with zeros...
Array(
[0] => string(4) "0111"
[1] => string(4) "0012"
[2] => string(4) "1234"
[3] => string(4) "0000"
)
...and make sure that your search strings are also padded with zeros, because 0012 will never be confused with 12 and accidentally found in 0123.
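Another option is to do all the replacements in a single pass, so a replacement can never be hit again by a later search. The sketch below is only an illustration under some assumptions: the $map array and sample text are made up to mirror the dumps in the question, and it treats every run of digits in the text as a candidate ID.

```php
<?php
// Hypothetical data mirroring the var_dump output in the question:
// search string => replacement.
$map  = ['111' => 'hello', '12' => 'hi', '1234' => 'foo', '0' => 'bar'];
$text = 'Text 111 text 12 and 1234 then 0';

// Match each whole run of digits exactly once and look it up in the map.
// IDs that are not in the map are left unchanged.
$result = preg_replace_callback(
    '/\d+/',
    function (array $m) use ($map) {
        return $map[$m[0]] ?? $m[0];
    },
    $text
);

echo $result, "\n"; // Text hello text hi and foo then bar
```

Because 1234 is matched as a whole, the shorter 12 never gets a chance to replace part of it, which is the problem with passing arrays to str_replace.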

PHP json-like-string split

I have this $str value:
[{\"firstname\":\"guest1\",\"lastname\":\"one\",\"age\":\"22\",\"gender\":\"Male\"},{\"firstname\":\"guest2\",\"lastname\":\"two\",\"age\":\"22\",\"gender\":\"Female\"}]
I want to split it into the following:
firstname:guest1,lastname:one,age:22
firstname:guest2,lastname:two,age:22
I tried explode(",", $str), but it splits on every comma and I don't get what I want.
Can anyone help me?

As Josh K points out, that looks suspiciously like a JSON string. Maybe you should do a json_decode() on it to get the actual data you're looking for, all organized nicely into an array of objects.
EDIT: it seems your string is itself wrapped in double quotes ", so you'll have to trim those away before you'll be able to decode it as valid JSON:
$str_json = trim($str, '"');
$guests = json_decode($str_json);
var_dump($guests);
I get this output with the var_dump(), so it's definitely valid JSON here:
array(2) {
[0]=>
object(stdClass)#1 (4) {
["firstname"]=>
string(6) "guest1"
["lastname"]=>
string(3) "one"
["age"]=>
string(2) "22"
["gender"]=>
string(4) "Male"
}
[1]=>
object(stdClass)#2 (4) {
["firstname"]=>
string(6) "guest2"
["lastname"]=>
string(3) "two"
["age"]=>
string(2) "22"
["gender"]=>
string(6) "Female"
}
}
JSON (JavaScript Object Notation) is not CSV (comma-separated values). They're two vastly different data formats, so you can't parse one like the other.
To get your two strings, use a loop to get the keys and values of each object, and then build the strings with those values:
foreach ($guests as $guest) {
$s = array();
foreach ($guest as $k => $v) {
if ($k == 'gender') break;
$s[] = "$k:$v";
}
echo implode(',', $s) . "\n";
}
Output:
firstname:guest1,lastname:one,age:22
firstname:guest2,lastname:two,age:22
(Assuming you do want to exclude the genders for whatever reason; if not, delete the if ($k == 'gender') break; line.)

If you split on ,'s then you will get all the other crap that surrounds it. You would then have to strip that off.
Looks a lot like JSON data to me, where is this string coming from?

If that is valid json, just run it through json_decode() to get a native php array...
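Note that you may need to run it through stripslashes() first, as it appears you may have magic_quotes_gpc set... You can conditionally call it by checking with the function get_magic_quotes_gpc (this only applies to older PHP versions: magic quotes were removed in PHP 5.4 and get_magic_quotes_gpc() itself in PHP 8.0):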
if (get_magic_quotes_gpc()) {
$_POST['foo'] = stripslashes($_POST['foo']);
}
$array = json_decode($_POST['foo']);

You need to use the preg_replace function.
$ptn = '/,"gender":"\w+"\}\]?|"|\[?\{/';
$str = "[{\"firstname\":\"guest1\",\"lastname\":\"one\",\"age\":\"22\",\"gender\":\"Male\"},{\"firstname\":\"guest2\",\"lastname\":\"two\",\"age\":\"22\",\"gender\":\"Female\"}]";
$rpltxt = "";
echo preg_replace($ptn, $rpltxt, $str);
You can use a PHP regular expression tester to check the result.
Or use preg_match_all:
$ptn = '/(firstname)":"(\w+)","(lastname)":"(\w+)","(age)":"(\d+)/';
$str = "[{\"firstname\":\"guest1\",\"lastname\":\"one\",\"age\":\"22\",\"gender\":\"Male\"},{\"firstname\":\"guest2\",\"lastname\":\"two\",\"age\":\"22\",\"gender\":\"Female\"}]";
preg_match_all($ptn, $str, $matches);
print_r($matches);

I still haven't managed to retrieve the JSON.
I var_dump the trimmed value like this:
$str_json = trim($userdetails->other_guests, '"');
$guests = json_decode($str_json);
var_dump($str_json,$guests);
where $userdetails->other_guests is the $str value I had before.
I get the following output:
string(169) "[{\"firstname\":\"guest1\",\"lastname\":\"one\",\"age\":\"22\",\"gender\":\"Male\"},{\"firstname\":\"guest2\",\"lastname\":\"two\",\"age\":\"23\",\"gender\":\"Female\"}]"
NULL
This means the decoded JSON is NULL... strange.
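The dump above still shows a backslash in front of every quote, which would explain the NULL: json_decode() returns NULL when the string isn't valid JSON. A possible fix, sketched here on the assumption that the backslashes really are literal characters in the stored value, is to strip them before decoding. The shortened sample string is made up for illustration, and json_last_error_msg() reports why a decode failed.

```php
<?php
// Hypothetical stored value: single quotes keep the backslashes as literal characters.
$raw = '[{\"firstname\":\"guest1\",\"age\":\"22\"},{\"firstname\":\"guest2\",\"age\":\"23\"}]';

var_dump(json_decode($raw));       // NULL, because \" is not valid outside a JSON string
echo json_last_error_msg(), "\n";  // reports why decoding failed

// Remove the stray backslashes, then decode into associative arrays.
$guests = json_decode(stripslashes($raw), true);

foreach ($guests as $guest) {
    echo 'firstname:' . $guest['firstname'] . ',age:' . $guest['age'] . "\n";
}
```

Be aware that stripslashes() would also damage legitimate JSON escapes such as \n or \" inside values, so it's better to find out where the extra slashes are added and avoid them at the source.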
