Neo4j Comparing string elements in array to php variable - php

I have nodes in my database under the label Keywords, with words stored as an attribute. I would like to compare a string ($mostRecentPost) with the words in the array words.
$queryString ="WITH["Batman","Jaws","Fun","Baseball","Halo","PS4","Nike","Jeep","Mustang"] AS words MATCH (n.Keywords) WHERE ".$mostRecentPost." =~'(?i).*n.kw.*' IN words RETURN n";
$query = new Everyman\Neo4j\Cypher\Query($client, $queryString);
$relativePosts = $query->getResultSet();
For example, $mostRecentPost is a node whose content is "the new Halo looks awesome". I am trying to compare the content of that node with the words array: when a word in the post matches one of the array words, the query should return that word.

Your query seems to be totally off:
you don't use your "words" anywhere
not sure what .$mostRecentPost stands for
your regexp is not related to the words at all
WITH["Batman","Jaws","Fun","Baseball","Halo","PS4","Nike","Jeep","Mustang"] AS words
MATCH (n.Keywords)
WHERE ".$mostRecentPost." =~'(?i).*n.kw.*' IN words
RETURN n
You could do the following (it will not be fast):
MATCH (n:Keywords)
WHERE n.text =~ '(?i).*(Batman|Jaws|Fun|...).*'
RETURN n
and pass the regex string as a parameter.
A better option is full-text search with a list of words; see this blog post for how to set it up with Neo4j 2.0: http://jexp.de/blog/2014/03/full-text-indexing-fts-in-neo4j-2-0/
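Here is a minimal sketch of building that regex from the PHP array and sending it as a parameter rather than concatenating it into the query. Several details are assumptions: the helper name `buildKeywordRegex`, the property name `text` (taken from the Cypher above), and the Neo4j 2.x-style `{regex}` placeholder (newer Neo4j versions write `$regex`). The actual client call is left out. Neo4j evaluates `=~` with Java regular expressions, so the local `preg_match()` check is only an approximation, although escaped plain words like these behave the same in both engines.

```php
<?php
// Build "(?i).*(word1|word2|...).*" from an array of keywords.
function buildKeywordRegex(array $words): string
{
    // preg_quote() escapes regex metacharacters so each word matches literally.
    $escaped = array_map(fn(string $w): string => preg_quote($w), $words);
    return '(?i).*(' . implode('|', $escaped) . ').*';
}

$words = ["Batman", "Jaws", "Fun", "Baseball", "Halo", "PS4", "Nike", "Jeep", "Mustang"];
$regex = buildKeywordRegex($words);

// Hypothetical query text and parameter map (Neo4j 2.x placeholder syntax).
$cypher = 'MATCH (n:Keywords) WHERE n.text =~ {regex} RETURN n';
$params = ['regex' => $regex];

// Local sanity check with PCRE; "~" is used as the delimiter.
$post = "the new Halo looks awesome";
echo preg_match('~' . $regex . '~', $post) === 1 ? "match\n" : "no match\n";
```

Sending the regex as a parameter also avoids quoting problems like the nested double quotes in the original `$queryString`.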

Related

How to properly replace strings when you have repeated substrings?

I want to add hyperlinks to URLs in a text. The problem is that the URLs can come in different formats, and some of them contain substrings that also appear in other URLs. Here is an example:
Here I have one insidelinkhttp://google.com But I can have more formats like the followings: https://google.com google.com
Right now I have extracted the following links from the example above: ["http://google.com", "https://google.com", "google.com"], and I want to replace each of those matches with the corresponding hyperlink.
If I iterate over the array and replace each element in turn, it goes wrong. Once I have added the hyperlink for "http://google.com", the "google.com" inside it gets replaced again with another hyperlink when the loop reaches "google.com".
Does anyone have an idea how to solve this problem?
Thanks
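Based on your sample string, I have defined three different URL patterns, and each match is replaced as you require. You can add more patterns to the "$regEX" variable. Because preg_replace_callback() scans the string only once, text that has already been turned into a link is never matched again.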
// string
$str = "Here I have one insidelinkhttp://google.com But I can have more formats like the followings: https://google.com google.com";
/**
* Wrap each matched URL in an anchor tag
*/
function urls_matches($url1)
{
if (isset($url1[0])) {
return '<a href="' . $url1[0] . '">' . $url1[0] . '</a>';
}
}
// regular expression for multiple patterns
$regEX = "/(http:\/\/[a-zA-Z0-9]+\.+[A-Za-z]{2,6}+)|(https:\/\/[a-zA-Z0-9]+\.+[A-Za-z]{2,6}+)|([a-zA-Z0-9]+\.+[A-Za-z]{2,6}+)/";
// replacing string based on defined patterns
$replacedString = preg_replace_callback(
$regEX,
"urls_matches",
$str
);
// print the replaced string
echo $replacedString;
You could first replace each URL with a template string,
e.g. STRINGA, STRINGB, STRINGC,
and then loop over the array of links, where item 0 replaces STRINGA, and so on.
Just make sure the template names don't overlap, as STRING1 and STRING10 would.
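The following is a minimal sketch of that two-pass idea. It assumes the list of URLs is already known, as in the question. The function name `linkifyWithPlaceholders` and the `{{URL_n}}` placeholder format are hypothetical. The closing braces keep `{{URL_1}}` from being a prefix of `{{URL_10}}`. URLs are replaced longest first so that "google.com" cannot match inside "http://google.com".

```php
<?php
function linkifyWithPlaceholders(string $text, array $urls): string
{
    // Longest URLs first, so shorter ones cannot eat part of a longer one.
    usort($urls, fn(string $a, string $b): int => strlen($b) <=> strlen($a));

    // First pass: swap every URL for a unique placeholder.
    $links = [];
    foreach ($urls as $i => $url) {
        $placeholder = '{{URL_' . $i . '}}';
        $text = str_replace($url, $placeholder, $text);
        $links[$placeholder] = '<a href="' . $url . '">' . $url . '</a>';
    }

    // Second pass: swap each placeholder for its link. Placeholders contain no URL
    // text, so nothing gets linked twice.
    return str_replace(array_keys($links), array_values($links), $text);
}

$str = "Here I have one insidelinkhttp://google.com But I can have more formats like the followings: https://google.com google.com";
$result = linkifyWithPlaceholders($str, ["http://google.com", "https://google.com", "google.com"]);
echo $result, "\n";
```

PHP's strtr() with an array of URL => link pairs would do the same job in a single call. It tries the longest keys first and never rescans text it has already replaced.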

number should be separated from specific string format

I'm new to PHP. The page description column of my page table contains strings in the format ####CONTENT-21####, each holding an ID from the content table. I only need to extract the ID number so I can fetch the matching content from the content table. How can I extract the number from a string in the format ####CONTENT-21####?
$string = '####CONTENT-21####';
$newnumber = filter_var($string, FILTER_SANITIZE_NUMBER_INT);
//if the number is preceded by a (-), the filter keeps the (-) too, so trim it off
$newnumber = ltrim($newnumber, '-');
First Case
If you are sure that all strings start with ####CONTENT- and end with ####, you can extract the number between them with the following code. This assumes $str is your string and $id is where you will store your ID.
$id = substr($str, 12, strlen($str) - 16); // skip "####CONTENT-" (12 chars), drop the trailing "####" (4 chars)
Second Case
If you are not sure where the ID is, but you are sure it is the only number in the string, you can match it with a regular expression. Again, this assumes $str is your string and $id is where you will store your ID.
preg_match_all('!\d+!', $str, $matches);
$id = implode(' ', $matches[0]);
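The approaches from this thread can be wrapped as small functions and checked against the sample string. The function names are illustrative only.

```php
<?php
function idByFilter(string $str): string
{
    // FILTER_SANITIZE_NUMBER_INT keeps digits and +/- signs: "####CONTENT-21####" -> "-21".
    return ltrim(filter_var($str, FILTER_SANITIZE_NUMBER_INT), '-');
}

function idByPosition(string $str): string
{
    // Skip "####CONTENT-" (12 chars) and drop the trailing "####" (4 chars).
    return substr($str, 12, strlen($str) - 16);
}

function idByRegex(string $str): string
{
    // Collect every run of digits; there should be exactly one.
    preg_match_all('!\d+!', $str, $matches);
    return implode(' ', $matches[0]);
}

$str = '####CONTENT-21####';
// Cast to int before using the ID to look up the content row.
$id = (int) idByRegex($str);
echo $id, "\n"; // 21
```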

Find most popular strings in a table

I am using the MyISAM engine with a full-text index to store a list of strings.
These strings can be a single word, or a sentence.
To find out how many times the string hello appears in my table, I run:
SELECT COUNT(*) Total
FROM String s
WHERE
MATCH (s.name) AGAINST ('hello')
I would like to create a similar report for all strings. The result should be a list of the TOP-N most common strings in this table (the top ones are most probably "the", "a", "to", etc.).
The exact-match case is straightforward:
SELECT name AS String, COUNT(*) AS Total
FROM String
GROUP BY name
ORDER BY Total DESC
LIMIT *some number*
But it counts only whole strings.
Is there any way to achieve my desired result?
Thanks.
I don't think there is an easy way to do this. I would create a "statistics table" just for this purpose, with one column for the words themselves and one for the number of occurrences (and the primary key on the first column, of course).
Fill it with a procedural block (PL/SQL, or a stored procedure in MySQL) that scans all strings and splits them into words:
If a word is not yet in the statistics table, insert a new row.
If the word is already in the statistics table, increment the value in the second column.
The first run can take quite a long time. After that, you only have to process new strings as they are inserted, perhaps with a trigger. (This assumes you want to run the report regularly, not just once.)
Hope this helps; I have no simpler answer.
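Below is an in-memory sketch of the insert-or-increment logic, with a PHP array standing in for the statistics table. The function name is hypothetical. In MySQL, the per-word step would typically be a single `INSERT ... ON DUPLICATE KEY UPDATE` on the primary key. The table and column names in the comment are assumptions.

```php
<?php
// $stats plays the role of the statistics table: word => number of occurrences.
// MySQL equivalent per word (hypothetical names):
//   INSERT INTO word_stats (word, cnt) VALUES (?, 1) ON DUPLICATE KEY UPDATE cnt = cnt + 1
function addToWordStats(array &$stats, string $text): void
{
    $words = preg_split('/[^a-z]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    foreach ($words as $word) {
        if (!isset($stats[$word])) {
            $stats[$word] = 1;   // not found: insert a new row
        } else {
            $stats[$word]++;     // found: increment the counter
        }
    }
}

$stats = [];
// Initial full scan over the existing strings...
foreach (['hello world', 'Hello again'] as $row) {
    addToWordStats($stats, $row);
}
// ...after which only newly inserted strings need processing (the trigger's job).
addToWordStats($stats, 'the world says hello');

arsort($stats); // most frequent words first
print_r(array_slice($stats, 0, 3, true));
```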
I think using LIKE will work:
select name, count(*) as total from String where name like '%hello%' group by name order by total desc
Let me know.
I didn't find a solution using SQL and my full-text index. I did get the result I wanted by fetching all of my strings from the database and processing them in PHP on the backend:
//get all strings from DB
$queryResult = $db->query("SELECT name as String FROM String");
//Combine all of them into an array (initialized so implode() also works when there are no rows)
$stringArray = [];
while($row = $queryResult->fetch_array(MYSQLI_ASSOC)) {
$stringArray[] = $row['String'];
}
//"Glue" all these strings into one huge string
$text = implode(" ", $stringArray);
//Make everything lowercase
$textLowercase = strtolower($text);
//Find all words
$result = preg_split('/[^a-z]/', $textLowercase, -1, PREG_SPLIT_NO_EMPTY);
//Filter some unwanted words
$result = array_filter($result, function($x){
return !preg_match("/^(.|the|and|of|to|it|in|or|is|a|an|not|are)$/",$x);
});
//Count a number of occurrence of each word
$result = array_count_values($result);
//Sort
arsort($result);
//Select TOP-N strings, where N is $amount
$result = array_slice($result, 0, $amount);
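The same pipeline can be written as a self-contained function that takes the strings directly, which makes it easy to test without a database. The function name `topWords` is hypothetical. The stop-word pattern is the one from the answer above.

```php
<?php
function topWords(array $strings, int $amount): array
{
    // Glue everything together, lowercase it, and split on non-letters.
    $text = strtolower(implode(' ', $strings));
    $words = preg_split('/[^a-z]/', $text, -1, PREG_SPLIT_NO_EMPTY);

    // Drop single letters and a few common stop words.
    $words = array_filter($words, function (string $x): bool {
        return !preg_match('/^(.|the|and|of|to|it|in|or|is|a|an|not|are)$/', $x);
    });

    // Count, sort by frequency (descending), keep the top N.
    $counts = array_count_values($words);
    arsort($counts);
    return array_slice($counts, 0, $amount, true);
}

print_r(topWords(['Hello world', 'The world is big', 'hello again, world'], 2));
```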

preg_replace with a word in an array

I am trying to replace certain words, stored in an array called keywords, with "as" in a string.
for($i = 0; $i<sizeof($this->keywords[$this->lang]); $i++)
{
$word = $this->keywords[$this->lang][$i];
$a = preg_replace("/\b$word\b/i", "as",$this->code);
}
It works if I replace the variable $word with a literal, as in /\bhello\b/i, which then replaces every hello with "as".
Is the approach I am using even possible?
Before it is a pattern, it is a double-quoted string, so the variable is interpolated; that is not the problem.
The problem is that you use a loop to change several words and you store the result in $a:
in the first iteration, all occurrences of the first word in $this->code are replaced and the new string is stored in $a,
but the next iteration doesn't pass $a as the third argument when it replaces the next word; it always starts again from the original string $this->code.
Result: after the for loop, $a contains the original string with only the occurrences of the last word replaced with "as".
When you want to replace several words with the same string, one way is to build an alternation: word1|word2|word3... This is easy to do with implode:
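If you want to keep the loop, the minimal fix is to feed each result into the next iteration. In this sketch, the function name is hypothetical, and `$keywords` stands in for `$this->keywords[$this->lang]`. preg_quote() is added so that keywords containing regex metacharacters are matched literally.

```php
<?php
function replaceInLoop(string $code, array $keywords): string
{
    $a = $code; // start from the original string once...
    foreach ($keywords as $word) {
        // ...and keep working on the previous result
        $a = preg_replace('/\b' . preg_quote($word, '/') . '\b/i', 'as', $a);
    }
    return $a;
}

echo replaceInLoop('hello world, goodbye world', ['hello', 'goodbye']), "\n"; // as world, as world
```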
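// preg_quote() escapes regex metacharacters (and the ~ delimiter) in the keywords
$alternation = implode('|', array_map(fn($w) => preg_quote($w, '~'), $this->keywords[$this->lang]));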
$pattern = '~\b(?:' . $alternation . ')\b~i';
$result = preg_replace($pattern, 'as', $this->code);
So, when you do that, the string is parsed only once and all the words are replaced in one shot.
If you have a lot of words and a very long string:
Testing a long alternation is costly. The leading \b greatly reduces the number of positions where a match can start, but at each of those positions the engine still tries the alternatives one after another. A match is slow to find, and a failure is even slower, because failing means trying every alternative.
In this particular case only, you can use another approach:
First, define a placeholder (a character or short string that cannot occur in your string, let's say §) and insert it at every word boundary:
$temp = preg_replace('~\b~', '§', $this->code);
Then wrap each keyword in placeholders (§word1§, §word2§, ...) and build an associative array that maps each of them to the replacement string:
$trans = [];
foreach ($this->keywords[$this->lang] as $word) {
$trans['§' . $word . '§'] = 'as';
}
Once you have done that, add an entry with the bare placeholder as key and an empty string as value, so that the leftover placeholders are removed. You can now use the fast strtr function to perform the replacement:
$trans['§'] = '';
$result = strtr($temp, $trans);
The only limitation of this technique is that it is case-sensitive.
It will work if you write it like this:
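The placeholder technique above can be assembled into one self-contained function as a sketch. The function name and parameters are hypothetical. It assumes "§" never occurs in the input, and, as noted, matching is case-sensitive.

```php
<?php
function replaceKeywordsStrtr(string $code, array $keywords, string $replacement): string
{
    // Mark every word boundary with the placeholder: "if x" -> "§if§ §x§".
    $temp = preg_replace('~\b~', '§', $code);

    $trans = [];
    foreach ($keywords as $word) {
        $trans['§' . $word . '§'] = $replacement;
    }
    // Placeholders not consumed by a keyword are simply removed.
    $trans['§'] = '';

    // strtr() tries longer keys first, so "§word§" wins over the bare "§".
    return strtr($temp, $trans);
}

echo replaceKeywordsStrtr('if x then y else z', ['if', 'then', 'else'], 'as'), "\n"; // as x as y as z
```

Because each keyword key must be bracketed by placeholders on both sides, a keyword inside a longer word (such as "if" in "iffy") is left alone. This has the same effect as \b in the regex version.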
$a = preg_replace("/\b".$word."\b/i", "as",$this->code);

Compare words, also need to look for plurals and ing?

I have two lists of words, say LIST1 and LIST2. I want to compare LIST1 against LIST2 to find duplicates, but the comparison should also find the plural and the -ing form of each word. For example:
Suppose LIST1 contains the word "account", and LIST2 contains "accounts" and "accounting". When I compare them, the result should show two matches for "account".
I am doing this in PHP, and the lists are stored in MySQL tables.
You can use a technique called Porter stemming to map each list entry to its stem, then compare the stems. An implementation of the Porter stemming algorithm in PHP can be found here or here.
What I would do is compare your word directly against LIST2. At the same time, strip your word from the start of each LIST2 word you're comparing and check whether the leftover is ing, s or es, which marks a plural or -ing form (this should be accurate enough). If it isn't, you'll need an algorithm that generates plurals, since pluralizing a word is not as simple as adding an s.
Duplicate Ending List: s, es, ing
LIST1: Gas, Test
LIST2: Gases, Tests, Testing
Now compare LIST1 to LIST2. In the same comparison loop, compare the two items directly, and also remove the LIST1 word from the start of the current LIST2 word. Then just check whether what remains is in the Duplicate Ending List.
Hope that makes sense.
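The check described above translates to a minimal sketch like the one below. The function name and the case-insensitive comparison are assumptions. As the next answer points out, irregular plurals will not be caught.

```php
<?php
// A LIST2 word matches a LIST1 word if it is identical, or if removing the
// LIST1 word from its start leaves one of the allowed endings.
function findMatches(array $list1, array $list2, array $endings = ['s', 'es', 'ing']): array
{
    $matches = [];
    foreach ($list1 as $word) {
        $w = strtolower($word);
        foreach ($list2 as $candidate) {
            $c = strtolower($candidate);
            if ($c === $w) {
                $matches[$word][] = $candidate;   // direct comparison
            } elseif (strncmp($c, $w, strlen($w)) === 0
                && in_array(substr($c, strlen($w)), $endings, true)) {
                $matches[$word][] = $candidate;   // word + known ending
            }
        }
    }
    return $matches;
}

print_r(findMatches(['Gas', 'Test'], ['Gases', 'Tests', 'Testing']));
```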
The problem with that is that, in English at least, not all plurals are formed with a regular suffix, and neither are all present participles (e.g. "mouse"/"mice", "make"/"making"). You can approximate by appending 'ing' and 's' to every word, but that will give both false positives and false negatives.
You can handle it directly in MySQL if you wish.
SELECT DISTINCT l2.word
FROM LIST1 l1, LIST2 l2
WHERE l1.word = l2.word OR CONCAT(l1.word, 's') = l2.word OR CONCAT(l1.word, 'ing') = l2.word;
This function will output the plural of a word.
http://www.exorithm.com/algorithm/view/pluralize
Something similar could be written for gerunds and present participles (ing forms)
You might consider using the Doctrine Inflector class in conjunction with a stemmer for this.
Here's the algorithm at a high level:
Split search string on spaces, process words individually
Lowercase the search word
Strip special characters
Singularize, replace differing portion with wildcard ('%')
Stem, replace differing portion with wildcard ('%')
Here's the function I put together:
/**
* Use inflection and stemming to produce a good search string to match subtle
* differences in a MySQL table.
*
* Assumes Doctrine's Inflector with a static Inflector::singularize() (as in
* Doctrine Inflector 1.x) and a stem_english() stemming function are available.
* The table, field and search string are interpolated into the SQL as-is,
* so only pass trusted values (or switch to a prepared statement).
*
* @param string $sInputString The string you want to base the search on
* @param string $sSearchTable The table you want to search in
* @param string $sSearchField The field you want to search
* @return string The SQL query
*/
function getMySqlSearchQuery($sInputString, $sSearchTable, $sSearchField)
{
$aInput = explode(' ', strtolower($sInputString));
$aSearch = [];
foreach($aInput as $sInput) {
// Strip apostrophes ("mary's" -> "marys")
$sInput = str_replace("'", '', $sInput);
//--------------------
// Inflect
//--------------------
$sInflected = Inflector::singularize($sInput);
// If the singular form differs from the input, keep the common prefix and
// replace the rest with a % (wildcard) for the MySQL query.
// strspn() over the XOR of both strings counts the leading identical bytes.
$iPosition = strspn($sInput ^ $sInflected, "\0");
if($iPosition < strlen($sInput)) {
$sInput = substr($sInflected, 0, $iPosition) . '%';
}
//--------------------
// Stem
//--------------------
$sStemmed = stem_english($sInput);
// Same idea for the stemmed form
$iPosition = strspn($sInput ^ $sStemmed, "\0");
if($iPosition < strlen($sInput)) {
$aSearch[] = substr($sStemmed, 0, $iPosition) . '%';
} else {
$aSearch[] = $sInput;
}
}
$sSearch = implode(' ', $aSearch);
return "SELECT * FROM $sSearchTable WHERE LOWER($sSearchField) LIKE '$sSearch';";
}
I ran it with several test strings:
Input String: Mary's Hamburgers
SearchString: SELECT * FROM LIST2 WHERE LOWER(some_field) LIKE 'mary% hamburger%';
Input String: Office Supplies
SearchString: SELECT * FROM LIST2 WHERE LOWER(some_field) LIKE 'offic% suppl%';
Input String: Accounting department
SearchString: SELECT * FROM LIST2 WHERE LOWER(some_field) LIKE 'account% depart%';
Probably not perfect, but it's a good start anyway! It falls down when multiple matches are returned, because there is no logic to determine the best match. That's where tools like MySQL full-text search and Lucene come in. Thinking about it a little more, you might be able to use levenshtein() to rank multiple results with this approach!
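Here is a minimal sketch of the levenshtein() ranking idea. The candidates would normally be the rows returned by the LIKE query. The function name and the sample rows are hypothetical, and comparing the whole lowercased strings is an assumption.

```php
<?php
// Order candidate matches by edit distance to the original input (closest first).
function rankByLevenshtein(string $input, array $candidates): array
{
    $needle = strtolower($input);
    usort($candidates, function (string $a, string $b) use ($needle): int {
        return levenshtein($needle, strtolower($a)) <=> levenshtein($needle, strtolower($b));
    });
    return $candidates;
}

$ranked = rankByLevenshtein('Accounting department', [
    'Accounts Department',
    'Accounting Department',
    'Accountancy departments',
]);
print_r($ranked);
```

Before PHP 8.0, levenshtein() only accepts strings of up to 255 characters, which is fine for short names like these.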
