I have a query that I'm trying to run against a database of words. The query must return words that consist of a given word plus some of a given set of letters. I'm using PHP and MySQL.
For example:
Word Given: Cruel
Letters Given: abcdty
In the database, I need to find the word "Cruelty" based on the letters given and the word given. It needs to work both ways, so if I had "atni" for letters, "Anticruel" would appear if it existed in the database.
I have it half working but the result given is not correct:
SELECT word
FROM words
WHERE LOCATE( "cruel", word ) >0
AND word != "cruel"
AND word
REGEXP '[ybilteh]'
The result set from this query:
"anticruelty"
"crueler"
"cruelest"
"crueller"
"cruellest"
"cruelly"
"cruelness"
"cruelnesses"
"cruelties"
"cruelty"
Update!!!
Thanks to Benjamin Morel, this is getting much closer.
This query:
SELECT word
FROM words
WHERE LOCATE( "t", word ) >0
AND word != "t"
AND word
REGEXP '^[ybilteh]*t[ybilteh]*$'
LIMIT 0 , 30
It finds words correctly, but it also includes words with double letters, such as "Beet", when only one "e" is available.
Try this one:
SELECT word
FROM words
WHERE word REGEXP '^[ybilteh]*cruel[ybilteh]*$'
AND word != 'cruel';
UPDATE: let's refine the results with PHP. What about this?
$word = 'cruel';
$letters = 'ybilteh';
$items = array("anticruelty", "crueler", "cruelest",
    "crueller", "cruellest", "cruelly", "cruelness",
    "cruelnesses", "cruelties", "cruelty");
$letters = str_split($letters);
foreach ($items as $item) {
    $list = $letters; // fresh copy of the available letters for each item
    // remove the original word (once); preg_quote() guards against regex metacharacters
    $thisItem = preg_replace('/' . preg_quote($word, '/') . '/', '', $item, 1);
    for ($i = 0; $i < strlen($thisItem); $i++) {
        $index = array_search($thisItem[$i], $list);
        if ($index === false) {
            continue 2; // letter not available
        }
        unset($list[$index]); // remove the letter from the list
    }
    echo "$item\n"; // passed!
}
Returns: cruelly, cruelty
You can probably find a better or simpler approach, but this should do the trick!
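As an alternative to removing letters from a list one at a time, the same check can be sketched by comparing letter counts. This is only a minimal sketch: the helper name canBuild() is hypothetical, and it assumes single-byte (ASCII) lowercase words.

```php
<?php
// Hypothetical helper: true if $candidate is $word plus extra letters that
// can all be drawn from $letters, using each available letter at most once.
function canBuild(string $candidate, string $word, string $letters): bool
{
    $pos = strpos($candidate, $word);
    if ($pos === false || $candidate === $word) {
        return false;
    }
    // Remove the first occurrence of the base word.
    $rest = substr_replace($candidate, '', $pos, strlen($word));

    // Count how often each letter is needed and how often it is available.
    $needed = array_count_values(str_split($rest));
    $available = array_count_values(str_split($letters));

    foreach ($needed as $letter => $count) {
        if (($available[$letter] ?? 0) < $count) {
            return false; // letter missing or not available often enough
        }
    }
    return true;
}

$items = ["anticruelty", "crueler", "cruelly", "cruelty"];
$matches = array_values(array_filter(
    $items,
    fn($item) => canBuild($item, 'cruel', 'ybilteh')
));
print_r($matches); // cruelly, cruelty
```

Because the counts are compared per letter, a word such as "beet" is rejected for the word "t" when only one "e" is available.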
I have a list of allowed letters
$allowedLetters = array('B','C','D','F','G','H','J','K','L','M','N','P','R','S','T','V','W','X','Y','Z');
And from that array I would like to do string increment to get the following pattern:
BBB, BBC, BBD ... until ZZZ
I know that I can do string increment as simple as this:
$letters = array();
$letter = 'BBB';
while ($letter !== 'ZZZ') {
$letters[] = $letter++;
}
print_r($letters);
But this does not respect my list of allowed letters, and I cannot find a way either to increment using the allowed list or simply to exclude the letters I do not want, namely:
A,E,I,O,Q,U
What would be the simplest way to do this? I would appreciate any help.
I propose a solution that starts from your code and uses the strcspn() function:
$letters = array();
$letter = 'BBB';
while (true) {
    // keep $letter only if it contains none of the excluded letters
    if (strcspn($letter, "AEIOQU") === 3) {
        $letters[] = $letter;
    }
    if ($letter === 'ZZZ') {
        break;
    }
    $letter++;
}
print_r($letters);
strcspn() returns the length of the initial segment of the string that contains none of the characters listed in its second argument. In our case it returns a value in the range 0 to 2 if any excluded character is present. According to the manual page, if none of the listed characters is found, the length of the whole string is returned (in our scenario that is always 3).
This means that by checking for a return value of 3 we accept only strings that contain none of the excluded characters "AEIOQU", and we append those to the output array. The check runs before the increment and the loop stops only after ZZZ has been tested, so both BBB and ZZZ are included.
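If filtering after the fact feels wasteful, the combinations can also be built directly from the allowed list. The following is a minimal sketch that assumes the length is fixed at three letters; the variable name $codes is only illustrative.

```php
<?php
// Allowed letters, taken from the question.
$allowedLetters = ['B','C','D','F','G','H','J','K','L','M','N','P','R','S','T','V','W','X','Y','Z'];

// One nested loop per position: every combination uses allowed letters only.
$codes = [];
foreach ($allowedLetters as $first) {
    foreach ($allowedLetters as $second) {
        foreach ($allowedLetters as $third) {
            $codes[] = $first . $second . $third;
        }
    }
}

echo count($codes), "\n";                               // 20 * 20 * 20 = 8000
echo $codes[0], ' ', $codes[1], ' ', end($codes), "\n"; // BBB BBC ZZZ
```

This never generates a string with an excluded letter, so no filtering step is needed, at the cost of hard-coding the number of positions.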
I am trying to get a count of common phrases from a body of text. I don't just want single words, but rather every series of words that falls between stop words. For example, for https://en.wikipedia.org/wiki/Wuthering_Heights I would like the phrase "wuthering heights" to be counted rather than "wuthering" and "heights" separately.
if (in_array($word, $this->stopwords))
{
$cleanPhrase = preg_replace("/[^A-Za-z ]/", '', $currentPhrase);
$cleanPhrase = trim($cleanPhrase);
if($cleanPhrase != "" && strlen($cleanPhrase) > 2)
{
$this->Phrases[$cleanPhrase] = substr_count($normalisedText, $cleanPhrase);
$currentPhrase = "";
}
continue;
}
else
$currentPhrase = $currentPhrase . $word . " ";
The problem I have is that "age" is also counted when the word "stage" appears. One solution is to add whitespace to either side of the $cleanPhrase variable, but that fails when the phrase is not surrounded by whitespace: there could be a comma, full stop, or some other punctuation character next to it. I want to count all of these. Is there a way to do this without resorting to something like the following?
$terminate = array(".", " ", ",", "!", "?");
$count = 0;
foreach($terminate as $tpun)
{
$count += substr_count($normalisedText, $tpun . $cleanPhrase . $tpun);
}
By utilizing this answer with slight modification, you can do this:
$sentence = "Age: In this day and age, people of all age are on the stage.";
$word = 'age';
preg_match_all('/\b'.$word.'\b/i', $sentence, $matches);
\b represents a word boundary, so this pattern gives a count of 3 when searching for "age" (the i flag makes the match case-insensitive; remove it if you want to match case as well).
If you're only going to match on one phrase at a time, you'll find your count in count($matches[0]).
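For phrases that come from arbitrary text, it is probably safer to escape them before building the pattern. The sketch below wraps the same word-boundary idea in a hypothetical helper named countPhrase(); it relies on preg_match_all() returning the number of full matches.

```php
<?php
// Hypothetical helper: counts whole-word occurrences of $phrase in $text.
function countPhrase(string $text, string $phrase): int
{
    // preg_quote() escapes characters that have a special meaning in a regex.
    $pattern = '/\b' . preg_quote($phrase, '/') . '\b/i';
    // preg_match_all() returns the number of matches (or false on error).
    return (int) preg_match_all($pattern, $text);
}

$sentence = "Age: In this day and age, people of all age are on the stage.";
echo countPhrase($sentence, 'age'), "\n"; // 3
```

In the phrase-counting code from the question, a call like this could stand in for the substr_count() call, assuming $cleanPhrase begins and ends with a word character.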
I have a task to count sentences without using str_word_count. My senior gave me the code below, but I am not able to understand it. Can someone explain it?
I need to understand the variables and how the function works.
<?php
$sentences = "this book are bigger than encyclopedia";
function countSentences($sentences) {
$y = "";
$numberOfSentences = 0;
$index = 0;
while($sentences != $y) {
$y .= $sentences[$index];
if ($sentences[$index] == " ") {
$numberOfSentences++;
}
$index++;
}
$numberOfSentences++;
return $numberOfSentences;
}
echo countSentences($sentences);
?>
The output is
6
It's something very trivial, I'd say.
The task is to count the words in a sentence. A sentence is a string (a sequence of characters) made up of letters and white space (space, new line, etc.).
Now, what is a word of the sentence? It is a distinct group of letters that doesn't "touch" any other group of letters; in other words, words are separated from each other by white space (let's say just a normal blank space).
So the simplest algorithm to count words consists of:
- $words_count_variable = 0
- go through all the characters, one-by-one
- each time you find a space, it means a new word just ended before that, and you have to increase your $words_count_variable
- lastly, you'll find the end of the string, and that means a word just ended before that, so you'll increase for the last time your $words_count_variable
Take "this is a sentence".
We set $words_count_variable = 0;
Your while loop will analyze:
"t"
"h"
"i"
"s"
" " -> blank space: a word just ended -> $words_count_variable++ (becomes 1)
"i"
"s"
" " -> blank space: a word just ended -> $words_count_variable++ (becomes 2)
"a"
" " -> blank space: a word just ended -> $words_count_variable++ (becomes 3)
"s"
"e"
"n"
...
"n"
"c"
"e"
-> end reached: a word just ended -> $words_count_variable++ (becomes 4)
So, 4.
4 words counted.
Hope this was helpful.
Basically, it just counts the number of spaces in a sentence and adds one.
<?php
$sentences = "this book are bigger than encyclopedia";
function countSentences($sentences) {
$y = ""; // Temporary variable used to reach all chars in $sentences during the loop
$numberOfSentences = 0; // Counter of words
$index = 0; // Array index used for $sentences
// Reach all chars from $sentences (char by char)
while($sentences != $y) {
$y .= $sentences[$index]; // Adding the current char in $y
// If current char is a space, we increase the counter of word
if ($sentences[$index] == " "){
$numberOfSentences++;
}
$index++; // Increment the index used with $sentences in order to reach the next char in the next loop round
}
$numberOfSentences++; // Additional incrementation to count the last word
return $numberOfSentences;
}
echo countSentences($sentences);
?>
Be aware that this function gives wrong results in several cases. For example, two consecutive spaces are counted as two word boundaries, so "this  book" (with two spaces) yields 3 instead of 2.
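One way to avoid that problem, still without str_word_count, is to count the start of each word instead of each space. This is only a sketch: the function name countWords() is hypothetical, and it treats the plain space as the only separator.

```php
<?php
// Hypothetical helper: counts words by counting where each word begins.
function countWords(string $sentence): int
{
    $count = 0;
    $insideWord = false; // are we currently in the middle of a word?
    $length = strlen($sentence);

    for ($i = 0; $i < $length; $i++) {
        if ($sentence[$i] === ' ') {
            $insideWord = false;   // a space ends the current word, if any
        } elseif (!$insideWord) {
            $insideWord = true;    // first character of a new word
            $count++;
        }
    }
    return $count;
}

echo countWords("this book are bigger than encyclopedia"), "\n"; // 6
echo countWords("this  book"), "\n";                             // 2
```

Leading, trailing, and repeated spaces no longer change the result, and an empty string gives 0.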
In PHP, how would one take a string and separate the words into an array? Is there a way to split them by the spaces in the string? Also, how would you check whether a word in an array starts with a certain character? The idea would be to have a textbox with some normal words and a hashtag. The PHP script would underline or change the color of the hashtag when you submit the form, so it would basically underline any word that starts with a hash. Sorry if this doesn't make any sense.
To convert a string to an array, use PHP's explode function:
$array = explode(' ', $string);
To examine whether each word begins with a certain character, you can use strpos in a loop:
// Pick a letter
$char = 'a';
foreach ($array as $word) {
if (strpos($word, $char) === 0) {
// Echo out what you want
echo "$word starts with $char\r\n";
// Halt loop if necessary:
break;
}
}
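Putting the two pieces together for the hashtag case might look like the sketch below. The helper name underlineHashtags() and the choice of the <u> tag are assumptions made for illustration; the text is escaped with htmlspecialchars() because it is meant to come from a submitted form.

```php
<?php
// Hypothetical helper: wraps every word that starts with "#" in a <u> tag.
function underlineHashtags(string $text): string
{
    $words = explode(' ', $text);
    foreach ($words as $i => $word) {
        $safe = htmlspecialchars($word); // escape user input before output
        // strpos() === 0 means the word begins with "#"
        if (strlen($word) > 1 && strpos($word, '#') === 0) {
            $words[$i] = '<u>' . $safe . '</u>';
        } else {
            $words[$i] = $safe;
        }
    }
    return implode(' ', $words);
}

echo underlineHashtags('Learning #php is fun'), "\n";
// Learning <u>#php</u> is fun
```

A lone "#" is left alone here, since it is not followed by any tag text.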
Objective: Search through an array of tens of thousands of Chinese sentences in order to locate sentences that exclusively contain characters from a "known characters" array.
For example:
Let's say my corpus consists of the following sentences: 1) 我去中国。 2) 妳爱他。 3) 你在哪里?
I only "know" or want sentences that exclusively contain these characters: 1) 我 2) 中 3) 国 4) 你 5) 在 6) 去 7) 爱 8) 哪 9) 里.
The first sentence would be returned as result because all three of its characters are in my second array. The second sentence would be rejected because I did not ask for 妳 or 他. The third sentence would be returned as a result. The punctuation marks are ignored (as well as any alpha-numeric characters).
I have a working script that does this (below), and I'm wondering whether it is an efficient way to do it. If you are interested, please take a look and suggest changes, write your own, or give some advice. I've gleaned some ideas from this script and checked out some Stack Overflow questions, but they didn't address this scenario.
<?php
$known_characters = parse_file("FILENAME"); // retrieves target characters
$sentences = parse_csv("FILENAME"); // retrieves the text corpus
$number_wanted = 30; // number of sentences to attempt to retrieve
$found = array(); // stores results
$number_found = 0; // number of results
$character_known = false; // assume character is not known
$sentence_known = true; // assume sentence matches target characters
foreach ($sentences as $s) {
// retrieves an array of the sentence
$sentence_characters = mb_str_split($s->ttext);
foreach ($sentence_characters as $sc) {
// check to see if the character is alpha-numeric or punctuation
// if so, then ignore.
$pattern = '/[a-zA-Z0-9\s\x{3000}-\x{303F}\x{FF00}-\x{FF5A}]/u';
if (!preg_match($pattern, $sc)) {
foreach ($known_characters as $kc) {
if ($sc==$kc) {
// if character is known, move to next character
$character_known = true;
break;
}
}
} else {
// character is known if it is alpha-numeric or punctuation
$character_known = true;
}
if (!$character_known) {
// if character is unknown, move to next sentence
$sentence_known = false;
break;
}
$character_known = false; // reset for next iteration
}
if ($sentence_known) {
// if sentence is known, add it to results array
$found[] = $s->ttext;
$number_found = $number_found+1;
}
if ($number_found==$number_wanted)
break; // if required number of results are found, break
$sentence_known = true; // reset for next iteration
}
?>
It appears to me this should do it:
$pattern = '/[^a-zA-Z0-9\s\x{3000}-\x{303F}\x{FF00}-\x{FF5A}我中国你在去爱哪里]/u';
if (preg_match($pattern, $sentence)) {
// the sentence contains characters besides a-zA-Z0-9, punctuation
// and the selected characters
} else {
// the sentence contains only the allowed characters
}
Make sure to save your source code file in UTF-8.
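Since the known characters come from a file in the question, the character class would probably be built at run time rather than typed into the pattern. The sketch below shows one way to do that; the helper name isKnownSentence() is hypothetical, and the sample data uses a full-width question mark (U+FF1F), because an ASCII "?" is not covered by the ranges in the pattern.

```php
<?php
// Hypothetical helper: true if $sentence uses only known characters,
// ignoring ASCII letters, digits, whitespace and CJK/full-width punctuation.
function isKnownSentence(string $sentence, array $knownCharacters): bool
{
    $known = '';
    foreach ($knownCharacters as $character) {
        // preg_quote() keeps characters such as "]" or "-" from breaking the class
        $known .= preg_quote($character, '/');
    }
    $pattern = '/[^a-zA-Z0-9\s\x{3000}-\x{303F}\x{FF00}-\x{FF5A}' . $known . ']/u';
    // 0 matches means no character outside the allowed set was found
    return preg_match($pattern, $sentence) === 0;
}

$known = ['我', '中', '国', '你', '在', '去', '爱', '哪', '里'];
$sentences = ['我去中国。', '妳爱他。', '你在哪里？'];

$found = array_values(array_filter(
    $sentences,
    fn($s) => isKnownSentence($s, $known)
));
print_r($found); // the first and third sentence
```

Each sentence is scanned once by the regex engine, which replaces the two inner loops of the original script; the early exit after $number_wanted results would still need to be added around it.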