PHP Create List with Replaced characters

PHP Create List with Replaced characters - php

I am trying to basically create a list of obscene words for a filter. I wanted to generate the list by creating arrays of characters and their replacements, example. 'A' can be replaced with '4', or "E" with "3".
So basically I have a bunch of arrays for every char in the alphabet with the several different ways to replace it. EG. $e = array( "e" =>"3"); I have an array of obscene words. I need to print all the obscene words and then their variants where the letters match. Example:
Hello
He11o
He1lo
H3llo
H31lo
H311o
Every variation. How would I go about doing this? Any help would be much appreciated.

Sounds like a job for regular expressions for me.
preg_replace('/H[e3][l1]{2}[0o]/i','H****',$textstr);

Related

How would I replace a word in a string that I know the start and ending to, but not the entire word? Ie: Converting an ID# to a name

I am creating a web interface for a Discord bot I have created. I currently store all user accounts, messages, etc in a SQL database so that the web interface can have extensive logs for the mods to use. I am currently trying to come up with a solution for when viewing messages to convert "Discord Mentions" to readable names.
For example, when someone tags/mentions another user in a message, instead of the SQL storing '#name' it stores '<#!12345678>'. Based on how that text starts with <#! I know that it's linking a user name, in which I can access the SQL table containing all the users to retrieve their plain text name, but I'm not sure how to:
A) Specifically grab any words that both start with <#! and end with > to be able to grab the ID for a query and
B) Replace the the above <#!12345etc>, which is easy enough to do once I know how to do A.
Just for clarification I'm not looking for help doing SQL query, just looking for help in getting the entire word that stats with <#! and ends with > from a string/paragraph.
I'm terrible with regex so hopefully there is a solution that can work without needing it haha. Any tips you could provide would be greatly appreciated.
TLDR:
Sample string:
"Hey <#!123456789> thanks for that, I'll get back to you sooon."
How to get the grab the entire word that starts with <#! and ends with > to be able to do SQL query with it and then a replace() later.
I thought about exploding the string with a space and then going through each word one at a time checking each word with startswith and endswith but if the message author didn't leave a space between mentions and the rest of the text that wouldn't work.

If I'm understanding this correctly you want all the values between "<#!" and ">". That being said I believe all you need is this /<#!(.+)>/g
demo

You can do it this way:
<?php
$str = "Hey <#!123456789> thanks for that, I'll get back to you sooon.";
$re = '/(?<=<#!).+?(?=>)/m';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
// flatten the array result (otherwise it's an array of arrays)
$matches = array_merge(...$matches);
// Print the entire match result
print_r($matches); //Array ( [0] => 123456789 )
Demo https://3v4l.org/D64Ma
Regex explanation:
(?<=<#!) looksbehind to find <#!. This starts the match
.+? matches any character any number of times until next lookahead.
(?=>) match ends when > is found (but not included in match)
The difference between using lookaheads and lookbehinds and regular /<#!(.+?)> is the matches array that they produce.
Lookarounds are not included in the matching group and results in an array of arrays containing all the matching groups ("12345678") only.
Not wrapping the start <#! and end > in a lookaround results in an array of arrays containing both the regex pattern match ("<#!12345678>", plus matching group ("12345678"). So you would have to extract the matching groups from the resulting arrays.

Search a String for Alpha Numeric Characters in a Pattern

I have a string that contains 5 words. In the string one of the words is a Ham Radio Call Sign and can be anyone of the thousands of call signs in the US. In order to extract the Call Sign from the string I need to utilize the below pattern. The Call Sign I need to extract can be in any of the 5 positions in the string. The number is never the first character and the number is never the last character. The string is actually put together from an Array since it is originally read from a text file.
$string = $word[1] $word[2] $word[3] etc....
So the search can be either done on the whole string or each piece of the array.
Patterns:
1 Number and 3 Letters Example: AB4C A4BC
1 Number and 4 Letters Example: A4BCD
1 Number and 5 Letters Example: AB4CDE
I have tried everything I can think of and search till I cant search no more. I am sure I am over thinking this.

A two-step regular expression like this would do it:
$str = "hello A4AB there BC5AD";
$signs = array();
preg_match_all('/[A-Z][A-Z\d]{1,3}[A-Z]/', $str, $possible_signs);
foreach($possible_signs[0] as $possible_sign)
if (preg_match('/^\D+\d\D+$/', $possible_sign))
array_push($signs, $possible_sign);
print_r($signs); //Array ([0] => A4AB [1] => BC5AD)
Explanation
This is a regular expression approach, using two patterns. I don't think it could be done with one and still satisfy the exact requirements of the matching rules.
The first pattern enforces the following requirements:
substring starts and ends with a capital letter
substring contains only other capital letters or numbers between the first and last letter
substring is, overall, not more than 6 characters long
What I can't do in that same pattern, for complex REGEX reasons I won't go into (unless someone knows a way and can correct me), is enforce that only one number is contained.
#jeroen's answer does enforce this in a single pattern, but in turn does not enforce the correct length of the substring. Either way, we need a second pattern.
So after grabbing the initial matches, we loop over the results. We then apply each to a second pattern that enforces simply that there is only one number in the substring.
If so, we green-light the substring and it's added to the $signs array.
Hope this helps.

It depends on what the other words can contain, but you could use a regular expression like:
#\b[a-z]+\d[a-z]+\b#i
^ case insensitive
^^ a word boundary
^^^^^^ One or more letters
^^ One number
You can make it more restrictive by using {1,3} instead of + for the letters so that you have a sequence of 1 to 3 letters.
The complete expression would be something like:
$success = preg_match('#\b[a-z]+\d[a-z]+\b#i', $input_string, $matches);
where $matches[0] will contain the matched value, see the manual.

What is the best way to unscramble words with PHP?

I have a word list and I want to unscramble words using this word list, in PHP.
It seems to me that PHP doesn't have a built-in function that does this. So could someone please suggest a good algorithm to do this, or at least point me in the right direction?
EDIT: edited to add example
So basically, what I'm talking about is I have a list of words:
apple
banana
orange
Then, I'm given a bunch of jumbled letters.
pplea
nanaba
eroang

Given a dictionary of known words:
foreach ($list as $word)
{
if (count_chars($scrambled_word,1) == count_chars($word,1))
echo "$word\n";
}
Edit: A simple optimization would be to move the count_chars($scrambled_word,1)) outside the loop since it never changes:
$letters = count_chars($scrambled_word,1)
foreach ($list as $word)
{
if ($letters == count_chars($word,1))
echo "$word\n";
}

Warning: I rarely use PHP, so this is dealing only with a general algorithm that should work in almost any language, not anything specific to PHP.
Presumably you have a word in which the letters have been rearranged, and you want to find what word(s) could be made from those letters.
If that's correct, the general idea is fairly simple: take a copy of your word list, and sort the letters in each word into alphabetical order. Put the sorted and unsorted versions of each word side by side, and sort the whole thing by the sorted words (but keeping each unsorted word along with its sorted version). You may want to collapse duplicates together, so that (for example) instead of {abt : bat} and {abt : tab}, you have: {abt: bat, tab}
Then, to match up a scrambled word, sort its letters in alphabetical order. Look for matches in your dictionary (since it's sorted, you can use a binary search). When you find a match, the result is the word (or words) associated with that sorted letter group. Using the example above, if the scrambled word was "tba", you'd sort it to get "abt", then look up "abt" to get "bat" and "tab".
Edit: As #Moron pointed out in the comments, sorting and binary search aren't really crucial points in themselves. The basic points are to turn all equivalent inputs into identical keys, then use some sort of fast lookup by key to find the word(s) for that key.
Sorting the letters in each word is one easy way to turn equivalent inputs into identical keys. Sorting the list and doing a binary search is one easy way to do fast lookups by key.
In both cases, there are quite a few alternatives. I'm not at all sure the alternatives are likely to improve performance a lot, but they certainly could.
Just for example, instead of a pure binary search you could have a second level of index that told you where the keys starting with 'a' were, the keys starting with 'b', and so on. Given that a couple of extremely frequently-used letters are near the beginning of the alphabet ('e' and 'a', for example) you might be better off sorting the words so that relatively uncommon letters ('q', 'z', etc.) are toward the front of the key, and the most commonly used letters are at the end. This would give that first lookup based on the initial character the greatest discrimination.
On the sort/binary search side, there are probably more alternatives, and probably better arguments to be made in favor of using something else. Hash tables typically allow lookups in (nearly) constant time. Tries can reduce storage substantially, especially when many words share a common prefix. The only obvious disadvantage is that the code for either one is probably more work (though PHP's array type is hash-based, so you could probably use it quite nicely).

It is possible to unscramble in O(log p + n) where
p = size of dictionary
n = length of word to be unscrambled
Assume a constant, c, the most occurrences of some letter within any word plus 1.
Assume a constant, k, the number of letters in the alphabet.
Assume a constant, j, the most number of words that can share the same hash or letter-sorted version.
Initialization of O(p) space:
1. Using the dictionary, D, create an associated list of letter sorted words, L, which will be size at most p since each word has a one sorted version.
2. Associate another column to L with a numerical hash of integers which can range [0, c^k-1].
3. For each word in L, generate its hash with the following function:
hash(word) = 0 if word is empty or (c^i + hash(remaining substring of the word))
where i is the zero-based alphabet index of the first letter.
Algorithm:
1. In O(n), determine hash, h, of the letter sorted version of the word in question.
2. In O(log p), search for the hash in L.
3. In O(n), list j associated words of length n.

Try these
http://www.php.net/manual/en/function.similar-text.php
http://www.php.net/manual/en/function.soundex.php
http://www.php.net/manual/en/function.levenshtein.php

The slow option would be to generate all permutations of the letters in a scrambled word, then probe them via pspell_check().
If you however can use a raw dictionary text file, then the best option is to just use a regular expression to scan it:
$dict = file_get_contents("words.txt"); // one word per line
$n = strlen($word);
if (preg_match('/^[$word]{$n}$/im', $dict, $match)) {
print $match[0];
}
I'm quite certain PCRE is significantly faster at searching for permutations than PHP and the guessing approach.

Make use of PHP's array functions since they can solve this for you.
$words = array('hello', 'food', 'stuff', 'happy', 'fast');
$scrambled_word = 'oehll';
foreach ($words as $word)
{
// Same length?
if (strlen($scrambled_word) === strlen($word))
{
// Convert to an array and match
if( ! array_diff(str_split($word), str_split($scrambled_word)))
{
print "Your word is: $word";
}
}
}
Basically, you look for something the same length - then you ask PHP to see if all the letters are the same.

If you have a really large list of words and want this unscramble operation to be fast, I'd try putting the word list into a database. Next add a field to your word list table that is the sum of the ascii values of the word, and then add an index on this ascii sum.
Whenever you want to retrieve a list of possible matches just search the word table for ascii sums that match the sum of the scrambled letters. Keep in mind that you may have a few false matches so you'll have to compare all of the matched words to ensure they contain only the letters of your scrambled word (but the result set should be pretty small).
If you don't want to use a database you could implement the same basic idea using a file, just sort the list by the sum value for faster retrieval of all matches.
Example Data assumes all lowercase (a=97, b=98, c=99, ...)
bat => 311,
cat => 312, ...
Example php function to figure out the sum for a word
function asciiSum($word) {
$characters = str_split(strtolower($word));
$sum = 0;
foreach($characters as $character) {
$sum += ord($character);
}
return $sum;
}
Even faster: add another field to the database that represents the string length, then you can search for words based on an ascii sum and a string length which would further reduce the number of false matches you would need to check for.

Any faster, simpler alternative to php preg_match

I am using cakephp 1.3 and I have textarea where users submit articles. On submit, I want to look into the article for certain key words and and add respective tags to the article.
I was thinking of preg_match, But preg_match pattern has to be string. So I would have to loop through an array(big).
Is there a easier way to plug in the keywords array for the pattern.
I appreciate all your help.
Thanks.

I suggest treating your array of keywords like a hash table. Lowercase the article text, explode by spaces, then loop through each word of the exploded array. If the word exists in your hash table, push it to a new array while keeping track of the number of times it's been seen.
I ran a quick benchmark comparing regex to hash tables in this scenario. To run it with regex 1000 times, it took 17 seconds. To run it with a hash table 1000 times, it took 0.4 seconds. It should be an O(n+m) process.
$keywords = array("computer", "dog", "sandwich");
$article = "This is a test using your computer when your dog is being a dog";
$arr = explode(" ", strtolower($article));
$tracker = array();
foreach($arr as $word){
if(in_array($word, $keywords)){
if(isset($tracker[$word]))
$tracker[$word]++;
else
$tracker[$word] = 1;
}
}
The $tracker array would output: "computer" => 1, "dog" => 2. You can then do the process to decide what tags to use. Or if you don't care about the number of times the keyword appears, you can skip the tracker part and add the tags as the keywords appear.
EDIT: The keyword array may need to be an inverted index array to ensure the fastest lookup. I am not sure how in_array() works, but if it searches, then this isn't as fast as it should be. An inverted index array would look like
array("computer" => 1, "dog" => 1, "sandwich" => 1); // "1" can be any value
Then you would do isset($keywords[$word]) to check if the word matches a keyword, instead of in_array(), which should give you O(1). Someone else may be able to clarify this for me though.

If you don't need the power of regular expressions, you should just use strpos().
You will still need to loop through the array of words, but strpos is much, much faster than preg_match.

Of course, you could try matching all the keywords using one single regexp, like /word1|word2|word3/, but I'm not sure it is what you are looking for. And also I think it would be quite heavy and resource-consuming.
Instead, you can try with a different approach, such as splitting the text into words and checking if the words are interesting or not. I would make use of str_word_count() using someting like:
$text = 'this is my string containing some words, some of the words in this string are duplicated, some others are not.';
$words_freq = array_count_values(str_word_count($text, 1));
that splits the text into words and counts occurrences. Then you can check with in_array($keyword, $words_freq) or array_intersect(array_keys($words_freq), $my_keywords).
If you are not interested, as I guess, to the keywords case, you can strtolower() the whole text before proceeding with words splitting.
Of course, the only way to determine which approach is the best is to setup some testing, by running various search functions against some "representative" and quite long text and measuring the execution time and resource usage (try microtime(TRUE) and memory_get_peak_usage() to benchmark this).
EDIT: I cleaned up a bit the code and added a missing semi-colon :)

If you want to look for multiple words from an array, then combine said array into an regular expression:
$regex_array = implode("|", array_map("preg_escape", $array));
preg_match_all("/($regex_array)/", $src, $tags);
This converts your array into /(word|word|word|word|word|...)/. The arrray_map and preg_escape part is optional, only needed if the $array might contain special characters.
Avoid strpos and loops for this case. preg_match is faster for searching after alternatives.

strtr()
If given two arguments, the second
should be an array in the form
array('from' => 'to', ...). The return
value is a string where all the
occurrences of the array keys have
been replaced by the corresponding
values. The longest keys will be tried
first. Once a substring has been
replaced, its new value will not be
searched again.

Add tags manually? Just like we add tags here at SO.

Trying to split a string in 3 variables, but a little more tricky - PHP

Having pretty much covered the basics in PHP, I decided to challenge myself and make a simple calculator. After some attempts I figured it out, but I'm not entirely content with it. I want to make it more user friendly and have the calculator input in just one box, very much like google search.
So one would simply type: 5+2
and recieve 7.
How would I split the string "5+2" into three variables so that the math functions can convert the numbers into integers and recognize the operator, as well as accounting for the possibility of someone using spaces between the values as well?
Would you explode the string? But what would you explode it with if there are no spaces?
I've also stumbled upon the preg_split function, but I can't seem to wrap my head around or know if it's suitable to solve this problemt. What method would be the best option for this?

$calc = "5* 2+ 53";
$calc = preg_replace('/(\s*)/','',$calc);
print_r(preg_split('/([\x28-\x2B\x2D\x2F])/',$calc,-1,PREG_SPLIT_DELIM_CAPTURE));
That's my bid, resulting in
Array
(
[0] => 5
[1] => *
[2] => 2
[3] => +
[4] => 53
)

You may need to use some clever regex to split it something like:
$myOutput = split("(-?[0-9]+)|([+-*/]{1})|(-?[0-9]+)");
I haven't tested that - just an semi-psuedo-ish example sorry :-> just trying to highlight that you will need to remember that your - (minus) operator can appear at the start of an integer to make it a negative number so you could end up with problems with things like -1--21 which is valid but makes your regex rules more complicated.

You will have to split the string using regular expressions.
For example a simple regex for 5+2 would be:
\d\+\d
Check out this link. You can create and validate your regular expressions there. For a calculator it will not be that difficult.

You've got the right idea with preg_split. It would work something like this:
$values = preg_split("/[\s]+/", "76 + 23");
The resulting array will contain values that are NOT whitespace:
Values should look like this:
$values[0]: "76"
$values[1]: "+"
$values[2]: "23"
the "/[\s]+/" is a regular expression pattern that matches any whitespace characters one or more times. Howver, if there are no whitespaces at all, preg_split will just return the original "5+2" as a single string in the first element of the array. i.e.:
$values[0] = "5+2"

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.