Regex for random text replacement on input stream

Regex for random text replacement on input stream - php

I have a paragraph of text below that I want to use an input source. And I want the doSpin() function to take the stream of text and pick one value at random, from each [%group of replacement candidates%].
This [%should|ought|would|could%] make
it much [%more
convenient|faster|easier%] and help
reduce duplicate content.
So this sentence, when filtered, could potentially result in any of the following when input...
1) This should make it much more convenient and help reduce duplicate content.
2) This ought make it much faster and help reduce duplicate content.
3) This would make it much easier and help reduce duplicate content.
// So the code stub would be...
$content = file_get_contents('path to my input file');
function doSpin($content)
{
// REGEX MAGIC HERE
return $content;
}
$myNewContent = doSpin($content);
*I know zilch of Regex. But I know what I'm trying to do requires it.
Any ideas?

Use preg_replace_callback():
function doSpin($content) {
return preg_replace_callback('!\[%(.*?)%\]!', 'pick_one', $content);
}
function pick_one($matches) {
$choices = explode('|', $matches[1]);
return array_rand($choices);
}
The way this works is that it searches for [%...%] and captures what's in between. That's passed as $matches[1] to the callback (as it is the first captured group). That group is split on | using explode() and a random one is returned using array_rand(),

Related

Php preg_match doesnt work with variable

I have one array which contains multiple strings. I have another array which contain also strings but they are shorter. My goal is to check is there any partial match in the bigger array for every item from the smaller array. However preg_match doesnt work at all with variables. If I put raw input everything seems fine but otherwise results is false. I have tried almost every possible regex combination but without success. Sample code:
//Lets say $needle is 3333 and bigPatern has 10 records with 10 digits each, for example third record is 5125433331. I want to perform the partial match and get true
$needle = $smlPattern[0]; //debugging with first item from smaller array
$needle2 = "/$needle/"; // I tried [$needle], ^..&, to concatenate and etc
foreach ($bigPatern as $val)
{
if (preg_match($needle2, $val))
{
echo "YES";
}
}
Any tips what Im doing wrong?

Please escape your regex input!
$needle2 = "/".preg_quote($needle,'/')."/"; //
Don't blindly add user input to your regex, much for the same reason you need to escape user input in SQL queries. In regex, the biggest issue is usually the ReDoS problem, where a malicious user can create a specially crafted regex that will use hours, or more, to execute, stealing all the CPU from your server.

Main wrong thing in your example is to use regexp for checking the presence of a string. There is a strpos function for that.
if ( strpos($bigOne, $smallOne) !== false ) {
echo "bigOne contains smallOne";
}

You can even use strpos function to achieve the same purpose. It finds the position of the first occurrence of a substring in a string, and returns false if no match is found.
$needle = $smlPattern[0];
$needle2 = "needle";
foreach ($bigPatern as $val){
if (strpos($val, $needle2) !== false){
echo "YES";
}
}

Pass a closure to an array value

Please forgive me if I'm way off here but I'm trying to create a simple template parser and to do this I'm using regular expressions to find template tags and replace them with dynamic text.
I want to be able to use a closure to return the replacement text. For example:
// Example 1
$tag['\{{title:(.*)}}\'] = function($1)
{
return $1;
};
// Example 2
$tag['\{{title:(.*)}}\'] = function($1)
{
$something = ['One', 'Two', 'Three'];
return $1 . implode($something);
};
// Now do the replacements
foreach($tag as $pattern=>$replacement)
{
preg_replace($pattern, $replacement, $template);
}
I've included example 2 to explain that the result maybe dynamic and this is why I can't simply use strings.
I also feel like I need to explain why I'd need such functionality. The patterns are meant to be expandable, so other developers can add their own patterns easily.
If I've completely off the mark and this isn't going to be achievable, could you point me in the direction to achieve the same/similar functionality?
also, side note - not my main question - but is there a way to do multiple preg_replace in one go instead of looping through, seems inefficient.

Using strpos() to match part of an array, PHP

I need some help with strpos().
Need to build a way to match any URL that contains /apple-touch but also need to keep specifics matching, such as "/favicon.gif" etc
At the moment, the matches are listed out individually as part of an array:
<?php
$errorurl = $_SERVER['REQUEST_URI'];
$blacklist = array("/favicon.gif", "/favicon.png", "/apple-touch-icon-precomposed.png", "/apple-touch-icon.png", "/apple-touch-icon-72x72-precomposed.png", "/apple-touch-icon-72x72.png", "/apple-touch-icon-114x114-precomposed.png", "/apple-touch-icon-114x114.png", "/apple-touch-icon-57x57-precomposed.png", "/apple-touch-icon-57x57.png", "/crossdomain.xml");
if (in_array($errorurl, $blacklist)) { // do nothing }
else { // send an email about error }
?>
Any ideas?
Many thanks for help

Instead of a regex, you could also remove all occurrences of your blacklist items with str_replace and compare the new string to the old one:
if ( str_replace($blacklist, '', $errorurl) !== $errorurl )
{
// do nothing
}
else
{
// send an email about error
}

If you want to use regex for this, and you want a single regex string that will capture all the values in your existing blacklist plus match any apple-touch string, then something like this would do it.
if(preg_match('/^\/(favicon|crossdomain|apple-touch.*)\.(gif|png|xml)$/',$_SERVER['REQUEST_URI']) {
//matched the blacklist!
}
To be honest, though, that's far more complex than you need.
I'd say you'd be better off keeping the specific values like favicon.gif etc in the blacklist array you already have; it'd make it a lot easier when you come to adding more items to the list.
I'd only consider using regex for the apple-touch values, since you want to block any variant of them. But even with that, it would likely be simpler if you used strpos().

How To Get The Unique Name Count With PHP?

Let's say I have text file Data.txt with:
26||jim||1990
31||Tanya||1942
19||Bruce||1612
8||Jim||1994
12||Brian||1988
56||Susan||2201
and it keeps going.
It has many different names in column 2.
Please tell me, how do I get the count of unique names, and how many times each name appears in the file using PHP?
I have tried:
$counts = array_count_values($item[1]);
echo $counts;
after exploding ||, but it does not work.
The result should be like:
jim-2,
tanya-1,
and so on.
Thanks for any help...

Read in each line, explode using the delimiter (in this case ||), and add it to an array if it does not already exist. If it does, increment the count.
I won't write the code for you, but here a few pointers:
fread reads in a line
explode will split the line based on a delimiter
use in_array to check if the name has been found before, and to determine whether you need to add the name to the array or just increment the count.
Edit:
Following Jon's advice, you can make it even easier for you.
Read in line-by-line, explode by delimiter and dump all the names into an array (don't worry about checking if it already exists). After you're done, use array_count_values to get every unique name and its frequency.

Here's my take on this:
Use file to read the data file, producing an array where each element corresponds to a line in the input.
Use array_filter with trim as the filter function to remove blank lines from this array. This takes advantage that trim returns a string having removed whitespace from both ends of its argument, leaving the empty string if the argument was all whitespace to begin with. The empty string converts to boolean false -- thus making array_filter disregard lines that are all whitespace.
Use array_map with a callback that involves calling explode to split each array element (line of text) into three parts and returning the second of these. This will produce an array where each element is just a name.
Use array_map again with strtoupper as the callback to convert all names to uppercase so that "jim" and "JIM" will count as the same in the next step.
Finally, use array_count_values to get the count of occurrences for each name.
Code, taking things slowly:
function extract_name($line) {
// The -1 parameter (available as of PHP 5.1.0) makes explode return all elements
// but the last one. We want to do this so that the element we are interested in
// (the second) is actually the last in the returned array, enabling us to pull it
// out with end(). This might seem strange here, but see below.
$parts = explode('||', $line, -1);
return end($parts);
}
$lines = file('data.txt'); // #1
$lines = array_filter($lines, 'trim'); // #2
$names = array_map('extract_name', $lines); // #3
$names = array_map('strtoupper', $names); // #4
$counts = array_count_values($names); // #5
print_r($counts); // to see the results
There is a reason I chose to do this in steps where each steps involves a function call on the result of the previous step -- that it's actually possible to do it in just one line:
$counts = array_count_values(
array_map(function($line){return strtoupper(end(explode('||', $line, -1)));},
array_filter(file('data.txt'), 'trim')));
print_r($counts);
See it in action.
I should mention that this might not be the "best" way to solve the problem in the sense that if your input file is huge (in the ballpark of a few million lines) this approach will consume a lot of memory because it's reading all the input in memory at once. However, it's certainly convenient and unless you know that the input is going to be that large there's no point in making life harder.
Note: Senior-level PHP developers might have noticed that I 'm violating strict standards here by feeding the result of explode to a function that accepts its argument by reference. That's valid criticism, but in my defense I am trying to keep the code as short as possible. In production it would be indeed better to use $a = explode(...); return $a[1]; although there will be no difference as regards the result.

While I do feel that this website's purpose is to answer questions and not do homework assignments, I don't acknowledge the assumption that you are doing your homework, since that fact has not been provided. I personally learned how to program by example. We all learn our own ways, so here is what I would do if I were to attempt to answer your question as accurately as possible, based on the information you have provided.
<?php
$unique_name_count = 0;
$names = array();
$filename = 'Data.txt';
$pointer = fopen($filename,'r');
$contents = fread($pointer,filesize($filename));
fclose($pointer);
$lines = explode("\n",$contents);
foreach($lines as $line)
{
$split_str = explode('|',$line);
if(isset($split_str[2]))
{
$name = strtolower($split_str[2]);
if(!in_array($name,$names))
{
$names[] = $name;
$unique_name_count++;
}
}
}
echo $unique_name_count.' unique name'.(count($unique_name_count) == 1 ? '' : 's').' found in '.$filename."\n";
?>

How search for thousands of possible keywords in a string

I have a database of thousands (about 10,000) keywords. When a user posts a blog on my site, I would like to automatically search for the keywords in the text, and tag the post with any direct matches.
So far, all I can think of is to pull the ENTIRE list of keywords, loop through it, and check for the presence of each tag in the post...which seems very inefficient (that's 10,000 loops).
Is there a more common way to do this? Should I maybe use a MySQL query to limit it down?
I imagine this is not a totally rare task.

No, just don't do that.
Instead of looping through 10000 elements, it is better to extract the words from the sentence or text, then add it to the SQL query and that way you will have all the needed records. This is surely more efficient than the solution you proposed.
You can do this in the following way using PHP:
$possible_keywords = preg_split('/\b/', $your_text, PREG_SPLIT_NO_EMPTY);
The above will split the text on the words' boundaries and will return no empty elements in the array.
Then you just can create the SQL query in a fashion similar to the following:
SELECT * FROM `keywords` WHERE `keywords`.`keyword` IN (...)
(just put the comma-separated list of extracted words in the bracket)
You should probably filter the $possible_keywords array before making the query (to include only the keywords with appropriate length and to exclude duplicates) plus make keyword column indexed.

I don't know what language you intend on using, but a standard trie (prefix tree) would solve this issue, if you were feeling up to it.

I guess you could build a regular expression dynamically which will enable you to match keywords inside a specific string. You can package all this in a class which does the grunt work.
class KeywordTagger {
static function getTags($body) {
if(preg_match_all(self::getRegex(), $body, $keywords)) {
return $keywords[0];
} else {
return null;
}
}
private static $regex;
private static function getRegex() {
if(self::$regex === null) {
// Load Keywords from DB here
$keywords = KeywordsTable::getAllKeywords();
// Let's escape
$keywords = array_map('KeywordTagger::pregQuoteWords', $keywords);
// Base Regex
$regex = '/\b(?:%s)\b/ui';
// Build Final
self::$regex = sprintf($regex, implode('|', $keywords));
}
return self::$regex;
}
private static function pregQuoteWords($word) {
return preg_quote($word, '/');
}
}
Then, all you have to do is, when a user writes a post, run it through the class:
$tags = KeywordTagger::getTags($_POST['messageBody']);
For a small speed up, you could cache the built regex using memcached, APC or a good-old file-based cache.

Well, I think that PHP's stripos is already quite optimized. If you want to optimize this search further, you would have to take advantage of similarities between your keywords (e.g. instead of looking for "foobar" and then for "foobaz", look for "fooba" and then check for each "fooba" if it's followed by a 'r', a 'z', or none). But this would require some sort of tree-representation of your keywords, like:
root (empty string)
|
fooba
/ \
foobar foobaz
Yes, that's a trie.

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.

Regex for random text replacement on input stream - php

Related

Php preg_match doesnt work with variable

Pass a closure to an array value

Using strpos() to match part of an array, PHP

How To Get The Unique Name Count With PHP?

How search for thousands of possible keywords in a string

Categories

Resources