I'm trying to develop a function that can sort through a string that looks like this:
Donny went to the {park|store|{beach with friends|beach alone}} so he could get a breath of fresh air.
What I intend to do is search the text repeatedly for {...} groups that contain no { or } themselves, so that only the innermost group is selected. I would then split its contents into an array in PHP and pick one entry at random, repeating the process until the whole string has been parsed and a complete sentence remains.
I just cannot wrap my head around regular expressions though.
Appreciate any help!
Don't know about maths theory behind this ;-/ but in practice that's quite easy. Try
$text = "Donny went to the {park|store|{beach with friends|beach alone}} so he could get a breath of fresh air. ";
function rnd($matches) {
$words = explode('|', $matches[1]);
return $words[rand() % count($words)];
}
do {
$text = preg_replace_callback('~{([^{}]+)}~', 'rnd', $text, -1, $count);
} while($count > 0);
echo $text;
Classic regular expressions are not capable of counting and therefore cannot find matching brackets reliably (PCRE's recursive patterns are an exception).
What you need is a grammar.
See this related question.
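To make the grammar idea concrete, here is a minimal sketch of a recursive-descent parser for this brace syntax. The function names (spin, spin_sequence, spin_group) and the $pick callback are my own inventions for illustration, not part of any library, and I'm assuming that a stray | or } outside of braces should simply be kept as literal text.

```php
<?php
// Hypothetical helper: reads text until an unconsumed '|' or '}' (or end of input).
// $pos is passed by reference and advanced past everything consumed.
function spin_sequence($s, &$pos, $pick) {
    $out = '';
    $len = strlen($s);
    while ($pos < $len) {
        $c = $s[$pos];
        if ($c === '{') {
            $pos++; // skip the '{'
            $out .= spin_group($s, $pos, $pick);
        } elseif ($c === '|' || $c === '}') {
            break; // the caller decides what the delimiter means
        } else {
            $out .= $c;
            $pos++;
        }
    }
    return $out;
}

// Hypothetical helper: parses "a|b|c}" (the '{' is already consumed) and returns one option.
function spin_group($s, &$pos, $pick) {
    $options = array();
    $len = strlen($s);
    while (true) {
        $options[] = spin_sequence($s, $pos, $pick);
        if ($pos < $len && $s[$pos] === '|') {
            $pos++; // another alternative follows
            continue;
        }
        break;
    }
    if ($pos < $len && $s[$pos] === '}') {
        $pos++; // skip the closing '}'
    }
    return $pick($options);
}

// Entry point. $pick chooses one option from an array; the default picks at random.
function spin($s, $pick = null) {
    if ($pick === null) {
        $pick = function ($options) { return $options[array_rand($options)]; };
    }
    $pos = 0;
    $out = '';
    $len = strlen($s);
    while ($pos < $len) {
        $out .= spin_sequence($s, $pos, $pick);
        if ($pos < $len) {
            $out .= $s[$pos++]; // stray '|' or '}' outside braces: keep it literally
        }
    }
    return $out;
}

echo spin('Donny went to the {park|store|{beach with friends|beach alone}} so he could get a breath of fresh air.'), "\n";
```

Passing the chooser in as a callback is only there so the behaviour can be checked deterministically; in normal use you would call spin($text) and let it pick at random.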
$str="Donny went to the {park|store|{beach {with friends}|beach alone}} so he could get a breath of fresh air. ";
$s = explode("}",$str);
foreach($s as $v){
if(strpos($v,"{")!==FALSE){
$t=explode("{",$v);
print end($t)."\n";
}
}
output
$ php test.php
with friends
Regular expressions don't deal well with recursive stuff, but PHP does:
$str = 'Donny went to the {park|store|{beach with friends|beach alone}} so he could get a breath of fresh air.';
echo parse_string($str), "\n";
function parse_string($string) {
if ( preg_match('/\{([^{}]+)\}/', $string, $matches) ) {
$inner_elements = explode('|', $matches[1]);
$random_element = $inner_elements[array_rand($inner_elements)];
$string = str_replace($matches[0], $random_element, $string);
$string = parse_string($string);
}
return $string;
}
You could do this with a lexer/parser. I don't know of any options in PHP (but since there are XML parsers in PHP, there are no doubt generic parsers). On the other hand, what you're asking to do is not too complicated. Using strings in PHP (substring, etc.) you could probably do this in a few recursive functions.
You will then finally have created a MadLibz generator in PHP with a simple grammar. Pretty cool.
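A rough sketch of the string-function route, assuming balanced braces: the first } in the text always closes an innermost group, and its matching { is the last one before it. The name spin_with_strings and the $pick callback are made up for this example. I used a loop rather than recursion, but the idea is the same.

```php
<?php
// Hypothetical helper: resolves {a|b|{c|d}} groups using only string functions.
// $pick receives the list of alternatives and returns one of them.
function spin_with_strings($text, $pick = null) {
    if ($pick === null) {
        $pick = function ($options) { return $options[array_rand($options)]; };
    }
    // The first '}' always closes an innermost group.
    while (($close = strpos($text, '}')) !== false) {
        // Its matching '{' is the last one that appears before it.
        $open = strrpos(substr($text, 0, $close), '{');
        if ($open === false) {
            break; // unbalanced input: stop instead of looping forever
        }
        $inner = substr($text, $open + 1, $close - $open - 1);
        $choice = $pick(explode('|', $inner));
        // Splice the chosen alternative in place of the whole {...} group.
        $text = substr($text, 0, $open) . $choice . substr($text, $close + 1);
    }
    return $text;
}

echo spin_with_strings('Donny went to the {park|store|{beach with friends|beach alone}} so he could get a breath of fresh air.'), "\n";
```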
Related
I am using php 5 to parse a string. My input string looks like the following:
{Billion is|Millions are|Trillion is} {an extremely |a| a generously |
a very} { tiny|little |smallish |short |small} stage in a vast
{galactic| |large|huge|tense|big |cosmic}
{universe|Colosseum|planet|arena}.
Find below my minimum viable example:
<?php
function process($text)
{
return preg_replace_callback('/\[(((?>[^\[\]]+)|(?R))*)\]/x', array(
$this,
'replace'
), $text);
}
function replace($text)
{
$text = $this->process($text[1]);
$parts = explode('|', $text);
return $parts[array_rand($parts)];
}
$text = "{Billion is|Millions are|Trillion is} {an extremely |a| a generously | a very} { tiny|little |smallish |short |small} stage in a vast {galactic| |large|huge|tense|big |cosmic} {universe|Colosseum|planet|arena}.";
$res = process($text);
echo $res;
As you can see I am trying to parse the following pattern f.ex.: {Billion is|Millions are|Trillion is} using the above regex, /\[(((?>[^\[\]]+)|(?R))*)\]/x.
As a result I am getting the same string as inputted. I would like to get as an output for example:
Billion is a very little stage in a vast huge arena.
Any suggestions what I am doing wrong?
How would your current code generate anything?
Your regex doesn't fit. It matches nested bracketed stuff, not braced. Try {([^}]*)} for capturing everything inside {...} to $m[1] if there are no nested braces.
Read about preg_replace_callback(). The second argument must be a valid callable; array($this, 'replace') only works inside a class method, and your functions are not in a class.
A working code with some further adjustments could look like this:
function process($text) {
return preg_replace_callback('/{([^}]*)}/', 'replace', $text);
}
function replace($m) {
$parts = explode('|', $m[1]);
shuffle($parts);
return $parts[0];
}
$text = "{Billion is|Millions are|Trillion is} {an extremely|a|a generously|a very} {tiny|little|smallish|short|small} stage in a vast {galactic||large|huge|tense|big|cosmic} {universe|Colosseum|planet|arena}.";
echo process($text);
Billion is a generously short stage in a vast Colosseum.
Here is a demo at eval.in
(you can also use an anonymous function if PHP >= 5.3)
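For what it's worth, the anonymous-function variant could look roughly like this. The name process_inline is just a placeholder I picked, and I swapped shuffle() for array_rand(), which is a matter of taste.

```php
<?php
// Same replacement as the process()/replace() pair, with the callback written inline (PHP >= 5.3).
function process_inline($text) {
    return preg_replace_callback('/{([^}]*)}/', function ($m) {
        $parts = explode('|', $m[1]);       // $m[1] is the text between the braces
        return $parts[array_rand($parts)];  // pick one alternative at random
    }, $text);
}

echo process_inline("{Billion is|Millions are|Trillion is} {an extremely|a very} {tiny|small} stage."), "\n";
```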
I'm struggling to find the best way to do this. Basically I am provided strings that are like this with the task of printing out the string with the math parsed.
Jack has a [0.8*100]% chance of passing the test. Katie has a [(0.25 + 0.1)*100]% chance.
The mathematical equations are always encapsulated by square brackets. Why I'm dealing with strings like this is a long story, but I'd really appreciate the help!
There are plenty of math evaluation libraries for PHP. A quick web search turns up this one.
Writing your own parser is also an option, and if it's just basic arithmetic it shouldn't be too difficult. With the resources out there, I'd stay away from this.
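If you do want to try the hand-written route, a minimal sketch for basic arithmetic might look like the following. TinyMath and render_math are names I made up, not an existing library, and I'm assuming only + - * /, parentheses, unary minus and plain decimal numbers need to be supported.

```php
<?php
// Hypothetical mini-evaluator for + - * / and parentheses; nothing is passed to eval().
class TinyMath {
    private $s;      // expression with whitespace removed
    private $i = 0;  // current read position

    public static function evaluate($expr) {
        $p = new self();
        $p->s = preg_replace('/\s+/', '', $expr);
        $v = $p->expr();
        if ($p->i < strlen($p->s)) {
            throw new InvalidArgumentException("Unexpected input at offset {$p->i}");
        }
        return $v;
    }

    // Returns the current character, or '' at the end of the input.
    private function peek() {
        return $this->i < strlen($this->s) ? $this->s[$this->i] : '';
    }

    // expr := term (('+' | '-') term)*
    private function expr() {
        $v = $this->term();
        while (($c = $this->peek()) === '+' || $c === '-') {
            $this->i++;
            $r = $this->term();
            $v = ($c === '+') ? $v + $r : $v - $r;
        }
        return $v;
    }

    // term := factor (('*' | '/') factor)*
    private function term() {
        $v = $this->factor();
        while (($c = $this->peek()) === '*' || $c === '/') {
            $this->i++;
            $r = $this->factor();
            $v = ($c === '*') ? $v * $r : $v / $r;
        }
        return $v;
    }

    // factor := '-' factor | '(' expr ')' | number
    private function factor() {
        $c = $this->peek();
        if ($c === '-') {
            $this->i++;
            return -$this->factor();
        }
        if ($c === '(') {
            $this->i++;
            $v = $this->expr();
            if ($this->peek() !== ')') {
                throw new InvalidArgumentException('Missing closing parenthesis');
            }
            $this->i++;
            return $v;
        }
        // \G anchors the match at the offset given as the fifth argument.
        if (preg_match('/\G\d+(?:\.\d+)?/', $this->s, $m, 0, $this->i)) {
            $this->i += strlen($m[0]);
            return (float) $m[0];
        }
        throw new InvalidArgumentException("Expected a number at offset {$this->i}");
    }
}

// Replace every [expression] in the text with its computed value.
function render_math($text) {
    return preg_replace_callback('/\[([^\[\]]+)\]/', function ($m) {
        return (string) TinyMath::evaluate($m[1]);
    }, $text);
}

echo render_math('Jack has a [0.8*100]% chance of passing the test. Katie has a [(0.25 + 0.1)*100]% chance.'), "\n";
```

Anything the grammar doesn't recognise throws an exception instead of being executed, which is the main reason to prefer this over eval for untrusted input.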
You could take a simpler approach and use eval. Be careful to sanitize your input first. On the eval docs page, there are comments with code to do that. Here's one example:
Disclaimer: I know eval is just a misspelling of evil, and it's a horrible horrible thing, and all that. If used right, it has uses, though.
<?php
$test = '2+3*pi';
// Remove whitespaces
$test = preg_replace('/\s+/', '', $test);
$number = '(?:\d+(?:[,.]\d+)?|pi|π)'; // What is a number
$functions = '(?:sinh?|cosh?|tanh?|abs|acosh?|asinh?|atanh?|exp|log10|deg2rad|rad2deg|sqrt|ceil|floor|round)'; // Allowed PHP functions
$operators = '[+\/*\^%-]'; // Allowed math operators
$regexp = '/^(('.$number.'|'.$functions.'\s*\((?1)+\)|\((?1)+\))(?:'.$operators.'(?2))?)+$/'; // Final regexp, heavily using recursive patterns
if (preg_match($regexp, $test)) // validate the expression before it reaches eval
{
$test = preg_replace('!pi|π!', 'pi()', $test); // Replace pi with pi function
eval('$result = '.$test.';');
}
else
{
$result = false;
}
?>
preg_match_all('/\[(.*?)\]/', $string, $out);
foreach ($out[1] as $k => $v)
{
eval("\$result = $v;");
$string = str_replace($out[0][$k], $result, $string);
}
This code is highly dangerous if the strings are user input, because it allows arbitrary code to be executed.
The eval approach updated from PHP doc examples.
<?php
function calc($equation)
{
// Remove whitespaces
$equation = preg_replace('/\s+/', '', $equation);
echo "$equation\n";
$number = '((?:0|[1-9]\d*)(?:\.\d*)?(?:[eE][+\-]?\d+)?|pi|π)'; // What is a number
$functions = '(?:sinh?|cosh?|tanh?|acosh?|asinh?|atanh?|exp|log(10)?|deg2rad|rad2deg|sqrt|pow|abs|intval|ceil|floor|round|(mt_)?rand|gmp_fact)'; // Allowed PHP functions
$operators = '[\/*\^\+-,]'; // Allowed math operators
$regexp = '/^([+-]?('.$number.'|'.$functions.'\s*\((?1)+\)|\((?1)+\))(?:'.$operators.'(?1))?)+$/'; // Final regexp, heavily using recursive patterns
if (preg_match($regexp, $equation))
{
$equation = preg_replace('!pi|π!', 'pi()', $equation); // Replace pi with pi function
echo "$equation\n";
eval('$result = '.$equation.';');
}
else
{
$result = false;
}
return $result;
}
?>
Sounds like your homework... but whatever.
You need to use string manipulation. PHP has a lot of built-in functions, so you're in luck. Check out the explode() function for sure, and str_split().
Here is a full list of functions specifically related to strings: http://www.w3schools.com/php/php_ref_string.asp
Good Luck.
Does anybody know how to count the repeated words in a paragraph/file using PHP or Ruby on Rails, without using a looping structure? I'd appreciate the shortest and fastest answer.
Thanks
In ruby using the text in a comment above
our_string = "Dog, as a devil deified, lived as a god."
our_string.strip.downcase.split(/[^\w']+/).group_by(&:to_s).map{|w| {w[0]=>w[1].count}}
=> [{"a"=>2}, {"devil"=>1}, {"god"=>1}, {"lived"=>1}, {"dog"=>1}, {"as"=>2}, {"deified"=>1}]
PHP Array Functions
$text = "apple, orange: banana. apple sausage bear orange";
$all_words = str_word_count($text, 1);
$unique_words = array_unique($all_words);
$repeated_words = array_diff_assoc($all_words, $unique_words);
echo "<pre>";
print_r($repeated_words);
echo "</pre>";
Output:
Array
(
[3] => apple
[6] => orange
)
Single function:
function repeatWords($text)
{
$all_words = str_word_count($text, 1);
$unique_words = array_unique($all_words);
return array_diff_assoc($all_words, $unique_words);
}
The simplest way that I've found uses a loop.
This is oversimplified; preg_match with the i modifier does a case-insensitive match.
This will work on a very large string as well.
$i = 0;
$string = explode(" ", "The red fox is a fox"); // split the text into words
foreach ($string as $s) {
    if (preg_match("/fox/i", $s)) {
        $i++; // count each word that contains "fox"
    }
}
$string = implode(" ", $string); // join the words back together
If you don't know the word you're looking for, you probably need a hashmap: loop through the whole file once in O(n) and store every word in it. I think everything else is slower.
You may run into trouble with the hashmap if the file is really big.
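In PHP the hashmap is just an array, and array_count_values does the counting in a single pass without an explicit loop. A small sketch, where count_repeated_words is a name I made up and lowercasing the text first is my own assumption about what counts as the same word:

```php
<?php
// Count every word once, then keep only the words that occur more than once.
function count_repeated_words($text) {
    $words = str_word_count(strtolower($text), 1); // list of words, lowercased
    $counts = array_count_values($words);          // word => number of occurrences
    return array_filter($counts, function ($n) { return $n > 1; });
}

print_r(count_repeated_words("apple, orange: banana. apple sausage bear orange"));
```

For the sample text this should leave apple and orange, each with a count of 2.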
try this link:
http://us3.php.net/manual/en/function.str-word-count.php
OR
How to find if there are words repeated twice in a row
Update:
Maybe this link will work for you:
http://www.devdaily.com/blog/post/ruby/ruby-count-number-of-times-words-appear-in-text-file
I'm building a PHP script to minify CSS/Javascript, which (obviously) involves getting rid of comments from the file. Any ideas how to do this? (Preferably, I need to get rid of /**/ and // comments)
Pattern for removing comments in JS
$pattern = '/((?:\/\*(?:[^*]|(?:\*+[^*\/]))*\*+\/)|(?:\/\/.*))/';
Pattern for removing comments in CSS
$pattern = '!/\*[^*]*\*+([^/][^*]*\*+)*/!';
$str = preg_replace($pattern, '', $str);
I hope the above helps someone.
Ref: http://castlesblog.com/2010/august/14/php-javascript-css-minification
That wheel has been invented -- https://github.com/mrclay/minify.
PLEASE NOTE - the following approach will not work in all possible scenarios. Test before using in production.
Without preg patterns or anything similar, this can easily be done with PHP's built-in tokenizer. All three (PHP, JS and CSS) share the same way of representing comments in source files, and PHP's native token_get_all() function (without the TOKEN_PARSE flag) can do the dirty trick even if the input string isn't well-formed PHP code, which is exactly what one needs here. All it asks for is <?php at the start of the string, and the magic happens. :)
<?php
function no_comments (string $tokens)
{ // Remove all block and line comments in css/js files with PHP tokenizer.
$remove = [];
$suspects = ['T_COMMENT', 'T_DOC_COMMENT'];
$iterate = token_get_all ('<?php '. PHP_EOL . $tokens);
foreach ($iterate as $token)
{
if (is_array ($token))
{
$name = token_name ($token[0]);
$chr = substr($token[1],0,1);
if (in_array ($name, $suspects)
&& $chr !== '#') $remove[] = $token[1];
}
}
return str_replace ($remove, '', $tokens);
}
The usage goes something like this:
echo no_comments ($myCSSorJsStringWithComments);
Take a look at minify, a "heavy regex-based removal of whitespace, unnecessary comments and tokens."
I have a really long string in a certain pattern such as:
userAccountName: abc userCompany: xyz userEmail: a#xyz.com userAddress1: userAddress2: userAddress3: userTown: ...
and so on. This pattern repeats.
I need to find a way to process this string so that I have the values of userAccountName:, userCompany:, etc. (i.e. preferably in an associative array or some such convenient format).
Is there an easy way to do this or will I have to write my own logic to split this string up into different parts?
Simple regular expressions like this userAccountName:\s*(\w+)\s+ can be used to capture matches and then use the captured matches to create a data structure.
If you can arrange for the data to be formatted as it is in a URL (ie, var=data&var2=data2) then you could use parse_str, which does almost exactly what you want, I think. Some mangling of your input data would do this in a straightforward manner.
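A rough sketch of that mangling, assuming every key is a single word followed by a colon, that values contain no &, =, + or % characters (parse_str would interpret those), and that the string holds only one record, since repeated keys overwrite each other. parse_fields is a name I made up.

```php
<?php
// Rewrite "key: value key2: value2" as "key=value&key2=value2", then let parse_str do the rest.
function parse_fields($raw) {
    // Each "word:" (with surrounding whitespace) becomes "&word=".
    $query = preg_replace('/\s*(\w+):\s*/', '&$1=', $raw);
    parse_str(ltrim($query, '&'), $fields);
    return $fields;
}

print_r(parse_fields('userAccountName: abc userCompany: xyz userEmail: a#xyz.com userAddress1: userAddress2: userTown: '));
```

For a string where the pattern repeats, you would first have to cut it into one chunk per record, for example at every occurrence of the first key.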
You might have to use regex or your own logic.
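Are you guaranteed that the string ": " does not appear anywhere within the values themselves, and that keys never contain spaces? If so, you could use explode to split the string on ": ". Each piece after the first then holds a value followed by the next key, so you'd have to walk through the pieces and peel the key off the end of each one. Here's a rough (probably inefficient) example I threw together quickly:
<?php
$dataString = 'userAccountName: abc userCompany: xyz userAddress1: userAddress2: ';
// explode() splits on ': '; each piece after the first is "<value> <next key>".
$pieces = explode(': ', $dataString);
$firstKeyName = 'userAccountName';
$associativeDataArray = array();
$currentIndex = -1;
$key = array_shift($pieces); // the very first piece is the first key
$lastIndex = count($pieces) - 1;
foreach ($pieces as $i => $piece) {
    if ($key == $firstKeyName) { // the first key marks the start of a new record
        $associativeDataArray[] = array();
        ++$currentIndex;
    }
    // The last word of a piece is the next key; the final piece is all value.
    $pos = ($i < $lastIndex) ? strrpos($piece, ' ') : strlen($piece);
    $value = ($pos === false) ? '' : substr($piece, 0, $pos);
    $associativeDataArray[$currentIndex][$key] = trim($value);
    $key = ($pos === false) ? $piece : substr($piece, $pos + 1);
}
var_dump($associativeDataArray);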
If you can write a regexp (for my example I'm assuming there are no colons in values), you can parse it with preg_split or preg_match_all like this:
<?php
$raw_data = "userAccountName: abc userCompany: xyz";
$raw_data .= " userEmail: a#xyz.com userAddress1: userAddress2: ";
$data = array();
// /([^:]*\s+)?/ part works because the regexp is "greedy"
if (preg_match_all('/([a-z0-9_]+):\s+([^:]*\s+)?/i', $raw_data,
$items, PREG_SET_ORDER)) {
foreach ($items as $item) {
// Group 2 is missing from the match when a value is empty
$data[$item[1]] = isset($item[2]) ? trim($item[2]) : '';
}
print_r($data);
}
?>
If that's not the case, please describe the grammar of your string in a bit more detail.
PCRE is included in PHP and can respond to your needs using regexp like:
if ($c=preg_match_all ("/userAccountName: (?<userAccountName>\w+) userCompany: (?<userCompany>\w+) userEmail: /", $txt, $matches))
{
$userAccountName = $matches['userAccountName'];
$userCompany = $matches['userCompany'];
// and so on...
}
The most difficult part is getting the right regexp for your needs.
You can have a look at http://txt2re.com for some help.
I think the solution closest to what I was looking for, I found at http://www.justin-cook.com/wp/2006/03/31/php-parse-a-string-between-two-strings/. I hope this proves useful to someone else. Thanks everyone for all the suggested solutions.
If I were you, I'd try to convert the string into JSON format with some regexp.
Then simply use JSON.