Unspin text in php from spun text on sentence level

I need to neatly output spun text in a PHP page.
I already have the pre-spun text in {hi|hello|greetings} format.
I have some PHP code that I found elsewhere, but it does not handle sentence-level spinning, where groups are nested and two opening braces {{ appear together.
Here is the code that needs fixing.
<?php
function spinText($text){
$test = preg_match_all("#\{(.*?)\}#", $text, $out);
if (!$test) return $text;
$toFind = Array();
$toReplace = Array();
foreach($out[0] AS $id => $match){
$choices = explode("|", $out[1][$id]);
$toFind[]=$match;
$toReplace[]=trim($choices[rand(0, count($choices)-1)]);
}
return str_replace($toFind, $toReplace, $text);
}
echo spinText("{Hello|Hi|Greetings}!");
?>
The output will be a randomly chosen word: Hello OR Hi OR Greetings.
However, if there is sentence-level spinning, the output is messed up.
E.g.:
{{hello|hi}.{how're|how are} you|{How's|How is} it going}
The output is
{hello.how're you|How is it going}
As you can see, the text has not been spun completely.
Thank you

This is a recursive problem, so plain regular expressions aren't a great fit, but recursive patterns can help:
function bla($s)
{
// first off, find the curly brace patterns (those that are properly balanced)
if (preg_match_all('#\{(((?>[^{}]+)|(?R))*)\}#', $s, $matches, PREG_OFFSET_CAPTURE)) {
// go through the string in reverse order and replace the sections
for ($i = count($matches[0]) - 1; $i >= 0; --$i) {
// we recurse into this function here
$s = substr_replace($s, bla($matches[1][$i][0]), $matches[0][$i][1], strlen($matches[0][$i][0]));
}
}
// once we're done, it should be safe to split on the pipe character
$choices = explode('|', $s);
return $choices[array_rand($choices)];
}
echo bla("{{hello|hi}.{how're|how are} you|{How's|How is} it going}"), "\n";
See also: Recursive patterns
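An alternative sketch, not part of the answer above: instead of a recursive pattern, you can resolve the innermost {a|b} groups repeatedly until no braces remain. The function name spinNested is made up for illustration, and the sketch assumes the braces in the input are balanced.

```php
<?php
// Hypothetical helper: resolves nested spin syntax from the inside out.
function spinNested(string $text): string
{
    do {
        // Match only groups that contain no further braces (the innermost ones).
        $text = preg_replace_callback(
            '#\{([^{}]*)\}#',
            function (array $m): string {
                $choices = explode('|', $m[1]);
                return $choices[array_rand($choices)];
            },
            $text,
            -1,
            $count // number of replacements made in this pass
        );
    } while ($count > 0); // stop once a pass finds no more groups

    return $text;
}

echo spinNested("{{hello|hi}.{how're|how are} you|{How's|How is} it going}"), "\n";
```

Unlike the recursive version, this one leaves text outside the braces alone, so a literal | outside any group is not treated as a choice separator.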

Related

regex php special characters

$word = file_get_contents('http://www.pixelmon-server-list.com/list.txt');
$content = file_get_contents('http://www.pixelmon-server-list.com/fleetyfleet.txt');
$found_dimensions = array(); // our array
$word_array = explode(' ', $word); // explode the list
foreach($word_array as $one_word) { // loop over it
$str = 'DimensionName'.$one_word; // what are we looking for?
if(strstr($content, $str) !== false) { // look for it!
echo $one_word; // Just for demonstration purposes
$found_dimensions[] = $one_word; // add to the array
}
}
Okay, I have a list.txt and a fleetyfleet.txt.
Both can be viewed here; I didn't post them, to save space:
http://pastebin.com/7hWDUG1b
What I want to do is find the words from list.txt in fleetyfleet.txt, but only add them to the array if their prefix is Dimension�����Name�. The special characters make this tough, and I'm not sure what I should do.

I had a similar problem where I needed to remove all non-ascii characters from a file. Here's the regex I used:
s/[^\x00-\x7F]//g
If you're on Linux, here's a quick one-liner:
perl -p -i -e "s/[^\x00-\x7F]//g" list.txt

I have to guess a little, as I can't see the files from behind my firewall. The following code might help you:
<?php
$f1 = explode("\n",file_get_contents("./file1.txt"));
$f2 = file_get_contents("./file2.txt");
$found = array();
foreach($f1 as $x) {
$str = "/DimensionName.....$x./";
if (strlen($x)>0) {
if (preg_match($str, $f2, $matches)) {
echo $matches[0]."\n";
}
}
}
?>
This prints every match of the pattern: DimensionName, followed by any 5 characters, followed by a word read from the first file (file1.txt), followed by one more character.
If you need this to be further refined, please leave a comment.

PHP word censor with keeping the original caps

We want to censor certain words on our site but each word has different censored output.
For example:
PHP => P*P, javascript => j*vascript
(However not always the second letter.)
So we want a simple "one star" censor system that keeps the original capitalization. The data coming from the database is uncensored, so we need the fastest approach possible.
$data="Javascript and php are awesome!";
$word[]="PHP";
$censor[]="H";//the letter we want to replace
$word[]="javascript";
$censor[]="a";//but only once (j*v*script would look weird)
//Of course, if needed, we can store the full censored word in the $censor variables
Expected value:
J*vascript and p*p are awesome!
Thanks for all the answers!

You can put your censored words in a key-based array, where each value is the position of the character to replace with * (see the $censor array example below).
$string = 'JavaSCRIPT and pHp are testing test-ground for TEST ŠĐČĆŽ ŠĐčćŽ!';
$censor = [
'php' => 2,
'javascript' => 2,
'test' => 3,
'šđčćž' => 4,
];
function stringCensorSlow($string, array $censor) {
foreach ($censor as $word => $position) {
while (($pos = mb_stripos($string, $word)) !== false) {
$string =
mb_substr($string, 0, $pos + $position - 1) .
'*' .
mb_substr($string, $pos + $position);
}
}
return $string;
}
function stringCensorFast($string, array $censor) {
$pattern = [];
foreach ($censor as $word => $position) {
$word = '~(' . mb_substr($word, 0, $position - 1) . ')' . mb_substr($word, $position - 1, 1) . '(' . mb_substr($word, $position) . ')~iu';
$pattern[$word] = '$1*$2';
}
return preg_replace(array_keys($pattern), array_values($pattern), $string);
}
Use example :
echo stringCensorSlow($string, $censor);
# J*vaSCRIPT and p*p are te*ting te*t-ground for TE*T ŠĐČ*Ž ŠĐč*Ž!
echo stringCensorFast($string, $censor) . "\n";
# J*vaSCRIPT and p*p are te*ting te*t-ground for TE*T ŠĐČ*Ž ŠĐč*Ž!
Speed test :
foreach (['stringCensorSlow', 'stringCensorFast'] as $func) {
$time = microtime(true);
for ($i = 0; $i < 10000; $i++) {
$func($string, $censor);
}
$time = microtime(true) - $time;
echo "{$func}() took $time\n";
}
output on my localhost was :
stringCensorSlow() took 1.9752140045166
stringCensorFast() took 0.11587309837341
Upgrade #1: made the code multibyte-safe.
Upgrade #2: added a preg_replace example, which is faster than mb_substr. Thanks to AbsoluteƵERØ.
Upgrade #3: added the speed test loop and the results from my local machine.

Make an array of words and replacements. This should be your fastest option in terms of processing, but it takes a little more care to set up. When you're setting up your patterns, remember to use the i modifier to make each pattern case-insensitive. You could ultimately pull these from a database into the arrays. I've hard-coded the arrays here for the example.
<!DOCTYPE html>
<html>
<meta content="text/html; charset=UTF-8" http-equiv="content-type">
<?php
$word_to_alter = array(
'!(j)a(v)a(script)(s|ing|ed)?!i',
'!(p)h(p)!i',
'!(m)y(sql)!i',
'!(p)(yth)o(n)!i',
'!(r)u(by)!i',
'!(ВЗЛ)О(М)!iu',
);
$alteration = array(
'$1*$2*$3$4',
'$1*$2',
'$1*$2',
'$1$2*$3',
'$1*$2',
'$1*$2',
);
$string = "Welcome to the world of programming. You can learn PHP, MySQL, Python, Ruby, and Javascript all at your own pace. If you know someone who uses javascripting in their daily routine you can ask them about becoming a programmer who writes JavaScripts. взлом прохладно";
$newstring = preg_replace($word_to_alter,$alteration,$string);
echo $newstring;
?>
</html>
Output
Welcome to the world of programming. You can learn P*P, M*SQL, Pyth*n,
R*by, and J*v*script all at your own pace. If you know someone who
uses j*v*scripting in their daily routine you can ask them about
becoming a programmer who writes J*v*Scripts. взл*м прохладно
Update
It works the same with UTF-8 characters. Note that you have to specify the u modifier so that the pattern is treated as UTF-8.
u (PCRE_UTF8)
This modifier turns on additional functionality of PCRE that is incompatible with Perl. Pattern strings are treated as UTF-8. This
modifier is available from PHP 4.1.0 or greater on Unix and from PHP
4.2.3 on win32. UTF-8 validity of the pattern is checked since PHP 4.3.5.

Why not just use a little helper function and pass it a word and the desired censor?
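function censorWord($word, $censor) {
// strpos() returns 0 for a match at the first character, so compare against false explicitly
if (strpos($word, $censor) !== false) {
// replace only the first occurrence of the (escaped) letter
return preg_replace('/' . preg_quote($censor, '/') . '/', '*', $word, 1);
}
// nothing to censor: return the word unchanged
return $word;
}
echo censorWord("Javascript", "a"); // returns J*vascript
echo censorWord("PHP", "H"); // returns P*P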
Then you can check the word against your wordlist and if it is a word that should be censored, you can pass it to the function. Then, you also always have the original word as well as the censored one to play with or put back in your sentence.
This would also make it easy to change the number of letters censored by just changing the limit argument in the preg_replace call. All you have to do is keep an array of words, explode the sentence on spaces or something similar, and then check in_array. If the word is in the array, send it to censorWord().
Demo
And here's a more complete example doing exactly what you said in the OP.
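function censorWord($word, $censor) {
// compare against false explicitly: strpos() returns 0 for a match at position 0
if (strpos($word, $censor) !== false) {
return preg_replace('/' . preg_quote($censor, '/') . '/', '*', $word, 1);
}
// nothing to censor: return the word unchanged
return $word;
}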
$word_list = ['php','javascript'];
$data = "Javascript and php are awesome!";
$words = explode(" ", $data);
// pass each word by reference so it can be modified inside our array
foreach($words as &$word) {
if(in_array(strtolower($word), $word_list)) {
// this just passes the second letter of the word
// as the $censor argument
$word = censorWord($word, $word[1]);
}
}
echo implode(" ", $words); // returns J*vascript and p*p are awesome!
Another Demo

You could store a lowercase list of the censored words somewhere, and if you're okay with starring the second letter every time, do something like this:
if (in_array(strtolower($word), $censored_words)) {
$word = substr($word, 0, 1) . "*" . substr($word, 2);
}
If you want to change the first occurrence of a letter, you could do something like:
$censored_words = array('javascript' => 'a', 'php' => 'h', 'ruby' => 'b');
$lword = strtolower($word);
if (in_array($lword, array_keys($censored_words))) {
$ind = strpos($lword, $censored_words[$lword]);
$word = substr($word, 0, $ind) . "*" . substr($word, $ind + 1);
}
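The word => letter map above can also be applied to a whole sentence in one pass. This is only a sketch under some assumptions: the function name censorText is made up, the words are plain ASCII (for multibyte words you would switch to the mb_* functions and add the u modifier), and the keys of the map are lowercase.

```php
<?php
// Hypothetical helper: stars the first occurrence of a chosen letter in each
// censored word, keeping the original capitalization of the rest of the word.
function censorText(string $text, array $censored): string
{
    if (!$censored) {
        return $text; // nothing to censor
    }

    // Build one case-insensitive pattern such as /\b(php|javascript)\b/i
    $escaped = array_map(function ($w) {
        return preg_quote($w, '/');
    }, array_keys($censored));
    $pattern = '/\b(' . implode('|', $escaped) . ')\b/i';

    return preg_replace_callback($pattern, function (array $m) use ($censored): string {
        $word = $m[0];                           // the word as written in the text
        $letter = $censored[strtolower($word)];  // the letter to hide
        $pos = stripos($word, $letter);          // first occurrence, ignoring case
        if ($pos === false) {
            return $word;
        }
        return substr($word, 0, $pos) . '*' . substr($word, $pos + 1);
    }, $text);
}

echo censorText("Javascript and php are awesome!", ['php' => 'h', 'javascript' => 'a']), "\n";
// J*vascript and p*p are awesome!
```

Because the callback works on the matched text rather than on the lowercase key, the letters that are not starred keep whatever case they had in the input.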

This is what I would do:
Create a simple database (text file) and make a "table" of all your censored words and expected censored results. E.G.:
PHP --- P*P
javascript --- j*vascript
HTML --- HT*L
Write PHP code to compare the database information to the text you want to censor. You will have to use explode() to split the text into an array of words. Something like this:
/* Opening database of censored words */
$filename = "/files/censored_words.txt";
$file = fopen( $filename, "r" );
if( $file == false )
{
echo ( "Error in opening file" );
exit();
}
/* Creating an array of words from string*/
$data = explode(" ", $data); // What was "Javascript and PHP are awesome!" has
// become "Javascript", "and", "PHP", "are",
// "awesome!". This is useful.
If your script finds matching words, replace the word in your data with the censored word from your list. You would have to delimit the file first by \r\n and finally by ---. (Or whatever you choose for separating your table with.)
Hope this helped!

php regex - Scraping images from javascript object

I'm trying to scrape images from the markup of certain webpages. These webpages all have a slideshow, and the image sources are contained in JavaScript objects on the page. I'm thinking I need to call file_get_contents("http://www.example.com/page/1"); and then use preg_match_all() with a phrase I pass in (e.g. "\"LargeUrl\":\"" or "\"Description\":\"") to get the string of characters up to the next quotation mark.
var photos = {};
photos['photo-391094'] = {"LargeUrl": "http://www.example.org/images/1.png","Description":"blah blah balh"};
photos['photo-391095'] = {"LargeUrl": "http://www.example.org/images/2.png","Description":"blah blah balh"};
photos['photo-391096'] = {"LargeUrl": "http://www.example.org/images/3.png","Description":"blah blah balh"};
I have this function, but it returns the entire line after the input phrase. How can I modify it to return whatever comes after the input phrase, up to the next quotation mark? Or am I doing this all wrong, and is there a better way?
$page = file_get_contents("http://www.example.org/page/1");
$word = "\"LargeUrl\":\"";
if(preg_match_all("/(?<=$word)\S+/i", $page, $matches))
{
echo "<pre>";
print_r($matches);
echo "</pre>";
}
Ideally, the function would return an array like the following if I passed in "\"LargeUrl\":\""
$matches[0] = "http://www.example.org/images/1.png";
$matches[1] = "http://www.example.org/images/2.png";
$matches[2] = "http://www.example.org/images/3.png";

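You can use parentheses to capture the parts you're interested in. A simple regex to do it is
$word = '"LargeUrl":';
// \s* allows any amount of whitespace (including none) between the colon and the opening quote
$pattern = preg_quote($word, '/') . '\s*"([^"]+)"';
preg_match_all("/$pattern/", $page, $matches);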
print_r($matches[1]);

There is definitely a regex that will match each image URL, but you could also, if it's easier for you, match the whole object and then json_decode() the matched string.
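A minimal sketch of the json_decode() idea, assuming each object literal sits on one line, is valid JSON, and is assigned to photos[...] as in the question. The function name extractLargeUrls and the inline sample page are made up for illustration.

```php
<?php
// Sample markup copied from the question (shortened to two entries).
$page = <<<'JS'
var photos = {};
photos['photo-391094'] = {"LargeUrl": "http://www.example.org/images/1.png","Description":"blah blah balh"};
photos['photo-391095'] = {"LargeUrl": "http://www.example.org/images/2.png","Description":"blah blah balh"};
JS;

// Hypothetical helper: pulls every LargeUrl out of the photos[...] assignments.
function extractLargeUrls(string $page): array
{
    $urls = [];
    // Capture the {...} literal of each assignment; assumes no nested braces
    // and no "};" sequence inside the values.
    if (preg_match_all('/photos\[[^\]]+\]\s*=\s*(\{.*?\});/', $page, $m)) {
        foreach ($m[1] as $json) {
            $photo = json_decode($json, true); // decode to an associative array
            if (is_array($photo) && isset($photo['LargeUrl'])) {
                $urls[] = $photo['LargeUrl'];
            }
        }
    }
    return $urls;
}

print_r(extractLargeUrls($page));
```

Decoding the object also gives you the Description and any other keys without writing a separate pattern for each.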

Use the following code and you will get the result you need.
preg_match_all('/{"LargeUrl":(.*?)"(.*?)"/', $page, $result, PREG_PATTERN_ORDER);
for ($i = 0; $i < count($result[0]); $i++) {
echo "<pre>";
echo $result[2][$i];
echo "</pre>";
}
Thanks......p2c

mb_eregi_replace multiple matches get them

$string = 'test check one two test3';
$result = mb_eregi_replace ( 'test|test2|test3' , '<$1>' ,$string ,'i');
echo $result;
This should deliver: <test> check one two <test3>
Is it possible to find out that test and test3 were matched, without using another match function?

You can use preg_replace_callback instead:
$string = 'test check one two test3';
$matches = array();
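// list longer alternatives first, otherwise "test" would match inside "test3";
// capture $matches by reference so the callback can append to it
$result = preg_replace_callback('/test3|test2|test/i' , function($match) use (&$matches) {
$matches[] = $match;
return '<'.$match[0].'>';
}, $string);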
echo $result;
Here preg_replace_callback will call the passed callback function for each match of the pattern (note that its syntax differs from POSIX). In this case the callback function is an anonymous function that adds the match to the $matches array and returns the substitution string that the matches are to be replaced by.
Another approach would be to use preg_split to split the string at the matched delimiters while also capturing the delimiters (the pattern needs a capturing group for PREG_SPLIT_DELIM_CAPTURE to return them):
$parts = preg_split('/(test3|test2|test)/i', $string, -1, PREG_SPLIT_DELIM_CAPTURE);
The result is an array of alternating non-matching and matching parts.

As far as I know, eregi is deprecated.
You could do something like this:
<?php
$str = 'test check one two test3';
$to_match = array("test", "test2", "test3");
$rep = array();
foreach($to_match as $val){
$rep[$val] = "<$val>";
}
echo strtr($str, $rep);
?>
This too allows you to easily add more strings to replace.

The following function checks whether every word in a list occurs in the string:
<?php
function searchword($string, $words)
{
$matchFound = count($words);// number of words that must be found
$tempMatch = 0;
foreach ( $words as $word )
{
preg_match('/'.$word.'/',$string,$matches);
//print_r($matches);
if(!empty($matches))
{
$tempMatch++;
}
}
if($tempMatch==$matchFound)
{
return "found";
}
else
{
return "notFound";
}
}
$string = "test check one two test3";
/*** an array of words to highlight ***/
$words = array('test', 'test3');
$string = searchword($string, $words);
echo $string;
?>

If your string is UTF-8, you could use preg_replace instead:
$string = 'test check one two test3';
$result = preg_replace('/(test|test2|test3)/ui' , '<$1>' ,$string);
echo $result;
Obviously, with this kind of data to match, the result will be suboptimal:
<test> check one two <test>3
You'll need a longer approach than a direct search and replace with regular expressions, especially if some of your patterns are prefixes of other patterns.

To begin with, the code you want to enhance does not seem to do what it was meant to do (at least not on my computer). You can try something like this:
$string = 'test check one two test3';
$result = mb_eregi_replace('(test|test2|test3)', '<\1>', $string);
echo $result;
I've removed the i flag (which of course makes little sense here). Even so, you would still need to make the expression match the longest alternative.
As for the original question, here's a little proof of concept:
function replace($match){
$GLOBALS['matches'][] = $match;
return "<$match>";
}
$string = 'test check one two test3';
$matches = array();
$result = mb_eregi_replace('(test|test2|test3)', 'replace(\'\1\')', $string, 'e');
var_dump($result, $matches);
Please note this code is horrible and potentially insecure. I'd honestly go with the preg_replace_callback() solution proposed by Gumbo.

Can I use regex for this?

Is this possible with regex?
I have a file, and whenever a '#' is found in it, the '#' and the name that follows it should be replaced with the contents of the file of that name.
File1: "this text is found in file1"
File2: "this file will contain text from file1: #file1".
File2 after regex: "this file will contain text from file1: this text is found in file1".
I wish to do this with php and I've heard that the preg function is better than the ereg, but whatever works is fine with me =)
Thanks a lot!
EDIT:
It has to be programmed, so that it looks through file2 without knowing which files to concatenate before it has gone through all occurrences of a # :)

PHP's native functions strpos and str_replace are better to use when you're searching through larger files or strings. ;)

First of all, the grammar of your templating is not very good, because the parser cannot tell exactly where the file name ends.
My suggestion is to switch to a form whose boundary is easier to detect, such as {#:filename}.
Anyhow, the code below follows your question as asked.
<?php
// RegEx Utility functions -------------------------------------------------------------------------
function ReplaceAll($RegEx, $Processor, $Text) {
// Make sure the processor can be called
if(!is_callable($Processor))
throw new Exception("\"$Processor\" is not a callable.");
// Do the Match
preg_match_all($RegEx, $Text, $Matches, PREG_OFFSET_CAPTURE + PREG_SET_ORDER);
// Do the replacment
$NewText = "";
$MatchCount = count($Matches);
$PrevOffset = 0;
for($i = 0; $i < $MatchCount; $i++) {
// Get each match and the full match information
$EachMatch = $Matches[$i];
$FullMatch = is_array($EachMatch) ? $EachMatch[0] : $EachMatch;
// Full match is each match if no grouping is used in the regex
// Full match is the first element of each match if grouping is used in the regex.
$MatchOffset = $FullMatch[1];
$MatchText = $FullMatch[0];
$MatchTextLength = strlen($MatchText);
$NextOffset = $MatchOffset + $MatchTextLength;
// Append the non-match and the replace of the match
$NewText .= substr($Text, $PrevOffset, $MatchOffset - $PrevOffset);
$NewText .= $Processor($EachMatch);
// The next prev-offset
$PrevOffset = $NextOffset;
}
// Append the rest of the text
$NewText .= substr($Text, $PrevOffset);
return $NewText;
}
function GetGroupMatchText($Match, $Index) {
if(!is_array($Match))
return $Match[0];
$Match = $Match[$Index];
return $Match[0];
}
// Replacing by file content -----------------------------------------------------------------------
$RegEx_FileNameInText = "/#([a-zA-Z0-9]+)/"; // Group #1 is the file name
$ReplaceFunction_ByFileName = "ReplaceByFileContent";
function ReplaceByFileContent($Match) {
$FileName = GetGroupMatchText($Match, 1); // Group #1 is the file name
// $FileContent = file_get_contents($FileName); // Get the content of the file
$FileContent = "{# content of: $FileName}"; // Dummy content for testing
return $FileContent; // Returns the replacement
}
// Main --------------------------------------------------------------------------------------------
$Text = " === #file1 ~ #file2 === ";
echo ReplaceAll($RegEx_FileNameInText, $ReplaceFunction_ByFileName, $Text);
This returns === {# content of: file1} ~ {# content of: file2} ===.
The program replaces every regex match with the value returned by the given callback function.
In this case, the callback function is ReplaceByFileContent, which extracts the file name from group #1 of the regex.
I believe my code is self-documenting, but if you have any questions, just ask.
Hope this helps.

Much cleaner:
<?php
$content = file_get_contents('content.txt');
$m = array();
preg_match_all('`#([^\s]*)(\s|\Z)`ism', $content, $m, PREG_SET_ORDER);
foreach($m as $match){
$innerContent = file_get_contents($match[1]);
$content = str_replace('#'.$match[1], $innerContent, $content);
}
// done!
?>
regex tested with: http://www.spaweditor.com/scripts/regex/index.php
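For comparison, here is a sketch of the same replacement done in one pass with preg_replace_callback. It is not taken from either answer. The function name expandIncludes and the $loadFile callback are hypothetical: the callback stands in for file_get_contents so the example runs without any files on disk, and file names are assumed to consist of letters, digits and underscores.

```php
<?php
// Hypothetical helper: replaces every #name with whatever $loadFile returns for that name.
function expandIncludes(string $text, callable $loadFile): string
{
    return preg_replace_callback(
        '/#([A-Za-z0-9_]+)/',
        function (array $m) use ($loadFile): string {
            return $loadFile($m[1]); // $m[1] is the file name without the '#'
        },
        $text
    );
}

// In-memory stand-in for the files from the question.
$files = ['file1' => 'this text is found in file1'];

$loader = function (string $name) use ($files): string {
    // Leave the placeholder untouched when the "file" does not exist.
    return $files[$name] ?? '#' . $name;
};

echo expandIncludes('this file will contain text from file1: #file1', $loader), "\n";
// this file will contain text from file1: this text is found in file1
```

With real files you could pass a loader that checks is_readable() and then calls file_get_contents(); restricting the allowed characters in the pattern also keeps placeholders from pointing at arbitrary paths.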
