Can I use regex for this? - php

Is this possible with regex?
I have a file, and if a '#' is found in it, the '#' together with the name that follows it should be replaced with the contents of the file of that name.
File1: "this text is found in file1"
File2: "this file will contain text from file1: #file1".
File2 after regex: "this file will contain text from file1: this text is found in file1".
I wish to do this with php and I've heard that the preg function is better than the ereg, but whatever works is fine with me =)
Thanks a lot!
EDIT:
It has to be programmed, so that it looks through file2 without knowing which files to concatenate before it has gone through all occurrences of a # :)

PHP's native functions strpos and str_replace are better to use when you're searching through larger files or strings. ;)
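A minimal sketch of that string-function idea, assuming file names are plain alphanumerics and that the file contents are already loaded into an array. The function name expandHashNames and the $contents map are made up for illustration and do not come from the question.

```php
<?php
// Hypothetical helper: replaces "#name" with $contents['name'] using only
// string functions (strpos, strspn, substr_replace), no regex.
function expandHashNames(string $text, array $contents): string
{
    $nameChars = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789';
    $offset = 0;
    while (($pos = strpos($text, '#', $offset)) !== false) {
        // Length of the run of name characters right after the '#'
        $len = strspn($text, $nameChars, $pos + 1);
        $name = substr($text, $pos + 1, $len);
        if ($len > 0 && isset($contents[$name])) {
            $text = substr_replace($text, $contents[$name], $pos, $len + 1);
            // Continue after the inserted text so it is not scanned again
            $offset = $pos + strlen($contents[$name]);
        } else {
            // Unknown name or a lone '#': leave it in place
            $offset = $pos + 1;
        }
    }
    return $text;
}

echo expandHashNames(
    'this file will contain text from file1: #file1',
    ['file1' => 'this text is found in file1']
), "\n";
```

Names that are not in the map are left untouched, which makes missing files easy to spot in the output.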

First of all, the grammar of your templating is not a very good one, because the parser cannot be sure where the file name ends.
My suggestion would be to change to a syntax whose boundaries are easier to detect, like {#:filename}.
Anyhow, the code I give below follows your question.
<?php
// RegEx Utility functions -------------------------------------------------------------------------
function ReplaceAll($RegEx, $Processor, $Text) {
// Make sure the processor can be called
if(!is_callable($Processor))
throw new Exception("\"$Processor\" is not a callable.");
// Do the Match
preg_match_all($RegEx, $Text, $Matches, PREG_OFFSET_CAPTURE + PREG_SET_ORDER);
// Do the replacment
$NewText = "";
$MatchCount = count($Matches);
$PrevOffset = 0;
for($i = 0; $i < $MatchCount; $i++) {
// Get each match and the full match information
$EachMatch = $Matches[$i];
$FullMatch = is_array($EachMatch) ? $EachMatch[0] : $EachMatch;
// Full match is each match if no grouping is used in the regex
// Full match is the first element of each match if grouping is used in the regex.
$MatchOffset = $FullMatch[1];
$MatchText = $FullMatch[0];
$MatchTextLength = strlen($MatchText);
$NextOffset = $MatchOffset + $MatchTextLength;
// Append the non-match and the replace of the match
$NewText .= substr($Text, $PrevOffset, $MatchOffset - $PrevOffset);
$NewText .= $Processor($EachMatch);
// The next prev-offset
$PrevOffset = $NextOffset;
}
// Append the rest of the text
$NewText .= substr($Text, $PrevOffset);
return $NewText;
}
function GetGroupMatchText($Match, $Index) {
if(!is_array($Match))
return $Match[0];
$Match = $Match[$Index];
return $Match[0];
}
// Replacing by file content -----------------------------------------------------------------------
$RegEx_FileNameInText = "/#([a-zA-Z0-9]+)/"; // Group #1 is the file name
$ReplaceFunction_ByFileName = "ReplaceByFileContent";
function ReplaceByFileContent($Match) {
$FileName = GetGroupMatchText($Match, 1); // Group #1 is the file name
// $FileContent = file_get_contents($FileName); // Get the content of the file
$FileContent = "{# content of: $FileName}"; // Dummy content for testing
return $FileContent; // Returns the replacement
}
// Main --------------------------------------------------------------------------------------------
$Text = " === #file1 ~ #file2 === ";
echo ReplaceAll($RegEx_FileNameInText, $ReplaceFunction_ByFileName, $Text);
This will return === {# content of: file1} ~ {# content of: file2} ===.
The program replaces every regex match with the value returned by the function whose name is given.
In this case, the callback function is ReplaceByFileContent, in which the file name is extracted from group #1 of the regex.
I believe my code is self-documenting, but if you have any questions, you can ask me.
Hope it helps.
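For comparison, here is a shorter sketch of the delimited {#:filename} syntax suggested above, built on PHP's preg_replace_callback. The function name expandIncludes and the loader callable are assumptions made for illustration. The loader is passed in so the example runs without touching the disk; in real use it could wrap file_get_contents.

```php
<?php
// Hypothetical helper: expands {#:name} placeholders with whatever
// the caller-supplied loader returns for that name.
function expandIncludes(string $text, callable $loadFile): string
{
    $result = preg_replace_callback(
        '/\{#:([A-Za-z0-9_.-]+)\}/',          // group 1 is the file name
        function (array $m) use ($loadFile): string {
            return (string) $loadFile($m[1]);
        },
        $text
    );
    return $result ?? $text;                  // null means a regex engine error
}

// In-memory stand-in for the file system (assumption for the demo)
$files = ['file1' => 'this text is found in file1'];
$loader = function (string $name) use ($files): string {
    return $files[$name] ?? '';
};

echo expandIncludes('this file will contain text from file1: {#:file1}', $loader), "\n";
```

Because the closing brace marks the end of the name, punctuation directly after a placeholder is no longer ambiguous.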

Much cleaner:
<?php
$content = file_get_contents('content.txt');
$m = array();
preg_match_all('`#([^\s]*)(\s|\Z)`ism', $content, $m, PREG_SET_ORDER);
foreach($m as $match){
$innerContent = file_get_contents($match[1]);
$content = str_replace('#'.$match[1], $innerContent, $content);
}
// done!
?>
regex tested with: http://www.spaweditor.com/scripts/regex/index.php
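One thing to watch with the str_replace step: replacing '#file1' also hits the start of '#file10'. A possible way around that, sketched under assumptions, is to replace each match in place with preg_replace_callback. The function name includeFiles, the $baseDir parameter and the basename() guard are illustrative choices, not part of the answer above.

```php
<?php
// Hypothetical helper: replaces each "#name" with the contents of
// $baseDir/name, leaving the token untouched when no such file exists.
function includeFiles(string $content, string $baseDir): string
{
    $result = preg_replace_callback(
        '/#(\S+)/',
        function (array $m) use ($baseDir): string {
            // basename() drops directory parts so "#../secret" cannot escape $baseDir
            $path = $baseDir . DIRECTORY_SEPARATOR . basename($m[1]);
            return is_file($path) ? (string) file_get_contents($path) : $m[0];
        },
        $content
    );
    return $result ?? $content;
}
```

Each token is replaced exactly where it was matched, so one name being a prefix of another is not a problem.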

Related

PHP preg_replace all text changing

I want to make some changes to the html but I have to follow certain rules.
I have a source code like this;
A beautiful sentence http://www.google.com/test, You can reach here http://www.google.com/test-mi or http://www.google.com/test/aliveli
I need to convert this into the following;
A beautiful sentence http://test.google.com/, You can reach here http://www.google.com/test-mi or http://test.google.com/aliveli
I tried using str_replace;
$html = str_replace('://www.google.com/test', '://test.google.com', $html);
When I use it like this, I get an incorrect result like;
A beautiful sentence http://test.google.com/, You can reach here http://test.google.com/-mi or http://test.google.com/aliveli
Wrong replace: http://test.google.com/-mi
How can I do this with preg_replace?
With regex you can use a word boundary and a lookahead to prevent replacing at -
$pattern = '~://www\.google\.com/test\b(?!-)~';
$html = preg_replace($pattern, "://test.google.com", $html);
Here is a regex demo at regex101 and a php demo at eval.in
Be aware that, when using regex, you need to escape certain characters with a backslash to remove their special meaning and match them literally.
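To see what the word boundary and the lookahead do, here is the same pattern run against the sentence from the question. Only the variable name $result is new.

```php
<?php
$html = 'A beautiful sentence http://www.google.com/test, You can reach here '
      . 'http://www.google.com/test-mi or http://www.google.com/test/aliveli';

// \b requires "test" to end at a word boundary; (?!-) rejects "test-..."
$pattern = '~://www\.google\.com/test\b(?!-)~';
$result = preg_replace($pattern, '://test.google.com', $html);

echo $result, "\n";
// A beautiful sentence http://test.google.com, You can reach here
// http://www.google.com/test-mi or http://test.google.com/aliveli
```

Note that the first URL comes out without a trailing slash, because only "/test" itself is replaced.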
It seems you're turning the subdirectory test into a subdomain. Your case is fairly complicated, so I've tried to apply some logic that should be reliable as long as your string keeps the same structure. You can give this code a try:
$html = "A beautiful sentence http://www.google.com/test, You can reach here http://www.google.com/test-mi or http://www.google.com/test/aliveli";
function set_subdomain_string($html, $subdomain_word) {
$html = explode(' ', $html);
foreach($html as &$value) {
$parse_html = parse_url($value);
if(count($parse_html) > 1) {
$path = preg_replace('/[^0-9a-zA-Z\/-_]/', '', $parse_html['path']);
preg_match('/[^0-9a-zA-Z\/-_]/', $parse_html['path'], $match);
if(preg_match_all('/(test$|test\/)/', $path)) {
$path = preg_replace('/(test$|test\/)/', '', $path);
$host = preg_replace('/www/', 'test', $parse_html['host']);
$parse_html['host'] = $host;
if(!empty($match)) {
$parse_html['path'] = $path . $match[0];
} else {
$parse_html['path'] = $path;
}
unset($parse_html['scheme']);
$url_string = "http://" . implode('', $parse_html);
$value = $url_string;
}
}
unset($value);
}
$html = implode(' ', $html);
return $html;
}
echo "<p>{$html}</p>";
$modified_html = set_subdomain_string($html, 'test');
echo "<p>{$modified_html}</p>";
Hope it helps.
If that sentence is the only case you need to handle, you don't need to struggle with preg_replace.
Just change your str_replace() call to the following (note the ',' at the end of the search string):
$html = str_replace('://www.google.com/test,', '://test.google.com/,', $html);
This handles the first URL in your sentence. For the last one, add this (note the '/' at the end):
$html = str_replace('://www.google.com/test/', '://test.google.com/', $html);
update:
Use these two:
$targetStr = preg_replace("/:\/\/www\.google\.com\/test[\s\/]/", "://test.google.com/", $targetStr);
It will match all occurrences except the ones with a comma at the end. For those, you should use the following:
$targetStr = preg_replace("/:\/\/www\.google\.com\/test,/", "://test.google.com/,", $targetStr);
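The two calls can probably be folded into one pattern. This is a sketch, assuming a URL to rewrite ends in "/test" followed by a slash, whitespace, a comma or the end of the string. The variable names are only illustrative.

```php
<?php
$targetStr = 'A beautiful sentence http://www.google.com/test, You can reach here '
           . 'http://www.google.com/test-mi or http://www.google.com/test/aliveli';

// Either consume the slash after "test", or just look ahead for
// whitespace, a comma, or the end of the string without consuming it.
$pattern = '~://www\.google\.com/test(?:/|(?=[\s,]|$))~';
$converted = preg_replace($pattern, '://test.google.com/', $targetStr);

echo $converted, "\n";
// A beautiful sentence http://test.google.com/, You can reach here
// http://www.google.com/test-mi or http://test.google.com/aliveli
```

Since the comma and whitespace are only looked at and not consumed, they stay in the output exactly as they were.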

preg_match() to check Image urls without [img] BB tags and return boolean value using PHP

In my text field I have images enclosed within [img] BB tags like
[img]http://i58.tinypic.com/i3yxar.jpg[/img]
and plain image URLs like
http://www.jonco48.com/blog/tongue1.jpg
I want preg_match to look for plain image URLs and return 1 if any are found, otherwise 0. How can I do this?
Thanks
With regex it is quite difficult to match a pattern only when another piece is absent, in this case the opening and closing img tags.
So I would count the URLs inside the tags, then count all the URLs, and compare the two counts:
$text = "";
// Count the [img]...[/img] pairs
$tagPattern = "/\[img\].+?\[\/img\]/";
preg_match_all($tagPattern, $text, $tagMatches);
$urlInTagCount = count($tagMatches[0]);
// Count every image URL, tagged or not
$plainPattern = "~https?://\S+\.(?:jpe?g|gif|png)(?:\?\S*)?(?=\s|$|\pP)~i";
preg_match_all($plainPattern, $text, $plainMatches);
$allUrlCount = count($plainMatches[0]);
// More URLs than tags means at least one plain URL
return $allUrlCount > $urlInTagCount;
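An alternative sketch that avoids counting: blank out the [img]...[/img] sections first, then look for any image URL in what is left. The function name hasPlainImageUrl and the simplified URL pattern are assumptions for illustration.

```php
<?php
// Hypothetical helper: returns 1 if the text contains an image URL
// that is not wrapped in [img]...[/img], otherwise 0.
function hasPlainImageUrl(string $text): int
{
    // Drop every tagged section so only plain text remains
    $withoutTags = preg_replace('~\[img\].*?\[/img\]~is', ' ', $text) ?? $text;
    // Simplified image URL pattern (jpg, jpeg, gif, png)
    $found = preg_match('~https?://\S+?\.(?:jpe?g|gif|png)\b~i', $withoutTags);
    return $found === 1 ? 1 : 0;
}

echo hasPlainImageUrl('[img]http://i58.tinypic.com/i3yxar.jpg[/img]'), "\n"; // 0
echo hasPlainImageUrl('http://www.jonco48.com/blog/tongue1.jpg'), "\n";      // 1
```

This also behaves sensibly for text that mixes tagged and plain URLs, since the tagged ones are gone before the second match runs.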
Using regex is really overkill for this if all you need to do is check whether or not there are [img][/img] tags around your string.
You can just as easily use some simple string functions:
function isBB($s){
// True when the string starts with [img] and ends with [/img]
return substr($s, 0, 5) == "[img]" && substr($s, -6) == "[/img]";
}
isBB('[img]http://i58.tinypic.com/i3yxar.jpg[/img]'); // true
isBB('http://www.jonco48.com/blog/tongue1.jpg'); // false
Here you have the REGEX : ~https?://\S+\.(?:jpe?g|gif|png)(?:\?\S*)?(?=\s|$|\pP)~i
In PHP :
preg_match('#\[img\](.+?)\[/img\]#', $your_text, $matches);
echo $matches[1];
The following should work as expected:
<?php
$str = '[img]http://i58.tinypic.com/i3yxar.jpg[/img]';
preg_match('#\[img\](.+?)\[/img\]#', $str, $matches);
echo $matches[1];

regex php special characters

$word = file_get_contents('http://www.pixelmon-server-list.com/list.txt');
$content = file_get_contents('http://www.pixelmon-server-list.com/fleetyfleet.txt');
$found_dimensions = array(); // our array
$word_array = explode(' ', $word); // explode the list
foreach($word_array as $one_word) { // loop over it
$str = 'DimensionName'.$one_word; // what are we looking for?
if(strstr($content, $str) !== false) { // look for it!
echo $one_word; // Just for demonstration purposes
$found_dimensions[] = $one_word; // add to the array
}
}
Okay, I have a list.txt and a fleetyfleet.txt.
Both can be viewed here; I didn't post them to save space:
http://pastebin.com/7hWDUG1b
What I want to do is find the words in list.txt, but only add them to the array if their prefix is Dimension�����Name�. The special characters make it kind of tough, and I'm not sure what I should do.
I had a similar problem where I needed to remove all non-ascii characters from a file. Here's the regex I used:
s/[^\x00-\x7F]//g
If you're on linux, here's a quick one-liner:
perl -p -i -e "s/[^\x00-\x7F]//g" list.txt
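The same substitution can be done from PHP with preg_replace. This is a small sketch; the helper name stripNonAscii and the sample bytes are made up, since the real file contents are not shown here.

```php
<?php
// Hypothetical helper: removes every byte outside the 7-bit ASCII range.
// No /u modifier, so the input is treated as raw bytes, not UTF-8.
function stripNonAscii(string $bytes): string
{
    return preg_replace('/[^\x00-\x7F]/', '', $bytes) ?? $bytes;
}

// Invented sample with high bytes between and after the words
$raw = "Dimension\xC3\xA9\xFFName\xE2Nether";
echo stripNonAscii($raw), "\n"; // DimensionNameNether
```

After stripping, a plain strstr or strpos search for 'DimensionName' followed by the word may work, provided the separators in the real file really are non-ASCII bytes.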
Having to guess a little bit as I can't see the files from behind my firewall. The following code might help you:
<?php
$f1 = explode("\n",file_get_contents("./file1.txt"));
$f2 = file_get_contents("./file2.txt");
$found = array();
foreach($f1 as $x) {
$str = "/DimensionName.....$x./";
if (strlen($x)>0) {
if (preg_match($str, $f2, $matches)) {
echo $matches[0]."\n";
}
}
}
?>
This prints out lines that include a pattern DimensionName followed by 5 "anything" followed by whatever word was read from the first file file1.txt.
If you need this to be further refined, please leave a comment.
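A slightly more defensive variant of the loop above, as a sketch: it wraps the search in a function, escapes each word with preg_quote so characters like "." in a word are matched literally, and uses the s modifier so the five filler bytes may include newlines. The function name findDimensionWords is hypothetical, and the "exactly five bytes" gap is the same guess as in the code above.

```php
<?php
// Hypothetical helper: returns the words that appear in $content
// right after "DimensionName" plus five arbitrary bytes.
function findDimensionWords(array $words, string $content): array
{
    $found = [];
    foreach ($words as $word) {
        $word = trim($word);          // also removes a stray "\r" from Windows line endings
        if ($word === '') {
            continue;
        }
        // preg_quote keeps regex metacharacters in the word literal
        $pattern = '/DimensionName.{5}' . preg_quote($word, '/') . '/s';
        if (preg_match($pattern, $content) === 1) {
            $found[] = $word;
        }
    }
    return $found;
}
```

It could be called as findDimensionWords(explode("\n", file_get_contents('./file1.txt')), file_get_contents('./file2.txt')).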

Unspin text in php from spun text on sentence level

I need to neatly output spun text in a PHP page.
I already have the prespun text in {hi|hello|greetings} format.
I have PHP code that I found elsewhere, but it does not handle spinning at sentence level, where two {{ occur (nested braces).
Here is the code that needs fixing.
<?php
function spinText($text){
$test = preg_match_all("#\{(.*?)\}#", $text, $out);
if (!$test) return $text;
$toFind = Array();
$toReplace = Array();
foreach($out[0] AS $id => $match){
$choices = explode("|", $out[1][$id]);
$toFind[]=$match;
$toReplace[]=trim($choices[rand(0, count($choices)-1)]);
}
return str_replace($toFind, $toReplace, $text);
}
echo spinText("{Hello|Hi|Greetings}!");
?>
The output will be a randomly chosen word: Hello OR Hi OR Greetings.
However, if there is sentence-level spinning, the output is messed up.
E.g.:
{{hello|hi}.{how're|how are} you|{How's|How is} it going}
The output is
{hello.how're you|How is it going}
As you can see the text has not been spun completely.
Thank you
This is a recursive problem, so regular expressions aren't that great; but recursive patterns can help though:
function bla($s)
{
// first off, find the curly brace patterns (those that are properly balanced)
if (preg_match_all('#\{(((?>[^{}]+)|(?R))*)\}#', $s, $matches, PREG_OFFSET_CAPTURE)) {
// go through the string in reverse order and replace the sections
for ($i = count($matches[0]) - 1; $i >= 0; --$i) {
// we recurse into this function here
$s = substr_replace($s, bla($matches[1][$i][0]), $matches[0][$i][1], strlen($matches[0][$i][0]));
}
}
// once we're done, it should be safe to split on the pipe character
$choices = explode('|', $s);
return $choices[array_rand($choices)];
}
echo bla("{{hello|hi}.{how're|how are} you|{How's|How is} it going}"), "\n";
See also: Recursive patterns
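If recursive patterns feel heavy, a different technique also seems to work: repeatedly resolve the innermost groups (those containing no further braces) until none are left. This is a sketch of that alternative, not the approach above, and the function name spinInnermostFirst is made up.

```php
<?php
// Hypothetical helper: resolves {a|b|c} groups from the inside out.
function spinInnermostFirst(string $text): string
{
    do {
        $count = 0;
        $next = preg_replace_callback(
            '/\{([^{}]*)\}/',                       // a group with no nested braces
            function (array $m): string {
                $choices = explode('|', $m[1]);
                return $choices[array_rand($choices)];
            },
            $text,
            -1,
            $count                                   // number of groups replaced this pass
        );
        if ($next === null) {
            break;                                   // regex engine error: keep what we have
        }
        $text = $next;
    } while ($count > 0);
    return $text;
}

echo spinInnermostFirst("{{hello|hi}.{how're|how are} you|{How's|How is} it going}"), "\n";
```

Each pass peels off one nesting level, so the loop runs once per level of depth plus one final pass that finds nothing. Unlike bla(), it leaves a pipe outside any braces alone.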

regex help with getting tag content in PHP

so I have the code
function getTagContent($string, $tagname) {
$pattern = "/<$tagname.*?>(.*)<\/$tagname>/";
preg_match($pattern, $string, $matches);
print_r($matches);
}
and then I call
$url = "http://www.freakonomics.com/2008/09/24/wall-street-jokes-please/";
$html = file_get_contents($url);
getTagContent($html,"title");
but then it shows that there are no matches, even though the source of that URL clearly contains a title tag.
what did I do wrong?
try DOM
$url = "http://www.freakonomics.com/2008/09/24/wall-street-jokes-please/";
$doc = new DOMDocument();
$dom = $doc->loadHTMLFile($url);
$items = $doc->getElementsByTagName('title');
for ($i = 0; $i < $items->length; $i++)
{
echo $items->item($i)->nodeValue . "\n";
}
The 'title' tag is not on the same line as its closing tag, so your preg_match doesn't find it.
In Perl, you can add a /s switch to make it slurp the whole input as though on one line: I forget whether preg_match will let you do so or not.
But this is just one of the reasons why parsing XML and variants with regexp is a bad idea.
Probably because the title is spread on multiple lines. You need to add the option s so that the dot will also match any line returns.
$pattern = "/<$tagname.*?>(.*)<\/$tagname>/s";
Have your php function getTagContent like this:
function getTagContent($string, $tagname) {
$pattern = '/<'.$tagname.'[^>]*>(.*?)<\/'.$tagname.'>/is';
preg_match($pattern, $string, $matches);
print_r($matches);
}
It is important to use non-greedy match all .*? for matching text between start and end of tag and equally important is to use flags s for DOTALL (matches new line as well) and i for ignore case comparison.
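A small sketch of the same idea that returns the content instead of printing it and can be tried on an in-memory string. The function name extractTagContent, the preg_quote call and the trim are additions for illustration.

```php
<?php
// Hypothetical helper: returns the text inside the first <tag>...</tag>
// pair, or null when the tag is not found.
function extractTagContent(string $html, string $tagName): ?string
{
    $tag = preg_quote($tagName, '/');
    // i: case-insensitive, s: dot also matches newlines, .*? : non-greedy
    $pattern = '/<' . $tag . '\b[^>]*>(.*?)<\/' . $tag . '>/is';
    return preg_match($pattern, $html, $m) === 1 ? trim($m[1]) : null;
}

// A title spread over several lines, as in the page from the question
$html = "<html><head><TITLE>\n  Wall Street Jokes\n</TITLE></head></html>";
echo extractTagContent($html, 'title'), "\n"; // Wall Street Jokes
```

Without the s modifier the same call returns null for this input, which is the behaviour described in the question.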
