Replace symbols in string dependent on occurrence. PHP - php

I have a string that is formatted by symbols (symbols similar to those used to format questions on this site).
Rules:
**Hello** means bold = <b>Hello</b>
*Hello\\ means bulleted list = <li>Hello</li>
Hello\\ means line break = Hello<br>
I want to replace:
Every first occurrence of ** with <b> and every second ** with </b>.
The same for * with <li> and the following \\ with </li>.
Every \\ that is not preceded by an opening * (i.e. does not close a list item) should be converted to <br>.
Example string:
$myString = 'Hello my **Friend**,\\here is the stuff you need to buy for me:*knife\\*water bottle\\***fake ID**\\\\\\Thank you in advance and don not forget the **fake ID**!\\Sincerely yours\\Eddy'
Note: This style is not my invention. It is in use and I have to convert it.
I preg_match()-ed parts of it to get the stuff between the tags.
$myString = 'Hello my **Friend**,\\here is the stuff you need to buy for me:*knife\\*water bottle\\***fake ID**\\\\\\Thank you in advance and don not forget the **fake ID**!\\Sincerely yours\\Eddy';
$result = array();
preg_match('~\*\*(.*?)\*\*~', $myString, $matches);
$result[] = '<b>' . $matches[1] . '</b>';
// and so on...
(Ignore mistakes in this, it's written from memory.)
I didn't consider the words before the first bold, but it's basically the same.
This will get the job done in the end, but it seems cumbersome to me. I am looking for a more elegant way to do this.
What is the best way to solve this in PHP?

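You can use preg_replace(). Because of your markup, the order of the replacements matters: handle ** first, then the list items, then the remaining line breaks.
http://php.net/manual/en/function.preg-replace.php
// Assumes the marker in the actual data is two backslashes; \x5c is a literal backslash in PCRE.
$myString = preg_replace('/[*][*]([^*]+)[*][*]/', '<b>${1}</b>', $myString);
$myString = preg_replace('/[*]([^\x5c]+)\x5c\x5c/', '<li>${1}</li>', $myString);
$myString = str_replace('\\\\', '<br/>', $myString);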
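For reference, here is a self-contained sketch of the same three-step idea. It assumes the marker in the real data is two literal backslashes (a nowdoc keeps them intact, whereas inside a single-quoted PHP literal \\ collapses to one backslash). The function name convertMarkup and the shortened input are made up for illustration.

```php
<?php
// Hypothetical helper; assumes the line-break marker is two literal backslashes.
function convertMarkup(string $text): string
{
    // 1. Bold first, so that ** is not mistaken for two list markers.
    $text = preg_replace('/\*\*([^*]+)\*\*/', '<b>$1</b>', $text);
    // 2. A remaining * opens a list item that runs up to the next \\ (\x5c is a backslash in PCRE).
    $text = preg_replace('/\*([^\x5c]+)\x5c\x5c/', '<li>$1</li>', $text);
    // 3. Every \\ still left is a plain line break.
    return str_replace('\\\\', '<br>', $text);
}

// Nowdoc: backslashes are taken literally.
$input = <<<'TXT'
Hello **Friend**,\\buy:*knife\\***fake ID**\\\\Bye
TXT;

echo convertMarkup($input), "\n";
```

The list items are not wrapped in <ul> here; that would need one more pass over consecutive <li> runs.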

Related

php string replace by str_replace issue

I made a function that replaces words in a string with new words taken from an array.
This is my code:
function myseo($t){
$a = array('me','lord');
$b = array('Mine','TheLord');
$theseotext = $t;
$theseotext = str_replace($a,$b, $theseotext);
return $theseotext;
}
echo myseo('This is me Mrlord');
The output is
This is Mine MrTheLord
which is wrong. It should print
This is Mine Mrlord
because the word Mrlord is not in the array.
I hope I explained my issue well. Any help is appreciated.
Regards
According to the code it is correct, but you want it to isolate by word. You could simply do this:
function myseo($t){
$a = array(' me ',' lord ');
$b = array(' Mine ',' TheLord ');
return str_replace($a,$b, ' '.$t.' ');
}
echo myseo('This is me Mrlord');
Keep in mind this is kind of a cheap hack, since I pad the input with spaces so that words at either end are also matched. It won't work for words next to punctuation. The alternative would be to break the string apart and replace each word individually.
str_replace doesn't look at full words only - it looks at any matching sequence of characters.
Thus, lord matches the latter part of Mrlord.
use str_ireplace instead, it's case insensitive.
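Another option that also copes with punctuation is a regex with \b word boundaries. The following is only a sketch; the helper name replaceWholeWords is hypothetical, and it assumes the replacement strings contain no $ or backslash (preg_replace would read those as backreferences).

```php
<?php
// Hypothetical helper: replaces only whole words, using \b word boundaries.
function replaceWholeWords(array $map, string $text): string
{
    $patterns = [];
    foreach (array_keys($map) as $word) {
        // preg_quote() escapes regex metacharacters in the search word.
        $patterns[] = '/\b' . preg_quote($word, '/') . '\b/';
    }
    return preg_replace($patterns, array_values($map), $text);
}

echo replaceWholeWords(['me' => 'Mine', 'lord' => 'TheLord'], 'This is me Mrlord'), "\n";
// prints "This is Mine Mrlord"
```

Adding the i modifier to each pattern would make the match case-insensitive.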

Break up long words in a UTF-8 text, with PHP

Horrible title, I know.
I want to have some kind of wordwrap, but obviously can not use wordwrap() as it messes up UTF-8.. not to mention markup.
My issue is that I want to get rid of stuff like this "eeeeeeeeeeeeeeeeeeeeeeeeeeee" .. but then longer of course. Some jokesters find it funny to put that stuff on my site.
So when I have a string like this "Hello how areeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee you doing?" I want to break up the 'areeee' part with the zero-width space (U+200B, &#8203;) character.
The strings aren't always the same letter, and they always sit inside larger strings, so strlen, substr and wordwrap don't really fit.
Who can help me out?
Granted, this is not a PHP solution, but if your problem is only how the text is displayed, why not use the simple CSS3 property called word-wrap?
If your container is a div with id="example", you can write:
#example
{
word-wrap: break-word;
}
Do this in 3 steps:
1. Split the string on whitespace.
2. Check the length of each word and cut it down if it is too long.
3. Concatenate the words back together.
The downside is that legitimate words longer than 10 chars get cut as well, so I would suggest adding a check for the same letter repeated over and over.
EXAMPLE
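$string = "Hello how areeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee you doing?";
$wordArr = array();
foreach (explode(" ", $string) as $word) {
    // mb_* functions (mbstring extension) count characters, not bytes,
    // so UTF-8 words are not cut in the middle of a character.
    if (mb_strlen($word, 'UTF-8') > 10) {
        $word = mb_substr($word, 0, 10, 'UTF-8');
    }
    $wordArr[] = $word;
}
$newString = implode(" ", $wordArr);
print $newString; // Prints "Hello how areeeeeeee you doing?"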
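If the goal is to break long runs rather than cut them off, a regex with the u modifier can insert the zero-width space directly. This is a sketch under the assumption that the input is plain UTF-8 text without markup (tags and entities would be broken up too); the helper name breakLongWords is hypothetical.

```php
<?php
// Hypothetical helper: inserts a zero-width space (U+200B) after every $max
// consecutive non-whitespace characters that are followed by more of them.
function breakLongWords(string $text, int $max = 10): string
{
    $zwsp = "\xE2\x80\x8B"; // UTF-8 encoding of U+200B
    // The u modifier makes \S match whole UTF-8 characters instead of single bytes.
    return preg_replace('/(\S{' . $max . '})(?=\S)/u', '$1' . $zwsp, $text);
}

$broken = breakLongWords('Hello how areeeeeeeeeeeeeeeeeeeee you doing?');
echo $broken, "\n"; // looks unchanged, but the browser may now wrap inside "areee..."
```

Nothing is removed from the text, so stripping the zero-width spaces again gives back the original string.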

Keep HTML format after replacing some text (using PHP and JS)

I would like to modify HTML like
I am <b>Sadi, novice</b> programmer.
to
I am <b>Sadi, learner</b> programmer.
To do this I will search for the string "novice programmer". How can I do it? Any ideas?
The search may use more than one word ("novice programmer"), even a whole sentence. Extra whitespace (e.g. newlines, tabs) and any tags must be ignored during the search, but the tags must be preserved in the replacement.
It is a sort of converter. It would be better if it were case-insensitive.
Thank you
Sadi
More clarification:
I got some nice replies with possible solutions, but please keep posting if you have an idea.
Let me clarify the problem in case anyone missed it. The main post shows it as an example scenario.
1) The problem is to find and replace a string without considering the tags. Tags may show up within a single word. The string may contain multiple words. Tags only appear in the content string (the document); the search phrase never contains tags.
We could easily remove all tags and do a plain text operation, but then another problem shows up.
2) The tags must be preserved, even after replacing the text. That is what the example shows.
Thank you again for helping
OK, I think this is what you want. It takes your search and replace inputs, splits them into arrays of words delimited by spaces, generates a regexp that finds the search sentence with any whitespace/HTML tags between the words, and replaces it with the replacement sentence, putting the same tags back between the words.
If the word count of the search sentence is higher than that of the replacement, it just uses spaces between any extra words, and if the replacement word count is higher than the search, it adds all 'orphaned' tags at the end. It also handles regexp characters in the find and replace strings.
<?php
function htmlFriendlySearchAndReplace($find, $replace, $subject) {
$findWords = explode(" ", $find);
$replaceWords = explode(" ", $replace);
$findRegexp = "/";
for ($i = 0; $i < count($findWords); $i++) {
$findRegexp .= preg_replace("/([\\$\\^\\|\\.\\+\\*\\?\\(\\)\\[\\]\\{\\}\\\\\\-])/", "\\\\$1", $findWords[$i]);
if ($i < count($findWords) - 1) {
$findRegexp .= "(\s?(?:<[^>]*>)?\s(?:<[^>]*>)?)";
}
}
$findRegexp .= "/i";
$replaceRegexp = "";
for ($i = 0; $i < count($findWords) || $i < count($replaceWords); $i++) {
if ($i < count($replaceWords)) {
$replaceRegexp .= str_replace("$", "\\$", $replaceWords[$i]);
}
if ($i < count($findWords) - 1) {
$replaceRegexp .= "$" . ($i + 1);
} else {
if ($i < count($replaceWords) - 1) {
$replaceRegexp .= " ";
}
}
}
return preg_replace($findRegexp, $replaceRegexp, $subject);
}
?>
here are the results of a few tests :
Original : <b>Novice Programmer</b>
Search : Novice Programmer
Replace : Advanced Programmer
Result : <b>Advanced Programmer</b>
Original : Hi, <b>Novice Programmer</b>
Search : Novice Programmer
Replace : Advanced Programmer
Result : Hi, <b>Advanced Programmer</b>
Original : I am not a <b>Novice</b> Programmer
Search : Novice Programmer
Replace : Advanced Programmer
Result : I am not a <b>Advanced</b> Programmer
Original : Novice <b>Programmer</b> in the house
Search : Novice Programmer
Replace : Advanced Programmer
Result : Advanced <b>Programmer</b> in the house
Original : <i>I am not a <b>Novice</b> Programmer</i>
Search : Novice Programmer
Replace : Advanced Programmer
Result : <i>I am not a <b>Advanced</b> Programmer</i>
Original : I am not a <b><i>Novice</i> Programmer</b> any more
Search : Novice Programmer
Replace : Advanced Programmer
Result : I am not a <b><i>Advanced</i> Programmer</b> any more
Original : I am not a <b><i>Novice</i></b> Programmer any more
Search : Novice Programmer
Replace : Advanced Programmer
Result : I am not a <b><i>Advanced</i></b> Programmer any more
Original : I am not a Novice<b> <i> </i></b> Programmer any more
Search : Novice Programmer
Replace : Advanced Programmer
Result : I am not a Advanced<b> <i> </i></b> Programmer any more
Original : I am not a Novice <b><i> </i></b> Programmer any more
Search : Novice Programmer
Replace : Advanced Programmer
Result : I am not a Advanced <b><i> </i></b> Programmer any more
Original : <i>I am a <b>Novice</b> Programmer</i> too, now
Search : Novice Programmer too
Replace : Advanced Programmer
Result : <i>I am a <b>Advanced</b> Programmer</i> , now
Original : <i>I am a <b>Novice</b> Programmer</i>, now
Search : Novice Programmer
Replace : Advanced Programmer Too
Result : <i>I am a <b>Advanced</b> Programmer Too</i>, now
Original : <i>I make <b>No money</b>, now</i>
Search : No money
Replace : Mucho$1 Dollar$
Result : <i>I make <b>Mucho$1 Dollar$</b>, now</i>
Original : <i>I like regexp, you can do [A-Z]</i>
Search : [A-Z]
Replace : [Z-A]
Result : <i>I like regexp, you can do [Z-A]</i>
I would do this:
if (preg_match('/(.*)novice((?:<.*>)?\s(?:<.*>)?programmer.*)/',$inString,$attributes)) {
$inString = $attributes[1].'learner'.$attributes[2];
}
It should match any of the following:
novice programmer
novice</b> programmer
novice </b>programmer
novice<span> programmer
A text version of what the regex states would be something like this: Match any characters until you reach "novice" and put them into a capturing group. Then optionally match something that starts with '<', has any number of characters after it and ends with '>' (without capturing it). Then match exactly one whitespace character. Then optionally match another '<'...'>' sequence (again without capturing it), which must be followed by "programmer" and any number of characters. Everything after "novice" goes into the second capture group.
I would do some specific testing though, as I may have missed some stuff. Regex is a programmer's best friend!
Well, there might be a better way, but off the top of my head (assuming that tags won't appear in the middle of words, HTML is well-formed, etc.)...
Essentially, you'll need three things (sorry if this sounds patronising, not intended that way):
1. A method of sub-string matching that ignores tags.
2. A way of making the replacement preserving the tags.
3. A way of putting it all together.
1 - This is probably the most difficult bit. One method would be to iterate through all of the characters in the source string (strings are basically arrays of characters so you can access the characters as if they are array elements), attempting to match as many characters as possible from the search string, stopping when you've either matched all of the characters or run out of characters to match. Any characters between and including '<' and '>' should be ignored. Some pseudo-code (check this over, it's late and there may be mistakes):
findMatch(startingPos : integer, subject : string, searchString : string){
//Variables for keeping track of characters matched, positions, etc.
inTag = false;
matchFound = false;
matchedCharacters = 0;
matchStart = 0;
matchEnd = 0;
for(i from startingPos to length(subject) - 1){
//Work out when entering or exiting tags, ignore tag contents
if(subject[i] == '<' || subject[i] == '>'){
inTag = !inTag;
}
else if(!inTag){
//Check if the character matches expected in search string
if(subject[i] == searchString[matchedCharacters]){
if(!matchFound){
matchFound = true;
matchStart = i;
}
matchedCharacters++;
//If all of the characters have been matched, return the start and end positions of the substring
if(matchedCharacters == length(searchString)){
matchEnd = i;
return matchStart, matchEnd;
}
}
else{
//Reset counts if not found
matchFound = false;
matchedCharacters = 0;
}
}
}
//If no full matches were found, return error
return -1;
}
2 - Split the HTML source code into three strings - the bit you want to work on (between the two positions returned by the matching function) and the part before and after. Split up the bit you want to modify using, for example:
$parts = preg_split("/(<[^>]*>)/",$string, -1, PREG_SPLIT_DELIM_CAPTURE);
Keep a record of where the tags are, concatenate the non-tag segments and perform substring replace on this as normal, then split the modified string up again and reassemble with the tags in place.
3 - This is the easy part, just concatenate the modified part and the other two bits back together.
I may have horribly over complicated this mind, if so just ignore me.
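The matching step from part 1 could look roughly like this in PHP. It is a sketch, not a tested library routine: the function name and the [start, end] return shape are made up here, it compares bytes case-sensitively, it assumes $startingPos is not inside a tag, and instead of resetting counters it simply retries from every text position.

```php
<?php
// Sketch of the tag-skipping search; name and return shape are illustrative.
// Returns [start, end] byte offsets in $subject (end inclusive), or null if not found.
function findMatchIgnoringTags(string $subject, string $search, int $startingPos = 0): ?array
{
    $subjectLen = strlen($subject);
    $searchLen = strlen($search);
    if ($searchLen === 0) {
        return null;
    }
    $startInTag = false;
    for ($start = $startingPos; $start < $subjectLen; $start++) {
        // Candidate start positions inside <...> are skipped.
        if ($subject[$start] === '<') {
            $startInTag = true;
        }
        if ($startInTag) {
            if ($subject[$start] === '>') {
                $startInTag = false;
            }
            continue;
        }
        $inTag = false;
        $matched = 0;
        for ($i = $start; $i < $subjectLen; $i++) {
            $ch = $subject[$i];
            if ($ch === '<') { $inTag = true; continue; }
            if ($ch === '>') { $inTag = false; continue; }
            if ($inTag) { continue; }            // ignore tag contents
            if ($ch !== $search[$matched]) { break; } // mismatch: try the next start
            if (++$matched === $searchLen) {
                return [$start, $i];
            }
        }
    }
    return null;
}

var_dump(findMatchIgnoringTags('I am <b>Sadi, novice</b> programmer.', 'novice programmer'));
```

The returned offsets delimit the slice (tags included) that part 2 then has to split and reassemble.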
Unless cOm's already written it, the regex would be the best way to go:
$cleaned_string = preg_replace('/<[^>]*>/', '', $raw_text);
Or something like that. I would need to research/test the regex.
Then you can just use a simple $foobar = str_replace($find, $replace_with, $cleaned_string); to find the text you want to replace.
Didn't realize he wanted to put the HTML back in. It's all regex for that, and more than I know at the moment.
Knowing what I do know, technique-wise I would probably use an expression that didn't ignore whitespace between the words, but did between the < and > brackets, then use the variable-containing abilities of regex to output.
Interesting problem.
I would use the DOM and XPath to find the closest nodes containing that text and then use substring matching to find out which bit of the string is in what node. That will involve character-per-character matching and possible backtracking, though.
Here is the first part, finding the container nodes:
<?php
error_reporting(E_ALL);
header('Content-Type: text/plain; charset=UTF-8');
$doc = new DOMDocument();
$doc->loadHTML(<<<EOD
<p>
<span>
<i>
I am <b>Sadi, novice</b> programmer.
</i>
</span>
</p>
<ul>
<li>
<div>
I am <em>Cornholio, novice</em> programmer of television shows.
</div>
</li>
</ul>
EOD
);
$xpath = new DOMXPath($doc);
// First, get a list of all nodes containing the text anywhere in their tree.
$nodeList = $xpath->evaluate('//*[contains(string(.), "programmer")]');
$deepestNodes = array();
// Now only keep the deepest nodes, because the XPath query will also return HTML, BODY, ...
foreach ($nodeList as $node) {
$deepestNodes[] = $node;
$ancestor = $node;
while (($ancestor = $ancestor->parentNode) && ($ancestor instanceof DOMElement)) {
$deepestNodes = array_filter($deepestNodes, function ($existingNode) use ($ancestor) {
return ($ancestor !== $existingNode);
});
}
}
foreach ($deepestNodes as $node) {
var_dump($node->tagName);
}
I hope that helps you along.
Since you didn't give exact specifics on what you will use this for, I will use your example of "I am sadi, novice programmer".
$before = 'I am <b>sadi, novice</b> programmer';
$after = preg_replace('/I am (<.*>)?(.*), novice(<.*>)? programmer/', 'I am $1$2, learner$3 programmer', $before);
Alternatively, for any text:
$string = '<b>Hello</b>, world!';
$orig = 'Hello';
$replace = 'Goodbye';
$pattern = "/(<.*>)?$orig(<.*>)?/";
$final = "$1$replace$2";
$result = preg_replace($pattern,$final,$string);
//$result should now be '<b>Goodbye</b>, world!'
Hope that helped. :d
Edit: An example of your example, with the second piece of code:
$string = 'I am sadi, novice programmer.';
$orig = 'novice';
$replace = 'learner';
$pattern = "/(<.>)?$orig(<.>)?/";
$final = "$1$replace$2";
$result = htmlspecialchars(preg_replace($pattern,$final,$string));
echo $result;
The only problem is if you were searching for something that was more than a word long.
Edit 2: Finally came up with a way to do it across multiple words. Here's the code:
function htmlreplace($string,$orig,$replace)
{
$orig = explode(' ',$orig);
$replace = explode(' ',$replace);
$result = $string;
while (count($orig)>0)
{
$shift = array_shift($orig);
$rshift = array_shift($replace);
$pattern = "/$shift\s?(<.*>)?/";
$replacement = "$rshift$1";
$result = preg_replace($pattern,$replacement,$result);
}
$result .= implode(' ',$replace);
return $result;
}
Have fun! :d

Add a word to the ending of every third sentence

I'm creating a "fun-translator", and I'm trying to add a word to the end of every third sentence or so.
It fetches another page's HTML code and translates it into teen language. But I want to add a word to every third sentence. I've been using this line for now:
$str = preg_replace_callback('{<.*?[^>]*>([æøåÆØÅ !,\w\d\-\(\)]+)([<|\s|!|\.|:])</.*?>}',
"assIt", $str);
But it only adds the word when the sentence is surrounded by HTML tags.
I thought that I could find every sentence by looking for a capital letter and then the next punctuation mark, but I really don't know regular expressions too well.
Does anyone know how I can get it to work?
It is a little bit longer, but instead of a regexp you can use the explode() function.
$sentences = explode('.', $str);
$numberOfSentences = count($sentences);
for($i = 0; $i < $numberOfSentences; $i++)
{
if($i%3 == 2) {
$sentences[$i] = $sentences[$i] . ' some fun string';
}
}
echo implode('.', $sentences);
This should do it
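A variation that also treats ! and ? as sentence ends and keeps the punctuation after the added word could look like this. It is only a sketch: the function name is hypothetical, and it assumes plain text in which a sentence ends with ., ! or ? followed by whitespace (abbreviations such as "e.g." would be split too).

```php
<?php
// Hypothetical helper: appends $word to every third sentence.
function addWordToEveryThirdSentence(string $text, string $word): string
{
    // Split after the punctuation so each sentence keeps its terminator.
    $sentences = preg_split('/(?<=[.!?])\s+/', trim($text));
    foreach ($sentences as $i => $sentence) {
        if ($i % 3 === 2) {
            // Length of the sentence without its trailing punctuation.
            $end = strlen(rtrim($sentence, '.!?'));
            $sentences[$i] = substr($sentence, 0, $end) . ' ' . $word . substr($sentence, $end);
        }
    }
    return implode(' ', $sentences);
}

echo addWordToEveryThirdSentence('One. Two! Three? Four. Five. Six.', 'yo'), "\n";
// prints "One. Two! Three yo? Four. Five. Six yo."
```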

php - why does this regex truncate my string to zero length?

Yesterday I tracked down a strange bug which caused a website to display only a white page: no content on it, no error message visible.
I found that a regular expression used in preg_replace was the problem.
I used the regex to replace the title HTML tag in the accumulated content just before echoing the HTML. The HTML got rather large on the page where the bug occurred (60 kB, not too large), and it seemed like preg_replace or the regex can only handle a string of a certain length. Or my regex is really messed up (also possible).
Look at this sample program which reproduces the problem (tested on PHP 5.2.9):
function replaceTitleTagInHtmlSource($content, $replaceWith) {
return preg_replace('#(<title>)([\s\S]+)(<\/title>)#i', '$1'.$replaceWith.'$3', $content);
}
$dummyStr = str_repeat('A', 6000);
$totalStr = '<title>foo</title>';
for($i = 0; $i < 10; $i++) {
$totalStr .= $dummyStr;
}
print 'orignal: ' . strlen($totalStr);
print '<hr />';
$replaced = replaceTitleTagInHtmlSource($totalStr, 'bar');
print 'replaced: ' . strlen($replaced);
print '<hr />';
Output:
orignal: 60018
replaced: 0
So - the function gets a string of length 60000 and returns a string with 0 length. Not what I wanted to do with my regex.
Changing
for($i = 0; $i < 10; $i++) {
to
for($i = 0; $i < 1; $i++) {
in order to decrease the total string length, the output is:
orignal: 6018
replaced: 6018
When I removed the replacing, the content of the page was displayed without any problems.
It seems like you're running into the backtracking limit.
This is confirmed if you print preg_last_error(): it returns PREG_BACKTRACK_LIMIT_ERROR.
You can either increase the limit (pcre.backtrack_limit) in your ini file or with ini_set(), or change your regular expression from ([\s\S]+) to (.*?), adding the s modifier if the title can span several lines, which will stop it from backtracking so much.
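Whichever fix is chosen, checking the return value avoids the silent white page. A minimal sketch, assuming it is acceptable to fall back to the unmodified content on failure (the function name is made up, and $replaceWith is assumed to contain no $ or backslash):

```php
<?php
// Sketch: detect a failed preg_replace() instead of echoing an empty string.
function replaceTitleChecked(string $content, string $replaceWith): string
{
    // Lazy quantifier plus the s modifier, so the title may span lines.
    $result = preg_replace('#(<title>).*?(</title>)#is', '$1' . $replaceWith . '$2', $content);
    if ($result === null) {
        // preg_replace() returns null on error; preg_last_error() says why.
        if (preg_last_error() === PREG_BACKTRACK_LIMIT_ERROR) {
            error_log('pcre.backtrack_limit exceeded while replacing the title');
        }
        return $content; // fall back to the unmodified page
    }
    return $result;
}

// The limit itself can be raised at runtime if really necessary:
// ini_set('pcre.backtrack_limit', '1000000');
```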
It has been said many times before on SO, e.g. in "Regex to match the first ending HTML tag" (and will probably be mentioned again), that regexes are not appropriate for HTML because tags are too irregular.
Use DOM functions where they're available.
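As a rough illustration of the DOM route, the title can be replaced like this. It is a sketch that assumes $html is a complete document which already contains a <title>; the function name is hypothetical. Note that saveHTML() re-serializes the whole document, so the surrounding markup may come back slightly normalized, and loadHTML() needs a charset declaration in the document to treat it as UTF-8.

```php
<?php
// Sketch: set the <title> through the DOM extension instead of a regex.
function replaceTitleWithDom(string $html, string $newTitle): string
{
    $doc = new DOMDocument();
    // Collect libxml warnings about real-world markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $title = $doc->getElementsByTagName('title')->item(0);
    if ($title !== null) {
        // Drop the old text and append the new one as a text node (escaped automatically).
        while ($title->firstChild) {
            $title->removeChild($title->firstChild);
        }
        $title->appendChild($doc->createTextNode($newTitle));
    }
    return $doc->saveHTML();
}

echo replaceTitleWithDom('<html><head><title>foo</title></head><body><p>Hi</p></body></html>', 'bar');
```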
Backtracking: [\s\S]+ will match ALL available characters, then go backwards through the string looking for the </title>. [^<]+ matches all characters that aren't < and therefore grabs </title> faster.
function replaceTitleTagInHtmlSource($content, $replaceWith) {
return preg_replace('#(<title>)([^<]+)(</title>)#i', '$1'.$replaceWith.'$3', $content);
}
Your regex seems to be a little funny.
([\s\S]+) matches every whitespace and non-whitespace character. You should try (.*?) instead.
Changing your function like this works for me:
function replaceTitleTagInHtmlSource($content, $replaceWith) {
return preg_replace('`\<title\>(.*?)\<\/title\>`i', '<title>'.$replaceWith.'</title>', $content);
}
and the problem seems to be you trying to use $1 and $3 to match and
