I'm creating a "fun-translator", and I'm trying to add a word to the end of every third sentence or so.
It takes another page's HTML code and translates it into teen language. But I want to add a word to every third sentence. I've been using this line for now:
$str = preg_replace_callback('{<.*?[^>]*>([æøåÆØÅ !,\w\d\-\(\)]+)([<|\s|!|\.|:])</.*?>}',
"assIt", $str);
But it only adds the word when the sentence is surrounded by HTML code.
I thought I could find every sentence by looking for a capital letter and then a punctuation mark, but I don't know regular expressions very well.
Does anyone know how I can get this to work?
It's a little bit longer, but instead of a regexp you can use the explode() function.
$sentences = explode('.', $str);
$numberOfSentences = count($sentences);
for($i = 0; $i < $numberOfSentences; $i++)
{
if($i%3 == 2) {
$sentences[$i] = $sentences[$i] . ' some fun string';
}
}
echo implode('.', $sentences);
This should do it
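The explode() version only splits on periods. A minimal sketch that treats '.', '!' and '?' as sentence ends, assuming that is a good enough definition of a sentence for this purpose (abbreviations and decimals such as 1.5 will confuse it); addToEveryThirdSentence() is a hypothetical helper name:

```php
<?php
// Hypothetical helper: append $word to every third sentence.
// Assumes a sentence ends with '.', '!' or '?' (a simplification).
function addToEveryThirdSentence(string $text, string $word): string
{
    // The capture group keeps the terminators: [sentence, '.', sentence, '!', ...]
    $parts = preg_split('/([.!?])/', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
    $sentenceNumber = 0;
    $out = '';
    for ($i = 0; $i < count($parts); $i += 2) {
        $sentence = $parts[$i];
        $terminator = $parts[$i + 1] ?? ''; // the last piece has no terminator
        // Skip empty pieces, such as the one after the final period
        if (trim($sentence) !== '') {
            $sentenceNumber++;
            if ($sentenceNumber % 3 === 0) {
                $sentence .= ' ' . $word;
            }
        }
        $out .= $sentence . $terminator;
    }
    return $out;
}

echo addToEveryThirdSentence('One. Two! Three? Four. Five. Six.', 'lol'), PHP_EOL;
```

Because the punctuation comes back as separate array elements, the original terminators are restored exactly and the word lands just before them.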
Here is my code:
function TranslatedTitle($Title) {
ConnectWithMySQLDatabase();
$v = mysql_query("SELECT * FROM `ProductTranslations`");
while($vrowis = mysql_fetch_array($v)){
$English[] = $vrowis['English'];
$Bulgarian[] = $vrowis['Bulgarian'];
}
$TranslatedTitle = str_replace($English, $Bulgarian, $Title);
return $TranslatedTitle;
}
I am using this code to fetch data from a MySQL table, search the title for certain English phrases, and replace each one with the Bulgarian phrase stored for it.
Example:
I have very big blue eyes.
Will be translated to:
I have very големи сини eyes. It takes the phrase "big blue" and replaces it with "големи сини" at the position where it is found.
In other words, how can I move the replaced part to the beginning of the string, so that the final result for my example is "големи сини I have very eyes."?
The sentence in the example has no meaning; I created it only as an example.
I would try looping through the $English array and, when a matching phrase is found, moving it to the beginning and then translating. Something like:
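foreach($English as $word){
$pos = strpos($Title, $word);
if ($pos !== false) {
//english phrase found: remove only this phrase, collapse the leftover double space, put it in front
$Title = $word . ' ' . trim(preg_replace('/\s+/', ' ', str_replace($word, '', $Title)));
break;
}
}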
Then
$TranslatedTitle = str_replace($English, $Bulgarian, $Title);
First off, you will want to use PDO to interact with your database. The mysql_ extension was deprecated in PHP 5.5 and removed in PHP 7.0, and building queries as plain strings with it makes SQL injection easy; PDO gives you prepared statements. You can manipulate your strings using strpos (see php.net/manual/en/function.strpos.php). The steps are: find the text to replace, translate it, remove the phrase from wherever it is with $strip = str_replace($word, '', $Title), and finally put the translation in front of the rest, like this: $variable = $translate . ' ' . $strip. Hope that helps.
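Putting both answers together, here is a sketch assuming PHP 7+ and an already-open PDO connection (DSN and credentials are not shown). loadTranslations() and translateTitle() are hypothetical names; the table and column names are the ones from the question. It deliberately swaps str_replace() with two arrays for strtr() with one array, because strtr() tries the longest phrase first and never re-translates text it has already replaced.

```php
<?php
// Load English => Bulgarian pairs through PDO instead of the removed mysql_* functions.
function loadTranslations(PDO $pdo): array
{
    $map = [];
    $stmt = $pdo->query('SELECT `English`, `Bulgarian` FROM `ProductTranslations`');
    foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $row) {
        $map[$row['English']] = $row['Bulgarian'];
    }
    return $map;
}

// Move the first matching phrase to the front of the title, then translate.
function translateTitle(string $title, array $map): string
{
    foreach ($map as $english => $bulgarian) {
        if (strpos($title, $english) !== false) {
            // Remove the phrase and collapse the double space it leaves behind
            $rest = trim(preg_replace('/\s+/', ' ', str_replace($english, '', $title)));
            $title = $english . ' ' . $rest;
            break;
        }
    }
    return strtr($title, $map);
}

echo translateTitle('I have very big blue eyes.', ['big blue' => 'големи сини']), PHP_EOL;
```

In real use you would call translateTitle($title, loadTranslations($pdo)), ideally loading the map once rather than per title.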
Basically, what I am trying to do here is get a text input (a paragraph) and save each word into an array. Then I want to check each word in the array against the original paragraph to see how many times it occurred. By doing this I hope to be able to work out what the topic is. I originally started this as an open-ended school project, but I am more interested in finding out how to do this for my own sanity.
Here is my code (this is after I requested the text input in html code above):
$paragraph = $_POST['text'];
$paragraph = str_replace('  ',' ',$paragraph);
$paragraph = str_replace('  ',' ',$paragraph);
$paragraph = strtolower($paragraph);
$words = explode(" ",$paragraph);
$count = count($words);
for($x = 0; $x < $count; $x++) {
echo $words[$x];
echo "<br/>";
}
So far I have been able to get the words all lowercase and to replace all the extra spaces in my text, and then subsequently save that to an array. For now I am just displaying the words.
This is where I have run into some problems. I was thinking I could have a multidimensional array where it would be something along the lines of
$words[1]["word"][0]["amount"];
The word would be the actual word in the paragraph, and amount would count how many times it showed up in the paragraph. If anyone has basic concepts for doing this, or there is something I am missing here I would appreciate your help. The main thing I need help with is checking the amount of times each word shows up in the paragraph. I couldn't get this to work (it was within the prior for loop):
substr_count($words[$x],$paragraph)
To recap, I am trying to take a paragraph, save each different word into an array (I have managed to do this successfully) and then save the amount of times the word shows up in the paragraph into a different array (or a multidimensional array). Once I get this data I am going to see which words I used the most, while filtering out filler words like "the" and "a".
You would be better off using preg_replace('/\W+/', ' ', $paragraph); and simplifying the rest of your code to this:
$paragraph = preg_replace('/\W+/', ' ', $paragraph);
$filter = array('the', 'a');
$words = explode(' ',$paragraph);
$countWords = array();
foreach($words as $w)
{
if(trim($w) != "" && array_search($w, $filter) === false)
{
if(!isset($countWords[$w]))
$countWords[$w] = 0;
$countWords[$w] += 1;
}
}
This will give you how many times each word is used. If you don't care about case, lowercase each word first (for example $w = strtolower($w); at the top of the loop), so that the isset() check and the counter use the same key. Also, you can add any words you don't want to count to the $filter array I added.
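PHP also has built-ins that do the tallying for you. A sketch, assuming the same \W+ splitting as above: array_diff() removes the filler words, array_count_values() counts the rest, and arsort() puts the most frequent words first; wordFrequencies() is a hypothetical name.

```php
<?php
// Count word frequencies with built-ins instead of a manual loop.
function wordFrequencies(string $paragraph, array $stopWords = ['the', 'a']): array
{
    // Lowercase, split on runs of non-word characters, drop empty pieces
    $words = preg_split('/\W+/', strtolower($paragraph), -1, PREG_SPLIT_NO_EMPTY);
    // Remove filler words, then let PHP count the remaining values
    $counts = array_count_values(array_diff($words, $stopWords));
    arsort($counts); // most frequent words first
    return $counts;
}

print_r(wordFrequencies('The cat saw a cat and the dog saw the cat.'));
```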
I have a string that is formatted by symbols (symbols similar to those used to format questions on this site).
Rules:
**Hello** means bold = <b>Hello</b>
*Hello\\ means bulleted list = <li>Hello</li>
Hello\\ means line break = Hello<br>
I want to replace:
Every first occurrence of ** with <b> and every second ** with </b>.
The same for * with <li> and \\ with </li>.
Every \\ that occurs without a preceding * should be converted to <br>.
Example string:
$myString = 'Hello my **Friend**,\\here is the stuff you need to buy for me:*knife\\*water bottle\\***fake ID**\\\\\\Thank you in advance and don not forget the **fake ID**!\\Sincerely yours\\Eddy'
Note: This style is not my invention. It is in use and I have to convert it.
I preg_match()-ed parts of it to get the stuff between the tags.
$myString = 'Hello my **Friend**,\\here is the stuff you need to buy for me:*knife\\*water bottle\\***fake ID**\\\\\\Thank you in advance and don not forget the **fake ID**!\\Sincerely yours\\Eddy';
$result = array();
$firstBold = '<b>'. preg_match('~\*\*(.*?)\*\*~', $myString, $firstBold) . </b>;
$result += $firstBold
// and so on...
(Ignore mistakes in this; it's written from memory.)
I didn't consider the words before the first bold, but it's basically the same.
This will get the job done in the end, but it seems cumbersome to me. I am looking for a more elegant way to do this.
What is the best way to solve this in PHP?
You can use preg_replace(). Because of your markup, the order of the replacements matters.
http://php.net/manual/en/function.preg-replace.php
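// Order matters. \\ means the two literal backslash characters from the question's markup.
$myString = preg_replace("/[*][*]([^*]+)[*][*]/",'<b>${1}</b>',$myString); // bold first
$myString = preg_replace('/[*](.*?)\\\\\\\\/','<li>${1}</li>',$myString); // * up to the next \\ (lazy)
$myString = str_replace('\\\\','<br/>',$myString); // every leftover \\ is a line break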
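Here is the same bold, then list item, then line break order wrapped in a function, as a sketch. It assumes the text contains the literal two-character sequence \\ described in the question; the sample uses a nowdoc so PHP does not collapse \\ into one backslash the way a single-quoted literal would. convertMarkup() is a hypothetical name. The list pattern is lazy, so an item that contains </b> still ends at the next \\.

```php
<?php
// Convert the question's markup to HTML; markers are the literal two characters \\.
function convertMarkup(string $s): string
{
    // 1. Bold first. [^*]+ means "***fake ID**" leaves the list marker '*' behind.
    $s = preg_replace('/\*\*([^*]+)\*\*/', '<b>$1</b>', $s);
    // 2. Any remaining '*' up to the next \\ becomes a list item.
    $s = preg_replace('/\*(.*?)\\\\\\\\/', '<li>$1</li>', $s);
    // 3. Every \\ still left over is a line break.
    return str_replace('\\\\', '<br>', $s);
}

$input = <<<'EOT'
Hi **Eddy**,\\buy:*knife\\***fake ID**\\\\Thanks
EOT;

echo convertMarkup($input), PHP_EOL;
```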
This function searches for words (from the $words array) inside a text and highlights them.
function highlightWords(Array $words, $text){ // Loop through array of words
foreach($words as $word){ // Highlight word inside original text
$text = str_replace($word, '<span class="highlighted">' . $word . '</span>', $text);
}
return $text; // Return modified text
}
Here is the problem:
Lets say the $words = array("car", "drive");
Is there a way for the function to highlight not only the word car, but also words which contain the letters "car" like: cars, carmania, etc.
Thank you!
What you want is a regular expression, used through preg_replace() or, more specifically, preg_replace_callback() (the callback version is what I'd recommend in your case).
<?php
$searchString = "The car is driving in the carpark, he's not holding to the right lane.\n";
// define your word list
$toHighlight = array("car","lane");
Because you need a regular expression to find your words, and you may want to vary or change that expression over time, it's bad practice to hard-code it into your search words. It's better to walk over the array with array_map() and turn each search word into the proper regular expression (here, by escaping it, wrapping it in / delimiters and appending an "accept everything up to a space or punctuation" expression).
$searchFor = array_map('addRegEx',$toHighlight);
// add the regEx to each word, this way you can adapt it without having to correct it everywhere
function addRegEx($word){
return "/" . preg_quote($word, "/") . '[^ ,.?]*/'; // word, then anything up to a space, comma, period or ?
}
Next, you want to replace each word you found with its highlighted version, which means the replacement must be dynamic: use preg_replace_callback() instead of plain preg_replace() so that it calls a function for every match it finds and uses that function's return value as the replacement. Here we wrap the found word in span tags.
function highlight($word){
return "<span class='highlight'>$word[0]</span>";
}
$result = preg_replace_callback($searchFor,'highlight',$searchString);
print $result;
yields
The <span class='highlight'>car</span> is driving in the <span class='highlight'>carpark</span>, he's not holding to the right <span class='highlight'>lane</span>.
So just paste these code fragments one after the other to get the working code. ;)
Edit: the complete code below was altered a bit: it is wrapped in functions for easier use by the original asker, and matching is now case-insensitive.
complete code:
<?php
$searchString = "The car is driving in the carpark, he's not holding to the right lane.\n";
$toHighlight = array("car","lane");
$result = customHighlights($searchString,$toHighlight);
print $result;
// add the regEx to each word, this way you can adapt it without having to correct it everywhere
function addRegEx($word){
return "/" . preg_quote($word, "/") . '[^ ,.?]*/i'; // case-insensitive
}
function highlight($word){
return "<span class='highlight'>$word[0]</span>";
}
function customHighlights($searchString,$toHighlight){
// turn each search word into a regex
$searchFor = array_map('addRegEx',$toHighlight);
$result = preg_replace_callback($searchFor,'highlight',$searchString);
return $result;
}
I haven't tested it, but I think this should do it:
$text = preg_replace('/\b(\w*' . preg_quote($word, '/') . '\w*)\b/i', '<span class="highlighted">$1</span>', $text);
This looks for the string inside a complete bounded word and then puts the span around the whole lot using preg_replace and regular expressions.
function replace($format, $string, array $words)
{
foreach ($words as $word) {
$string = \preg_replace(
sprintf('#\b(?<string>[^\s]*%s[^\s]*)\b#i', \preg_quote($word, '#')),
\sprintf($format, '$1'), $string);
}
return $string;
}
// courtesy of http://slipsum.com/#.T8PmfdVuBcE
$string = "Now that we know who you are, I know who I am. I'm not a mistake! It
all makes sense! In a comic, you know how you can tell who the arch-villain's
going to be? He's the exact opposite of the hero. And most times they're friends,
like you and me! I should've known way back when... You know why, David? Because
of the kids. They called me Mr Glass.";
echo \replace('<span class="red">%s</span>', $string, [
'mistake',
'villain',
'when',
'Mr Glass',
]);
Since it uses an sprintf() format for the surrounding markup, you can change the replacement accordingly.
Excuse the PHP 5.4 syntax (the short [] array literal).
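One caveat with the one-replacement-per-word loops above: later words are also searched inside the <span> markup inserted for earlier words (a term like "light" would match inside class="highlighted"). A sketch that avoids this by combining all terms into one alternation, so the text is scanned only once; it assumes PHP 7.4+ for the arrow function, and highlightContaining() is a hypothetical name.

```php
<?php
// Highlight every whole word that contains any of the terms, in a single pass.
function highlightContaining(array $terms, string $text): string
{
    // Escape each term and join them into one alternation, e.g. car|drive
    $alternation = implode('|', array_map(fn($t) => preg_quote($t, '/'), $terms));
    // \w* on both sides extends the match to the whole word (cars, carmania, scar)
    $pattern = '/\w*(?:' . $alternation . ')\w*/i';
    // $0 is the whole match; the inserted markup is never re-scanned
    return preg_replace($pattern, '<span class="highlighted">$0</span>', $text);
}

echo highlightContaining(['car', 'drive'], 'Cars and drivers in the carpark.'), PHP_EOL;
```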
I would like to modify HTML like
I am <b>Sadi, novice</b> programmer.
to
I am <b>Sadi, learner</b> programmer.
To do it, I want to search for the string "novice programmer". How can I do this? Any ideas?
The search uses more than one word ("novice programmer"); it could even be a whole sentence. Extra whitespace (e.g. newlines, tabs) and any tags must be ignored during the search, but the tags must be preserved in the replacement.
It is a sort of converter. It would be better if it were case-insensitive.
Thank you
Sadi
More clarification:
I have received some nice replies with possible solutions, but please keep posting if you have any other ideas.
I would like to clarify the problem further, in case anyone missed it. The main post shows the problem as an example scenario.
1) The problem is to find and replace a string without considering the tags. Tags may show up within a single word, and the string may contain multiple words. Tags only appear in the content string (the document); the search phrase never contains any tags.
We could easily remove all tags and do a plain text operation, but then another problem shows up:
2) The tags must be preserved, even after replacing the text. That is what the example shows.
Thank you again for helping.
OK, I think this is what you want. It takes your search and replace strings, splits each into an array of words (delimited by spaces), generates a regexp that finds the search sentence with any whitespace/HTML tags between the words, and replaces it with the replacement sentence, putting the same tags back between the words.
If the replacement has more words than the search, the extra words are separated by plain spaces; if the search has more words than the replacement, all the 'orphaned' tags are added at the end. It also handles regexp characters in the search and replace strings.
<?php
function htmlFriendlySearchAndReplace($find, $replace, $subject) {
    $findWords = explode(" ", $find);
    $replaceWords = explode(" ", $replace);
    $findRegexp = "/";
    for ($i = 0; $i < count($findWords); $i++) {
        // Escape regexp characters in the search word, including the "/" delimiter
        $findRegexp .= preg_quote($findWords[$i], "/");
        if ($i < count($findWords) - 1) {
            // Between words: any mix of tags and whitespace containing at least one whitespace character
            $findRegexp .= '((?:<[^>]*>)*\s(?:\s|<[^>]*>)*)';
        }
    }
    $findRegexp .= "/i";
    $replaceRegexp = "";
    for ($i = 0; $i < count($findWords) || $i < count($replaceWords); $i++) {
        if ($i < count($replaceWords)) {
            // Escape "\" and "$" so they are not read as backreferences
            $replaceRegexp .= addcslashes($replaceWords[$i], '\\$');
        }
        if ($i < count($findWords) - 1) {
            // Put back the tags/whitespace captured after search word $i
            $replaceRegexp .= '${' . ($i + 1) . '}';
        } else {
            if ($i < count($replaceWords) - 1) {
                $replaceRegexp .= " ";
            }
        }
    }
    return preg_replace($findRegexp, $replaceRegexp, $subject);
}
?>
here are the results of a few tests :
Original : <b>Novice Programmer</b>
Search : Novice Programmer
Replace : Advanced Programmer
Result : <b>Advanced Programmer</b>
Original : Hi, <b>Novice Programmer</b>
Search : Novice Programmer
Replace : Advanced Programmer
Result : Hi, <b>Advanced Programmer</b>
Original : I am not a <b>Novice</b> Programmer
Search : Novice Programmer
Replace : Advanced Programmer
Result : I am not a <b>Advanced</b> Programmer
Original : Novice <b>Programmer</b> in the house
Search : Novice Programmer
Replace : Advanced Programmer
Result : Advanced <b>Programmer</b> in the house
Original : <i>I am not a <b>Novice</b> Programmer</i>
Search : Novice Programmer
Replace : Advanced Programmer
Result : <i>I am not a <b>Advanced</b> Programmer</i>
Original : I am not a <b><i>Novice</i> Programmer</b> any more
Search : Novice Programmer
Replace : Advanced Programmer
Result : I am not a <b><i>Advanced</i> Programmer</b> any more
Original : I am not a <b><i>Novice</i></b> Programmer any more
Search : Novice Programmer
Replace : Advanced Programmer
Result : I am not a <b><i>Advanced</i></b> Programmer any more
Original : I am not a Novice<b> <i> </i></b> Programmer any more
Search : Novice Programmer
Replace : Advanced Programmer
Result : I am not a Advanced<b> <i> </i></b> Programmer any more
Original : I am not a Novice <b><i> </i></b> Programmer any more
Search : Novice Programmer
Replace : Advanced Programmer
Result : I am not a Advanced <b><i> </i></b> Programmer any more
Original : <i>I am a <b>Novice</b> Programmer</i> too, now
Search : Novice Programmer too
Replace : Advanced Programmer
Result : <i>I am a <b>Advanced</b> Programmer</i> , now
Original : <i>I am a <b>Novice</b> Programmer</i>, now
Search : Novice Programmer
Replace : Advanced Programmer Too
Result : <i>I am a <b>Advanced</b> Programmer Too</i>, now
Original : <i>I make <b>No money</b>, now</i>
Search : No money
Replace : Mucho$1 Dollar$
Result : <i>I make <b>Mucho$1 Dollar$</b>, now</i>
Original : <i>I like regexp, you can do [A-Z]</i>
Search : [A-Z]
Replace : [Z-A]
Result : <i>I like regexp, you can do [Z-A]</i>
I would do this:
if (preg_match('/(.*)novice((?:<.*>)?\s(?:<.*>)?programmer.*)/',$inString,$attributes)) {
$inString = $attributes[1].'learner'.$attributes[2];
}
It should match any of the following:
novice programmer
novice</b> programmer
novice </b>programmer
novice<span> programmer
A text version of what the regex says would be something like: match any characters up to "novice" and put them into the first capturing group; then optionally match something that starts with '<', has any number of characters after it and ends with '>'; then exactly one whitespace character; then optionally another such tag, which must be followed by "programmer" and any number of further characters. Everything from the first optional tag to the end goes into the second capturing group, so the tags are kept when "learner" is put in place of "novice".
I would do some specific testing, though, as I may have missed some things. Regex is a programmer's best friend!
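In that spirit, a small test harness that runs the pattern above, unchanged, against the four inputs listed plus one extra. noviceToLearner() is a hypothetical wrapper name. The last input shows a limitation: the greedy <.*> can run from one tag across the word "old" to a later tag, so "novice old programmer" with tags around "old" is treated as a match.

```php
<?php
// Hypothetical wrapper around the answer's pattern.
function noviceToLearner(string $in): string
{
    $pattern = '/(.*)novice((?:<.*>)?\s(?:<.*>)?programmer.*)/';
    if (preg_match($pattern, $in, $m)) {
        // $m[1] = text before "novice", $m[2] = tags/space + "programmer" + rest
        return $m[1] . 'learner' . $m[2];
    }
    return $in; // no match: leave the input unchanged
}

$cases = [
    'novice programmer',
    'novice</b> programmer',
    'novice </b>programmer',
    'novice<span> programmer',
    'novice<b>old</b> programmer', // greedy <.*> swallows "old"
];
foreach ($cases as $case) {
    echo $case, ' => ', noviceToLearner($case), PHP_EOL;
}
```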
Well, there might be a better way, but off the top of my head (assuming that tags won't appear in the middle of words, HTML is well-formed, etc.)...
Essentially, you'll need three things (sorry if this sounds patronising, not intended that way):
1. A method of sub-string matching that ignores tags.
2. A way of making the replacement preserving the tags.
3. A way of putting it all together.
1 - This is probably the most difficult bit. One method would be to iterate through all of the characters in the source string (strings are basically arrays of characters, so you can access the characters as if they were array elements), attempting to match as many characters as possible from the search string, stopping when you've either matched all of the characters or run out of characters to match. Any characters between and including '<' and '>' should be ignored. Something like this (check it over, it's late and there may be mistakes):
function findMatch($startingPos, $subject, $searchString) {
    // Variables for keeping track of tag state and characters matched
    $inTag = false;
    $matchedCharacters = 0;
    $matchStart = -1;
    $searchLength = strlen($searchString);
    if ($searchLength == 0) {
        return -1;
    }
    // Walk the subject (not the search string) from the starting position
    for ($i = $startingPos; $i < strlen($subject); $i++) {
        // Work out when entering or exiting tags, ignore tag contents
        if ($subject[$i] == '<') {
            $inTag = true;
        } elseif ($subject[$i] == '>') {
            $inTag = false;
        } elseif (!$inTag) {
            // Check if the character matches the next expected one in the search string
            if ($subject[$i] == $searchString[$matchedCharacters]) {
                if ($matchedCharacters == 0) {
                    $matchStart = $i;
                }
                $matchedCharacters++;
                // All characters matched: return the start and (exclusive) end positions
                if ($matchedCharacters == $searchLength) {
                    return array($matchStart, $i + 1);
                }
            } elseif ($matchedCharacters > 0) {
                // Partial match failed: reset and retry from the character after $matchStart
                $i = $matchStart;
                $matchedCharacters = 0;
            }
        }
    }
    // If no full match was found, return an error value
    return -1;
}
2 - Split the HTML source code into three strings - the bit you want to work on (between the two positions returned by the matching function) and the part before and after. Split up the bit you want to modify using, for example:
$parts = preg_split("/(<[^>]*>)/",$string, -1, PREG_SPLIT_DELIM_CAPTURE);
Keep a record of where the tags are, concatenate the non-tag segments and perform substring replace on this as normal, then split the modified string up again and reassemble with the tags in place.
3 - This is the easy part, just concatenate the modified part and the other two bits back together.
I may have horribly overcomplicated this, mind; if so, just ignore me.
Unless cOm's already written it, the regex would be the best way to go:
$cleaned_string = preg_replace('/<[^>]*>/', '', $raw_text);
Or something like that. I would need to research/test the regex.
Then you can just use a simple $foobar = str_replace($find, $replace_with, $cleaned_string); to find the text you want to replace.
Edit: I didn't realize he wanted to put the HTML back in. That needs more regex than I know at the moment.
Knowing what I do know, technique-wise I would probably use an expression that doesn't ignore whitespace between the words but does ignore everything between the < and > brackets, then use capture groups (backreferences) to put the tags back into the output.
Interesting problem.
I would use the DOM and XPath to find the closest nodes containing that text and then use substring matching to find out which bit of the string is in what node. That will involve character-per-character matching and possible backtracking, though.
Here is the first part, finding the container nodes:
<?php
error_reporting(E_ALL);
header('Content-Type: text/plain; charset=UTF-8');
$doc = new DOMDocument();
$doc->loadHTML(<<<EOD
<p>
<span>
<i>
I am <b>Sadi, novice</b> programmer.
</i>
</span>
</p>
<ul>
<li>
<div>
I am <em>Cornholio, novice</em> programmer of television shows.
</div>
</li>
</ul>
EOD
);
$xpath = new DOMXPath($doc);
// First, get a list of all nodes containing the text anywhere in their tree.
$nodeList = $xpath->evaluate('//*[contains(string(.), "programmer")]');
$deepestNodes = array();
// Now only keep the deepest nodes, because the XPath query will also return HTML, BODY, ...
foreach ($nodeList as $node) {
$deepestNodes[] = $node;
$ancestor = $node;
while (($ancestor = $ancestor->parentNode) && ($ancestor instanceof DOMElement)) {
$deepestNodes = array_filter($deepestNodes, function ($existingNode) use ($ancestor) {
return ($ancestor !== $existingNode);
});
}
}
foreach ($deepestNodes as $node) {
var_dump($node->tagName);
}
I hope that helps you along.
Since you didn't give exact specifics on what you will use this for, I will use your example of "I am sadi, novice programmer".
$before = 'I am <b>sadi, novice</b> programmer';
$after = preg_replace('/I am (<[^>]*>)?(.*?), novice(<[^>]*>)? programmer/', 'I am $1$2, learner$3 programmer', $before);
Alternatively, for any text:
$string = '<b>Hello</b>, world!';
$orig = 'Hello';
$replace = 'Goodbye';
$pattern = "/(<[^>]*>)?$orig(<[^>]*>)?/";
$final = '$1' . $replace . '$2';
$result = preg_replace($pattern,$final,$string);
//$result should now be '<b>Goodbye</b>, world!' (the tags are kept)
Hope that helped. :d
Edit: here is your example, using the second piece of code:
$string = 'I am sadi, novice programmer.';
$orig = 'novice';
$replace = 'learner';
$pattern = "/(<[^>]*>)?$orig(<[^>]*>)?/";
$final = '$1' . $replace . '$2';
$result = htmlspecialchars(preg_replace($pattern,$final,$string));
echo $result;
The only problem is when you search for something longer than one word.
Edit 2: Finally came up with a way to do it across multiple words. Here's the code:
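function htmlreplace($string,$orig,$replace)
{
    $orig = explode(' ',$orig);
    $replace = explode(' ',$replace);
    $result = $string;
    while (count($orig)>0)
    {
        $shift = array_shift($orig);
        // null (cast to '') when there are fewer replacement words than search words
        $rshift = (string) array_shift($replace);
        // Capture the optional space/tag after the word so it is put back, not lost
        $pattern = '/' . preg_quote($shift, '/') . '(\s?(?:<[^>]*>)?)/';
        $replacement = addcslashes($rshift, '\\$') . '${1}';
        $result = preg_replace($pattern,$replacement,$result);
    }
    $result .= implode(' ',$replace);
    return $result;
}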
Have fun! :d