<?php
$titledb = array('经济管理','管理','others');
$content='经济管理是我们国的家的中心领导力,这是中文测度。';
$replace='<a target="_blank" href="http://www.a.com/$1">$1</a>';
foreach ($titledb as $title) {
$regex = "~\b(" . preg_quote($title, '~') . ")\b~u";
$content = preg_replace($regex, $replace, $content, 1);
}
echo $content;
?>
I am writing an auto-link function for my WordPress site. I filter the post content and use substr_replace to find the keywords (there are literally a lot of them) and replace each one with a link.
But in some circumstances it turns into a mess. Suppose there are posts with titles like "stackoverflow" and "overflow"; the output will look like this:
we love<a target="_blank" href="http://www.a.com/stackoverflow">stackoverflow</a>,this is a test。we love <a target="_blank" href="http://www.a.com/stack<a target=" _blank"="">overflow</a> ">stackoverflow,this is a test。
What I want is:
we love<a target="_blank" href="http://www.a.com/stackoverflow">stackoverflow</a>,this is a test。we love stack<a target="_blank" href="http://www.a.com/overflow">overflow</a>,this is a test。
And this is only a test. The production environment could be more complicated: as I said, there are tens of thousands of titles that need to be found and replaced with a link, so I see these broken links a lot. It happens when one title contains another, the way 'stackoverflow' contains 'overflow'.
So my question is: how can I make substr_replace treat the title 'stackoverflow' as a whole and replace it only once? Of course, 'overflow' still needs to be replaced elsewhere, just not when it is part of another keyword.
Thank you in advance.
To prevent a search for one word from replacing text inside the HTML you already injected for another word, you could insert temporary placeholders first and turn those placeholders into the final links in a separate step:
$titledb = array('经济管理','管理','others');
// sort the array from longer strings to smaller strings, to ensure that
// a replacement of a longer string gets precedence:
usort($titledb, function ($a,$b){ return strlen($b)-strlen($a); });
$content='经济管理是我们国的家的中心领导力。';
foreach ($titledb as $index => $title) {
$pos = strpos($content, $title);
if ($pos !== false) {
// temporarily insert a place holder in the format '#number#':
$content = substr_replace($content, "#$index#", $pos, strlen($title));
}
}
// Now replace the place holders with the final hyperlink HTML code
$content = preg_replace_callback("~#(\d+)#~u", function ($match) use ($titledb) {
return "<a target='_blank' href='http://www.a.com/{$titledb[$match[1]]}'>{$titledb[$match[1]]}</a>";
}, $content);
echo $content;
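As an alternative to placeholders, the same goal can probably be reached in a single pass with one alternation regex. This is only a sketch under assumptions: autoLink() and its $baseUrl parameter are hypothetical names, the URL-encoding of the title is my own choice, and with tens of thousands of titles the pattern may become too large to compile, so it would need to be built in chunks.

```php
<?php
// Hypothetical helper: link each title at most once, in one pass over the text.
function autoLink(string $content, array $titles, string $baseUrl): string
{
    if (!$titles) {
        return $content;
    }
    // Longest titles first, so a longer title wins over a shorter one
    // when both could match at the same position.
    usort($titles, function ($a, $b) { return strlen($b) - strlen($a); });

    $quoted  = array_map(function ($t) { return preg_quote($t, '~'); }, $titles);
    $pattern = '~' . implode('|', $quoted) . '~u';

    $seen = array();
    $result = preg_replace_callback($pattern, function ($m) use (&$seen, $baseUrl) {
        $title = $m[0];
        if (isset($seen[$title])) {
            return $title; // already linked once: leave later occurrences alone
        }
        $seen[$title] = true;
        // Assumption: the title is URL-encoded and HTML-escaped before use.
        $href = htmlspecialchars($baseUrl . rawurlencode($title), ENT_QUOTES);
        $text = htmlspecialchars($title, ENT_QUOTES);
        return '<a target="_blank" href="' . $href . '">' . $text . '</a>';
    }, $content);

    // preg_replace_callback() returns null on error (e.g. invalid UTF-8).
    return $result ?? $content;
}

echo autoLink('we love stackoverflow, and a stack overflow too',
              array('overflow', 'stackoverflow'), 'http://www.a.com/');
```

Because the regex engine consumes "stackoverflow" as one match, the "overflow" inside it is never visited, and the inserted HTML is never searched again.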
I'm trying to implement custom tags for links, colour and bullet points on a website, so that [l]...[/l] gets replaced by a link to the URL inside and [li]...[/li] gets replaced by a bullet point list.
I've got it half working, but there's a problem with the link descriptions. Here's the code:
// Takes in a paragraph, replaces all square-bracket tags with HTML tags. Calls the getBetweenTags() method to get the text between the square tags
function replaceTags($text)
{
$tags = array("[l]", "[/l]", "[list]", "[/list]", "[li]", "[/li]");
$html = array("<a style='text-decoration:underline;' class='common_link' href='", "'>" . getBetweenTags("[l]", "[/l]", $text) . "</a>", "<ul>", "</ul>", "<li>", "</li>");
return str_replace($tags, $html, $text);
}
// Takes in the start and end tag along with the paragraph, returns the text between the two tags.
function getBetweenTags($tag1, $tag2, $text)
{
$startsAt = strpos($text, $tag1) + strlen($tag1);
$endsAt = strpos($text, $tag2, $startsAt);
return substr($text, $startsAt, $endsAt - $startsAt);
}
The problem I'm having is when I have three links:
[l]http://www.example1.com[/l]
[l]http://www.example2.com[/l]
[l]http://www.example3.com[/l]
The links get replaced as:
http://www.example1.com
http://www.example1.com
http://www.example1.com
They are all hyperlinked correctly, i.e. to 1, 2 and 3, but the link text is the same for all of them.
You can see it in action here at the bottom of the page with the three random links. How can I change the code so that each link is hyperlinked to its own page and also shows that page's URL as its text?
str_replace does all the grunt work for you. The problem is that:
getBetweenTags("[l]", "[/l]", $text)
is evaluated only once, when the $html array is built, so its result never changes. str_replace will match three times, but every match gets "http://www.example1.com" because that's the first link on the page.
You can't really do a static replacement; you need to keep at least a pointer to where you are in the input text.
My advice would be to write a simple tokenizer/parser. It's actually not that hard. The tokenizer can be really simple: find all [ and ] and derive tags. Then your parser will try to make sense of the tokens. Your token stream can look like:
array(
array("string", "foo "),
array("tag", "l"),
array("string", "http://example"),
array("endtag", "l"),
array("string", " bar")
);
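A token stream like the one above could be produced and rendered roughly as follows. This is a minimal sketch, not a full parser: tokenize() and render() are names I made up, only [l], [list] and [li] are understood, and nesting other tags inside [l] is not handled.

```php
<?php
// Split the input into "string", "tag" and "endtag" tokens.
function tokenize(string $text): array
{
    $tokens = array();
    // PREG_SPLIT_DELIM_CAPTURE keeps the bracketed tags in the result.
    $parts = preg_split('~(\[/?[a-z]+\])~i', $text, -1,
                        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
    foreach ($parts as $part) {
        if (preg_match('~^\[(/?)([a-z]+)\]$~i', $part, $m)) {
            $tokens[] = array($m[1] === '/' ? 'endtag' : 'tag', strtolower($m[2]));
        } else {
            $tokens[] = array('string', $part);
        }
    }
    return $tokens;
}

// Walk the tokens once, remembering whether we are inside [l]...[/l].
function render(array $tokens): string
{
    $simple = array('list' => 'ul', 'li' => 'li');
    $out = '';
    $inLink = false;
    foreach ($tokens as $token) {
        list($type, $value) = $token;
        if ($type === 'string') {
            $safe = htmlspecialchars($value, ENT_QUOTES);
            $out .= $inLink ? "<a class='common_link' href='$safe'>$safe</a>" : $safe;
        } elseif ($value === 'l') {
            $inLink = ($type === 'tag');
        } elseif (isset($simple[$value])) {
            $out .= ($type === 'tag' ? '<' : '</') . $simple[$value] . '>';
        } else {
            // Unknown tag: put it back as literal text.
            $out .= htmlspecialchars(($type === 'tag' ? '[' : '[/') . $value . ']');
        }
    }
    return $out;
}

echo render(tokenize("foo [l]http://example[/l] bar"));
```

Since each string token is rendered where it occurs, every link gets its own URL as its text.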
Personally, here is how I would use preg_match_all instead.
$str='
[l]http://www.example1.com[/l]
[l]http://www.example2.com[/l]
[l]http://www.example3.com[/l]
';
preg_match_all('/\[(l|li|list)\](.+?)(\[\/\1\])/is',$str,$m);
if(isset($m[0][0])){
for($x=0;$x<count($m[0]);$x++){
$str=str_replace($m[0][$x],$m[2][$x],$str);
}
}
print_r($str);
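The snippet above only strips the tags. To actually build the links, a callback can be given the URL captured for each individual match. A short sketch, assuming the link markup from the question is what is wanted:

```php
<?php
$str = "[l]http://www.example1.com[/l]\n[l]http://www.example2.com[/l]";

// The callback runs once per [l]...[/l] pair, so $m[1] is that pair's own URL.
$html = preg_replace_callback('~\[l\](.+?)\[/l\]~is', function ($m) {
    $url = htmlspecialchars(trim($m[1]), ENT_QUOTES);
    return "<a style='text-decoration:underline;' class='common_link' href='$url'>$url</a>";
}, $str);

// The list tags have no content of their own, so a static replacement is enough.
$html = str_replace(
    array('[list]', '[/list]', '[li]', '[/li]'),
    array('<ul>', '</ul>', '<li>', '</li>'),
    $html
);

echo $html;
```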
I've been reading various articles and have arrived at some code.
For a single URL on my site, http://home.com/example/ (and only that URL, not its children), I would like to replace all instances of <a itemprop="url" with just <a, basically stripping out itemprop="url". This is what I have come up with, but I'm not sure whether I'm on the right lines and, if I am, how to apply it, given that it's code and not something to be echoed to the screen. I'm also not sure whether I need to escape the double quotes within the single quotes in the str_replace call.
if(preg_match("%/example/$%", $_SERVER['REQUEST_URI'])){
$string = "<a itemprop=\"url\"";
$str_replace = str_replace('<a itemprop="url"','<a',$string);
//something here
}
Please could anyone advise whether I am approaching this correctly and, if so, what the final part of the code needs to be to run it (I'm assuming not echo $str_replace;)? I'll be running it as a function from my WordPress functions.php file; I'm comfortable with that if it works.
This could be a mess and I apologise if it is.
Try strpos():
if(strpos($_SERVER['REQUEST_URI'], "example") !== false){
$string = "<a itemprop=\"url\"";
$str_replace = str_replace('<a itemprop="url"','<a',$string);
}
There must be some kind of template where you get the default html and modify it with the php at some point of your code...
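// file_get_contents() returns the template as one string (file() would return an array of lines)
$html_template = file_get_contents('...address_of_the_url_template...');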
.......
if(strpos($_SERVER['REQUEST_URI'], "example") !== false){
$string = "<a itemprop=\"url\"";
$html_template = str_replace($string,'<a',$html_template);
}
.......
.......
echo $html_template;
Then you have replaced the HTML code as you wanted.
It looks like I was over-complicating it, because the solution appears to be within the WordPress functions. This is what I've ended up with. Any comments, corrections or recommendations appreciated. I'm not a coder, as you may realise...
function schema( $content ) {
if (is_page( 'my-page-slug')) {
return str_replace('<a itemprop="url"', '<a', $content);
}
else return $content;
}
add_filter('the_content', 'schema', 99);
I'm trying something out but as I'm not versed in PHP it feels like smashing my head against random walls.
I need to alter the output of image tags. I want to replace the img tag with a div that has a background image, to achieve sort of this effect.
I'm working from this function in functions.php. It currently puts the entire img tag into the background-image URL; I need to extract just the src instead.
function turnItIntoDiv( $content ) {
// A regular expression of what to look for.
$pattern = '/(<img([^>]*)>)/i';
// What to replace it with. $1 refers to the content in the first 'capture group', in parentheses above
$replacement = '<div class="full-image" style="background-image: url("$1")"></div>';
// run preg_replace() on the $content
$content = preg_replace( $pattern, $replacement, $content );
// return the processed content
return $content;
}
add_filter( 'the_content', 'turnItIntoDiv' );
Thank you all in advance
Maybe it would be fine to split it into two steps:
if (preg_match("/img/i",$content))
{
..
$pattern = '/src="([^"]+)"/i';
..
Then you will get the exact file name. The height of the div should be set.
Whole code - matching all images:
if (preg_match("/img/i",$content))
{
if (preg_match_all('/src[ ]?=[ ]?"([^"]+)"/i',$content,$matches))
{
foreach ($matches[1] as $i => $match)
{
$replacement = '<div class="full-image" style="background-image: url(\'' . trim($match) . '\')"></div>';
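// preg_quote() keeps characters such as "." or "?" in the URL from being read as regex syntax;
// [^>]* keeps the match inside a single <img> tag, even when src is the last attribute
$content = preg_replace('~<img\b[^>]*' . preg_quote($matches[0][$i], '~') . '[^>]*>~i', $replacement, $content);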
}
}
}
Note: the div's height must be set.
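The two steps can probably also be folded into one pass with preg_replace_callback, so that each img tag is matched once and its own src is read inside the callback. A sketch only: turnImagesIntoDivs() is a made-up name, and it assumes the src attribute is quoted.

```php
<?php
function turnImagesIntoDivs(string $content): string
{
    $result = preg_replace_callback('~<img\b[^>]*>~i', function ($m) {
        // Pull the src out of this one tag; leave the tag alone if it has none.
        if (!preg_match('~\ssrc\s*=\s*(["\'])(.*?)\1~i', $m[0], $src)) {
            return $m[0];
        }
        // The value comes from HTML, so do not encode existing entities twice.
        $url = htmlspecialchars(trim($src[2]), ENT_QUOTES, 'UTF-8', false);
        return '<div class="full-image" style="background-image: url(\'' . $url . '\')"></div>';
    }, $content);

    // preg_replace_callback() returns null on a regex error.
    return $result ?? $content;
}

echo turnImagesIntoDivs('<p><img class="a" src="http://example.com/one.jpg" alt=""></p>');
```

In WordPress this would be hooked up the same way as in the question, with add_filter( 'the_content', 'turnImagesIntoDivs' ).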
I have latex + html code somewhere in the following form:
...some text1.... \[latex-code1\]....some text2....\[latex-code2\]....etc
Firstly I want to obtain the latex codes in an array codes[] to be able to send them to a server for rendering, so that
codes[0]=latex-code1, codes[1]=latex-code2, etc
Secondly, I want to modify this text so that it looks like:
...some text1.... <img src="root/1.png">....some text2....<img src="root/2.png">....etc
i.e., the i-th latex code fragment is replaced by the link to the i-th rendered image.
I have been trying to do this with preg_replace_callback and preg_match_all but, being new to PHP, I haven't been able to make it work. Please advise.
If you're looking for code:
$html = '...some text1.... \[latex-code1\]....some text2....\[latex-code2\]....etc';
$codes = array();
$count = 0;
$replace = function($matches) use (&$codes, &$count) {
list(, $codes[]) = $matches;
return sprintf('<img src="root/%d.png">', ++$count);
};
$changed = preg_replace_callback('~\\\\\\[(.+?)\\\\\\]~', $replace, $html);
echo "Original: $html\n";
echo "Changed : $changed\n\nLatex Codes: ", print_r($codes, 1), "Count: ", $count;
I don't know which part is giving you problems. If it's the regex pattern: the characters in your markers need heavy escaping, once for the PHP string literal and once for PCRE, which is why there are so many backslashes.
Another tricky part is the callback function, because it needs to collect the codes as well as keep a counter. In the example this is done with an anonymous function that takes variable aliases / references in its use clause. This makes the variables $codes and $count available inside the callback.
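To see where the backslashes come from, it may help to let preg_quote() do the PCRE-level escaping and compare the result with the hand-written pattern. A small illustration; the variable names and the sample string are mine:

```php
<?php
// The PHP literal below holds three backslashes and a bracket on each side: \\\[ and \\\]
// PCRE reads \\ as a literal backslash and \[ as a literal bracket.
$manual = '~\\\\\\[(.+?)\\\\\\]~';

// preg_quote() builds the same escaped markers from the raw two-character markers.
$open  = preg_quote('\\[', '~');   // raw marker: backslash + [
$close = preg_quote('\\]', '~');   // raw marker: backslash + ]
$built = '~' . $open . '(.+?)' . $close . '~';

var_dump($manual === $built);

preg_match_all($built, 'a \[x^2\] b \[y\]', $m);
print_r($m[1]);
```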
The site I'm working on has a database table filled with glossary terms. I am building a function that will take some HTML and replace the first instances of the glossary terms with tooltip links.
I am running into a problem though. Since it's not just one replace, the function is replacing text that has been inserted in previous iterations, so the HTML is getting mucked up.
I guess the bottom line is, I need to ignore text if it:
Appears within the < and > of any HTML tag, or
Appears within the text of an <a></a> tag.
Here's what I have so far. I was hoping someone out there would have a clever solution.
function insertGlossaryLinks($html)
{
// Get glossary terms from database, once per request
static $terms;
if (is_null($terms)) {
$query = Doctrine_Query::create()
->select('gt.title, gt.alternate_spellings, gt.description')
->from('GlossaryTerm gt');
$glossaryTerms = $query->rows();
// Create whole list in $terms, including alternate spellings
$terms = array();
foreach ($glossaryTerms as $glossaryTerm) {
// Initialize with title
$term = array(
'wordsHtml' => array(
h(trim($glossaryTerm['title']))
),
'descriptionHtml' => h($glossaryTerm['description'])
);
// Add alternate spellings
foreach (explode(',', $glossaryTerm['alternate_spellings']) as $alternateSpelling) {
$alternateSpelling = h(trim($alternateSpelling));
if (empty($alternateSpelling)) {
continue;
}
$term['wordsHtml'][] = $alternateSpelling;
}
$terms[] = $term;
}
}
// Do replacements on this HTML
$newHtml = $html;
foreach ($terms as $term) {
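// create_function() was removed in PHP 8; a closure does the same job
$callback = function ($m) { return '<span>' . $m[0] . '</span>'; };
// Pass the delimiter so that a "/" inside a term is escaped as well
$term['wordsHtmlPreg'] = array_map(function ($word) { return preg_quote($word, '/'); }, $term['wordsHtml']);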
$pattern = '/\b('.implode('|', $term['wordsHtmlPreg']).')\b/i';
$newHtml = preg_replace_callback($pattern, $callback, $newHtml, 1);
}
return $newHtml;
}
Using regexes to process HTML is always a risky business. You will spend a long time fiddling with the greediness and laziness of your regexes so that they only capture text that is not inside a tag and not part of a tag name. My recommendation would be to ditch the method you are currently using and parse your HTML with an HTML parser, like this one: http://simplehtmldom.sourceforge.net/. I have used it before and have recommended it to others. It is a much simpler way of dealing with complex HTML.
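For illustration, here is roughly what the parser route could look like. Note that this sketch uses PHP's built-in DOMDocument and DOMXPath rather than the Simple HTML DOM library linked above; markTerms(), the wrapper div and the "glossary" class are my own assumptions, not part of the question.

```php
<?php
// Wrap the first occurrence of each term in a <span>, touching only text
// nodes that are not inside an <a> element or an already inserted span.
function markTerms(string $html, array $terms): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);
    // The XML encoding hint and the wrapper div are conveniences of this sketch.
    $doc->loadHTML('<?xml encoding="UTF-8"><div id="root">' . $html . '</div>');
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $root  = $xpath->query('//div[@id="root"]')->item(0);
    $query = './/text()[not(ancestor::a) and not(ancestor::span[@class="glossary"])]';

    foreach ($terms as $term) {
        $pattern = '/\b(' . preg_quote($term, '/') . ')\b/iu';
        // Re-query for every term, because earlier terms have split text nodes.
        foreach ($xpath->query($query, $root) as $node) {
            $parts = preg_split($pattern, $node->nodeValue, 2, PREG_SPLIT_DELIM_CAPTURE);
            if ($parts === false || count($parts) < 3) {
                continue; // term not found in this text node
            }
            $span = $doc->createElement('span');
            $span->setAttribute('class', 'glossary');
            $span->appendChild($doc->createTextNode($parts[1]));

            // Replace the text node with: text before, span, text after.
            $parent = $node->parentNode;
            $parent->insertBefore($doc->createTextNode($parts[0]), $node);
            $parent->insertBefore($span, $node);
            $parent->insertBefore($doc->createTextNode($parts[2]), $node);
            $parent->removeChild($node);
            break; // first occurrence only
        }
    }

    // Serialize only the children of the wrapper div.
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo markTerms('<p>An API key. See <a href="/api">API docs</a>.</p>', array('API'));
```

Attribute values and existing link text are never searched, because only text nodes outside <a> elements are selected.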
I ended up using preg_replace_callback to replace all existing links with placeholders. Then I inserted the new glossary term links. Then I put back the links that I had replaced.
It's working great!
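A rough sketch of that placeholder idea, for anyone who lands here later. It is not the exact code I used: linkTermsOutsideTags(), the control-character placeholders and the "glossary" class are made up for this example, and it assumes the input HTML contains no \x01 or \x02 bytes.

```php
<?php
function linkTermsOutsideTags(string $html, array $terms): string
{
    $stash = array();
    $park = function ($markup) use (&$stash) {
        $stash[] = $markup;
        return "\x01" . (count($stash) - 1) . "\x02";
    };

    // 1. Swap every existing <a>...</a> element and every other tag for a placeholder.
    $protected = preg_replace_callback('~<a\b[^>]*>.*?</a>|<[^>]+>~is', function ($m) use ($park) {
        return $park($m[0]);
    }, $html);

    // 2. Replace the first occurrence of each term in what is left.
    foreach ($terms as $term) {
        // The first alternative skips over placeholders, so a numeric term
        // cannot match the index stored inside one.
        $pattern = '/\x01\d+\x02(*SKIP)(*F)|\b' . preg_quote($term, '/') . '\b/i';
        $protected = preg_replace_callback($pattern, function ($m) use ($park) {
            // Park the new markup too, so later terms cannot match inside it.
            return $park('<span class="glossary">' . $m[0] . '</span>');
        }, $protected, 1);
    }

    // 3. Put the parked markup back.
    return preg_replace_callback('~\x01(\d+)\x02~', function ($m) use (&$stash) {
        return $stash[(int) $m[1]];
    }, $protected);
}

echo linkTermsOutsideTags(
    '<p title="API">The API and the <a href="#">API page</a>. SDK uses API.</p>',
    array('API', 'SDK')
);
```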