I've been reading various articles and have arrived at some code.
For a single URL on my site, http://home.com/example/ (and only that URL, no children), I would like to replace all instances of <a itemprop="url" with just <a, basically stripping out itemprop="url". This is what I have come up with, but I'm not sure whether I'm on the right lines and, if I am, how to 'echo' it, given that it's code and not something to be printed to the screen. I'm also not sure whether I need to escape the double quotes within the single quotes in str_replace.
if(preg_match("%/example/$%", $_SERVER['REQUEST_URI'])){
$string = "<a itemprop=\"url\"";
$str_replace = str_replace('<a itemprop="url"','<a',$string);
//something here
}
Please could anyone advise whether I am approaching this correctly and, if so, what the final part of the code needs to be to run it (I'm assuming not echo $str_replace;)? I'll be running it as a function from my WordPress functions.php file. I'm comfortable with that if it works.
This could be a mess and I apologise if it is.
try strpos()
if(strpos($_SERVER['REQUEST_URI'], "example") !== false){
$string = "<a itemprop=\"url\"";
$str_replace = str_replace('<a itemprop="url"','<a',$string);
}
There must be some kind of template where you get the default html and modify it with the php at some point of your code...
$html_template = file_get_contents('...adress_of_the_url_template...'); // read the template as one string (file() would return an array of lines)
.......
if(strpos($_SERVER['REQUEST_URI'], "example") !== false){
$string = "<a itemprop=\"url\"";
$html_template = str_replace($string,'<a',$html_template);
}
.......
.......
echo $html_template;
That way the HTML is replaced as you wanted.
It looks like I was over-complicating it, because the solution appears to be available through WordPress's own functions. This is what I've ended up with. Any comments, corrections or recommendations appreciated. I'm not a coder, as you may realise...
function schema( $content ) {
if (is_page( 'my-page-slug')) {
return str_replace('<a itemprop="url"', '<a', $content);
}
else return $content;
}
add_filter('the_content', 'schema', 99);
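On the earlier quoting question: double quotes inside a single-quoted PHP string are taken literally, so they don't need escaping. Here is a minimal sketch of just the replacement step outside WordPress. The function name strip_itemprop_url and the sample markup are made up for illustration.

```php
<?php
// Hypothetical helper (the name is an assumption, not part of WordPress).
function strip_itemprop_url(string $html): string
{
    // Inside single quotes the double quotes are plain characters,
    // so no backslashes are needed.
    return str_replace('<a itemprop="url"', '<a', $html);
}

// Made-up sample markup: only the first link carries the attribute.
$sample = '<p><a itemprop="url" href="/one">One</a> <a href="/two">Two</a></p>';
echo strip_itemprop_url($sample), PHP_EOL;
```

In the filter above, the same str_replace call runs on $content, and returning the result (rather than echoing it) is what lets WordPress print the modified content.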
<?php
$titledb = array('经济管理','管理','others');
$content='经济管理是我们国的家的中心领导力,这是中文测度。';
$replace='<a target="_blank" href="http://www.a.com/$1">$1</a>';
foreach ($titledb as $title) {
$regex = "~\b(" . preg_quote($title) . ")\b~u";
$content = preg_replace($regex, $replace, $content, 1);
}
echo $content;
?>
I was writing an auto-link function for my WordPress site, and I'm using substr_replace to find the keywords (of which there are literally a lot) and replace them with links. I'm doing this by filtering the post content, of course.
But in some circumstances, say there are posts with titles like "stackoverflow" and "overflow", it turns into a mess. The output looks like:
we love<a target="_blank" href="http://www.a.com/stackoverflow">stackoverflow</a>,this is a test。we love <a target="_blank" href="http://www.a.com/stack<a target=" _blank"="">overflow</a> ">stackoverflow,this is a test。
What I want is:
we love<a target="_blank" href="http://www.a.com/stackoverflow">stackoverflow</a>,this is a test。we love stack<a target="_blank" href="http://www.a.com/overflow">overflow</a>,this is a test。
And this is only a test. The production environment could be more complicated: like I said, there are tens of thousands of titles that need to be found as keywords and replaced with a link, so I see these broken links a lot. It happens when one title contains another, as the title 'stackoverflow' contains the title 'overflow'.
So my question is: how can I make substr_replace take the title 'stackoverflow' as a whole and replace it only once? Of course, 'overflow' still needs to be replaced elsewhere, just not when it is part of another keyword.
Thank you in advance.
To prevent a search for one word from replacing text inside the HTML you already injected for another word, you could use temporary placeholders and do the final replacement on those placeholders:
$titledb = array('经济管理','管理','others');
// sort the array from longer strings to smaller strings, to ensure that
// a replacement of a longer string gets precedence:
usort($titledb, function ($a,$b){ return strlen($b)-strlen($a); });
$content='经济管理是我们国的家的中心领导力。';
foreach ($titledb as $index => $title) {
$pos = strpos($content, $title);
if ($pos !== false) {
// temporarily insert a place holder in the format '#number#':
$content = substr_replace($content, "#$index#", $pos, strlen($title));
}
}
// Now replace the place holders with the final hyperlink HTML code
$content = preg_replace_callback("~#(\d+)#~u", function ($match) use ($titledb) {
return "<a target='_blank' href='http://www.a.com/{$titledb[$match[1]]}'>{$titledb[$match[1]]}</a>";
}, $content);
echo $content;
See it run on eval.in
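As a different technique from the placeholders above, strtr() with an array looks for the longest keys first and never rescans text it has already replaced, which avoids the nested-link problem in one pass. A rough sketch follows. The function name link_keywords and the $base parameter are assumptions, and unlike the loop above it links every occurrence of a keyword, not just the first.

```php
<?php
// Hypothetical helper: builds a keyword => link map and applies it in one pass.
function link_keywords(string $content, array $titles, string $base = 'http://www.a.com/'): string
{
    $map = [];
    foreach ($titles as $title) {
        $map[$title] = '<a target="_blank" href="' . $base . $title . '">' . $title . '</a>';
    }
    // strtr() prefers the longest matching key and does not re-examine
    // text it has already replaced, so 'overflow' is never linked
    // inside an already linked 'stackoverflow'.
    return strtr($content, $map);
}

echo link_keywords('we love stackoverflow and overflow', ['overflow', 'stackoverflow']), PHP_EOL;
```

If each keyword really must be linked only once, the placeholder approach is still the one to adapt.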
I'm trying to replace a title tag of the form |title|Page title| with <title>Page Title</title>, using this regular expression. But being a complete amateur, it hasn't gone too well...
'^|title|^[a-zA-Z0-9_]{1,}|$' => '<title>$1</title>'
I would love to know how to fix it, and more importantly, what I did wrong and why it was wrong.
You almost got it:
You should escape the | characters, as they have a special meaning in a regex (alternation) and you are using them as plain characters.
You should add the space character to your search group.
$string = '|title|Page title|';
$pattern = '/\|title\|([a-zA-Z0-9_ ]{1,})\|/';
$replacement = '<title>$1</title>';
echo preg_replace($pattern, $replacement, $string); //echoes <title>Page title</title>
See working demo
OP posted some code in comments which is wrong, try this version:
$regular_expressions = array( array( '/\|title\|([a-zA-Z0-9_ ]{1,})\|/' , '<title>$1</title>' ));
foreach($regular_expressions as $regexp){
$data = preg_replace($regexp[0], $regexp[1], $data);
}
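If you'd rather not escape the pipes by hand, preg_quote() can do it for you. This is only a sketch of the same idea. The function name convert_title_tag is made up, and it captures "anything except a pipe" instead of the character list used above.

```php
<?php
// Hypothetical helper: turns |title|Some text| into <title>Some text</title>.
function convert_title_tag(string $data): string
{
    // preg_quote() escapes regex metacharacters, so '|' becomes '\|'.
    $pipe = preg_quote('|', '/');
    // Inside a character class the pipe is already literal, so [^|] is fine.
    $pattern = '/' . $pipe . 'title' . $pipe . '([^|]+)' . $pipe . '/';
    return preg_replace($pattern, '<title>$1</title>', $data);
}

echo convert_title_tag('|title|Page title|'), PHP_EOL;
```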
Here's a little function I came up with a while back to scrape the title of a page when users submitted links through my service. It fetches the contents of the given URL and looks for a title tag; if one is found, it returns whatever is between the opening and closing tags. With a little tweaking, I'm sure you can swap in a replace for whatever you're doing and make it work for your needs. So this is more of a starting point than an answer, but I hope it helps to some extent.
$url = 'http://www.chrishacia.com';
function get_page_title($url){
if( !($data = file_get_contents($url)) ) return false;
if( preg_match("#<title>(.+)<\/title>#iU", $data, $t)) {
return trim($t[1]);
} else {
return false;
}
}
var_dump(get_page_title($url));
<?php
$s = "|title|Page title|";
$s = preg_replace('/^\|title\|([^\|]+)\|/', "<title>$1</title>", $s);
echo $s;
?>
for example i've got a string like this:
$html = '
test
test
test
hi
';
and I want to prepend the absolute URL to all hrefs where no absolute domain is given.
$html = '
test
test
test
hi
';
What's the best way to do that? I guess something with regex, but my regex skills are ** ;)
thanks in advance!
found a good way :
$html = preg_replace("#(<\s*a\s+[^>]*href\s*=\s*[\"'])(?!http)([^\"'>]+)([\"'>]+)#", '$1http://mydomain.com/$2$3', $html);
You can use (?!http|mailto) instead if you also have mailto links in your $html.
$domain = 'http://mydomain';
preg_match_all('/href\="(.*?)"/im', $html, $matches);
foreach($matches[1] as $n=>$link) {
if(substr($link, 0, 4) != 'http')
$html = str_replace($matches[1][$n], $domain . $matches[1][$n], $html);
}
The previous answer will cause problems with your first and fourth examples, because it fails to include a forward slash to separate the domain from the page name. Admittedly this can be fixed by simply appending the slash to $domain, but if you do that then href="/something.php" will end up with two.
Just to give an alternative Regex solution you could go with something like this...
$pattern = '#(?<=href=")(.+?)(?=")#';
$output = preg_replace_callback($pattern, 'make_absolute', $input);
function make_absolute($link) {
$domain = 'http://domain.com';
if(strpos($link[1], 'http')!==0) {
if(strpos($link[1], '/')!==0) {
return $domain.'/'.$link[1];
} else {
return $domain.$link[1];
}
}
return $link[1];
}
However, it is worth noting that a link such as href="example.html" is relative to the current directory, so neither method shown so far will work correctly for relative links on pages that aren't in the root directory. To provide a solution that does, more information would be needed about where the HTML came from.
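If you do know the URL of the page the HTML came from, a rough sketch along these lines can resolve directory-relative links as well. The function names and the $base parameter are assumptions for illustration, it only handles double-quoted hrefs, and it does not collapse ../ segments.

```php
<?php
// Hypothetical helper: $base is the full URL of the page the HTML came from.
function absolutize_href(string $href, string $base): string
{
    // Leave absolute URLs (http:, mailto:, ...), protocol-relative URLs
    // and pure fragments untouched.
    if (preg_match('~^(?:[a-z][a-z0-9+.-]*:|//|#)~i', $href)) {
        return $href;
    }
    $parts  = parse_url($base);
    $origin = $parts['scheme'] . '://' . $parts['host'];
    // Root-relative link: just put the origin in front.
    if ($href[0] === '/') {
        return $origin . $href;
    }
    // Directory-relative link: resolve against the directory of the base page.
    $path = $parts['path'] ?? '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);
    return $origin . $dir . $href;
}

// Hypothetical wrapper: rewrites every non-empty double-quoted href in the HTML.
function absolutize_links(string $html, string $base): string
{
    return preg_replace_callback(
        '~(?<=href=")[^"]+(?=")~i',
        function (array $m) use ($base) {
            return absolutize_href($m[0], $base);
        },
        $html
    );
}

$html = '<a href="page.html">a</a> <a href="/x.php">b</a> <a href="http://other.com/">c</a>';
echo absolutize_links($html, 'http://mydomain.com/shop/index.php'), PHP_EOL;
```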
Ahoy there!
I can't work out which syntax I should use to extract the source of an image as just the web address, without the src= or the quotes.
Here is my piece of code:
function get_all_images_src() {
$content = get_the_content();
preg_match_all('|src="(.*?)"|i', $content, $matches, PREG_SET_ORDER);
foreach($matches as $path) {
echo $path[0];
}
}
When I use it I got this printed:
src="http://project.bechade.fr/wp-content/uploads/2009/09/mer-300x225.jpg"
And I wish to get only this:
http://project.bechade.fr/wp-content/uploads/2009/09/mer-300x225.jpg
Any idea?
Thanks for your help.
Not exactly an answer to your question, but when parsing html, consider using a proper html parser:
foreach($html->find('img') as $element) {
echo $element->src . '<br />';
}
See: http://simplehtmldom.sourceforge.net/
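PHP's bundled DOM extension can do much the same thing without a third-party library. This is a minimal sketch, not the Simple HTML DOM API. The function name image_sources is made up, and it assumes the DOM extension is enabled (it usually is).

```php
<?php
// Hypothetical helper: returns the src of every <img> in an HTML fragment.
function image_sources(string $html): array
{
    if ($html === '') {
        return [];
    }
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; keep libxml's warnings quiet while parsing.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $sources = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        if ($img->hasAttribute('src')) {
            $sources[] = $img->getAttribute('src');
        }
    }
    return $sources;
}

print_r(image_sources('<p><img src="http://example.com/a.jpg"><img src="/b.png"></p>'));
```

Because it only looks at img elements, src attributes on other tags are ignored.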
$path[1] instead of $path[0]
echo $path[1];
$path[0] is the full string matched. $path[1] is the first grouping.
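To make the difference concrete, here is a small sketch of the same loop that collects index 1 of each match. The function name image_src_values is an assumption, and it takes the HTML as a parameter instead of calling get_the_content() so it can run outside WordPress.

```php
<?php
// Hypothetical helper: returns only the URLs captured by the parentheses.
function image_src_values(string $content): array
{
    preg_match_all('|src="([^"]+)"|i', $content, $matches, PREG_SET_ORDER);
    $urls = [];
    foreach ($matches as $match) {
        // $match[0] is the whole match (src="..."); $match[1] is the first group.
        $urls[] = $match[1];
    }
    return $urls;
}

print_r(image_src_values('<img src="http://project.bechade.fr/wp-content/uploads/2009/09/mer-300x225.jpg" />'));
```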
You could explode the string using " as a delimiter; the second item in the resulting array would then be the string you want:
$array = explode('"',$full_src);
$bit_you_want = $array[1];
Reworking your original function, it would be:
function get_all_images_src() {
$content = get_the_content();
preg_match_all('|src="(.*?)"|i', $content, $matches, PREG_SET_ORDER);
foreach($matches as $path) {
$src = explode('"', $path[0]); // $path is an array; $path[0] is the full match, e.g. src="..."
echo $src[1];
}
}
Thanks to ithcy for the right answer.
I guess I took too long to respond, because he deleted it. I just don't know where his answer has gone...
So here is the one I've received by mail:
'|src="(.*?)"|i' makes no sense as a regex. try '|src="([^"]+)"|i' instead. (Which still isn't the most robust solution but is better than what you've got.)
Also, what everyone else said. You want $path[1], NOT $path[0]. You're already extracting all the src attributes into $matches[]. That has nothing to do with $path[0]. If you're not getting all of the src attributes in the text, there is a problem somewhere else in your code.
One more thing - you should use a real HTML parser for this, because img tags are not the only tags with src attributes. If you're using this code on raw HTML source, it's going to match not just <img> tags but other tags with src attributes, etc.
— ithcy
I did everything he told me to do including using a HTML parser from Bart (2nd answer).
It works like a charm ! Thank you mate...
UPDATE:
Thank you all for your input. Some additional information.
It's really just a small chunk of markup (20 lines) I'm working with, and I had aimed to leverage a regex to do the work.
I also do have the ability to hack up the script (an ecommerce one) to insert the classes as the navigation is built. I wanted to limit the number of hacks I have in place to keep things easier on myself when I go to update to the latest version of the software.
With that said, I'm pretty aware of my situation and the various options available to me. The first part of my regex works as expected. I posted really more or less to see if someone would say, "hey dummy, this is easy just change this....."
After coming close with a few of my efforts, it's more of the principle at this point. To just know (and learn) a solution exists for this problem. I also hate being beaten by a piece of code.
ORIGINAL:
I'm trying to leverage regular expressions to add a CSS class to the first and last list items within an ordered list. I've tried a bunch of different ways but can't produce the results I'm looking for.
I've got a regular expression for the first list item but can't seem to figure a correct one out for the last. Here is what I'm working with:
$patterns = array('/<ul+([^<]*)<li/m', '/<([^<]*)(?<=<li)(.*)<\/ul>/s');
$replace = array('<ul$1<li class="first"','<li class="last"$2$3</ul>');
$navigation = preg_replace($patterns, $replace, $navigation);
Any help would be greatly appreciated.
Jamie Zawinski would have something to say about this...
Do you have a proper HTML parser? I don't know if there's anything like hpricot available for PHP, but that's the right way to deal with it. You could at least employ hpricot to do the first cleanup for you.
If you're actually generating the HTML -- do it there. It looks like you want to generate some navigation and have a .first and .last kind of thing on it. Take a step back and try that.
+1 to generating the right html as the best option.
But a completely different approach, which may or may not be acceptable to you: you could use javascript.
This uses jQuery to make it easy...
$(document).ready(
function() {
$('#id-of-ul > li:first-child').addClass('first');
$('#id-of-ul > li:last-child').addClass('last');
}
);
As I say, this may or may not be of any use in this case, but I think it's a valid solution to the problem in some cases.
PS: You say ordered list, then give ul in your example. ol = ordered list, ul = unordered list
You wrote:
$patterns = array('/<ul+([^<]*)<li/m','/<([^<]*)(?<=<li)(.*)<\/ul>/s');
First pattern:
ul+ => you search something like ullll...
The m modifier is useless here, since you don't use ^ nor $.
Second pattern:
Using .* along with s is "dangerous", because you might select the whole document up to the last /ul of the page...
And well, I would just drop s modifier and use: (<li\s)(.*?</li>\s*</ul>) with replace: '$1class="last" $2'
In view of above remarks, I would write the first expression: <ul.*?>\s*<li
Although I am tired of seeing the Jamie Zawinski quote each time there is a regex question, Dustin is right in pointing you to an HTML parser (or just generating the right HTML from the start!): regexes and HTML don't mix well, because HTML syntax is complex, and unless you are working on well-known, machine-generated output with a very predictable structure, something is prone to break in some cases.
I don't know if anyone cares any longer, but I have a solution that works in my simple test case (and I believe it should work in the general case).
First, let me point out two things. While PhiLho is right that the s modifier is "dangerous", since dots may match everything up to the end of the document, this may very well be what you want. It only becomes a problem with pages that are not well formed. Be careful with any such regex on large, manually written pages.
Second, PHP gives backslashes a special meaning, even in single quotes. Most regexes will work either way, but you should always double-escape them, just in case.
Now, here's my code:
<?php
$navigation='<ul>
<li>Coffee</li>
<li>Tea</li>
<li>Milk</li>
<li>Beer</li>
<li>Water</li>
</ul>';
$patterns = array('/<ul.*?>\\s*<li/',
'/<li((.(?<!<li))*?<\\/ul>)/s');
$replace = array('$0 class="first"',
'<li class="last"$1');
$navigation = preg_replace($patterns, $replace, $navigation);
echo $navigation;
?>
This will output
<ul>
<li class="first">Coffee</li>
<li>Tea</li>
<li>Milk</li>
<li>Beer</li>
<li class="last">Water</li>
</ul>
This assumes no line feeds inside the opening <ul...> tag. If there are any, use the s modifier on the first expression too.
The magic happens in (.(?<!<li))*?. This will match any character (the dot), as long as the text up to and including that character does not end in <li (the negative lookbehind), repeated any number of times (the *) in a non-greedy fashion (the ?).
Of course, the whole thing would have to be expanded if there is a chance the list items already have the class attribute set. Also, if there is only one list item, it will match twice, giving it two such attributes. At least for xhtml, this would break validation.
You could load the navigation in a SimpleXML object and work with that. This prevents you from breaking your markup with some crazy regex :)
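Here is a rough sketch of that idea using DOMDocument rather than SimpleXML (the two are closely related). The function name mark_first_last is made up. It assumes the navigation is a well-formed XML fragment with a single <ul> root, and note that saveXML() writes empty elements in the short <li/> form.

```php
<?php
// Hypothetical helper: adds "first"/"last" classes to the direct <li> children.
function mark_first_last(string $listHtml): string
{
    $doc = new DOMDocument();
    // Assumption: the input is well-formed XML; otherwise return it unchanged.
    if (!@$doc->loadXML($listHtml)) {
        return $listHtml;
    }
    $items = [];
    foreach ($doc->documentElement->childNodes as $node) {
        if ($node instanceof DOMElement && $node->tagName === 'li') {
            $items[] = $node;
        }
    }
    if (!$items) {
        return $listHtml;
    }
    // Append to an existing class attribute instead of adding a second one.
    $addClass = function (DOMElement $li, string $class): void {
        $li->setAttribute('class', trim($li->getAttribute('class') . ' ' . $class));
    };
    $addClass($items[0], 'first');
    $addClass($items[count($items) - 1], 'last');
    // Passing the node to saveXML() leaves out the XML declaration.
    return $doc->saveXML($doc->documentElement);
}

echo mark_first_last("<ul>\n<li>Coffee</li>\n<li>Tea</li>\n<li>Water</li>\n</ul>"), PHP_EOL;
```

A list with a single item ends up with class="first last" on that item, so the duplicate-attribute problem mentioned above does not arise.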
As a preface .. this is waaay over-complicating things in most use-cases. Please see other answers for more sanity :)
Here is a little PHP class I wrote to solve a similar problem. It adds 'first', 'last' and any other classes you want. It will handle li's with no "class" attribute as well as those that already have some class(es).
<?php
/**
* Modify list items in pre-rendered html.
*
* Usage Example:
* $replaced_text = ListAlter::addClasses($original_html, array('cool', 'awsome'));
*/
class ListAlter {
private $classes = array();
private $classes_found = FALSE;
private $count = 0;
private $total = 0;
// No public instances.
private function __construct() {}
/**
* Adds 'first', 'last', and any extra classes you want.
*/
static function addClasses($html, $extra_classes = array()) {
$instance = new self();
$instance->classes = $extra_classes;
$total = preg_match_all('~<li([^>]*?)>~', $html, $matches);
$instance->total = $total ? $total : 0;
return preg_replace_callback('~<li([^>]*?)>~', array($instance, 'processListItem'), $html);
}
private function processListItem($matches) {
$this->count++;
$this->classes_found = FALSE;
$processed = preg_replace_callback('~(\w+)="(.*?)"~', array($this, 'appendClasses'), $matches[0]);
if (!$this->classes_found) {
$classes = $this->classes;
if ($this->count == 1) {
$classes[] = 'first';
}
if ($this->count == $this->total) {
$classes[] = 'last';
}
if (!empty($classes)) {
$processed = rtrim($matches[0], '>') . ' class="' . implode(' ', $classes) . '">';
}
}
return $processed;
}
private function appendClasses($matches) {
array_shift($matches);
list($name, $value) = $matches;
if ($name == 'class') {
$value = array_filter(explode(' ', $value));
$value = array_merge($value, $this->classes);
if ($this->count == 1) {
$value[] = 'first';
}
if ($this->count == $this->total) {
$value[] = 'last';
}
$value = implode(' ', $value);
$this->classes_found = TRUE;
}
return sprintf('%s="%s"', $name, $value);
}
}