Php title regular expression - php

I'm trying to replace a title tag from |title|Page title| to <title>Page Title</title>, using this regular expression. But being a complete amateur, it's not gone to well..
'^|title|^[a-zA-Z0-9_]{1,}|$' => '<title>$1</title>'
I would love to know how to fix it, and more importantly, what I did wrong and why it was wrong.

You almost got it:
You should escape the | characters as they have special meaning in a
regex and you are using it as a plain character.
You should add the space character to your search group
$string = '|title|Page title|';
$pattern = '/\|title\|([a-zA-Z0-9_ ]{1,})\|/';
$replacement = '<title>$1</title>';
echo preg_replace($pattern, $replacement, $string); //echoes <title>Page title</title>
See working demo
OP posted some code in comments which is wrong, try this version:
$regular_expressions = array( array( '/\|title\|([a-zA-Z0-9_ ]{1,})\|/' , '<title>$1</title>' ));
foreach($regular_expressions as $regexp){
$data = preg_replace($regexp[0], $regexp[1], $data);
}

Heres a little function I came up with a while back to essentially scrape the titles of a page when users submitted links through my service. What this function does is will get the contents of a provided URL. Seek a title tag, if found, get whats between the title tag and dump it's result. With a little tweaking I am sure you can use a replace method for whatever your doing, and make it work for your needs. So this is more of a starting point rather than an answer but overall I hope it helps to some extent.
$url = 'http://www.chrishacia.com';
function get_page_title($url){
if( !($data = file_get_contents($url)) ) return false;
if( preg_match("#<title>(.+)<\/title>#iU", $data, $t)) {
return trim($t[1]);
} else {
return false;
}
}
var_dump(get_page_title($url));

<?php
$s = "|title|Page title|";
$s = preg_replace('/^\|title\|([^\|]+)\|/', "<title>$1</title>", $s);
echo $s;
?>

Related

PHP Regular Expression - Single Quote not working - TWIG pre-escaping

I got a problem with single quotes in a regular expression.
What i want to do is replace smileys in a string to a html image tag.
All smileys are working, except the sad smiley :'-( because it has a single quote in it.
Magic Quotes is turned off (testet with if (g!et_magic_quotes_gpc()) dd('mq off');).
So, let me show you some code.
protected $emoticons = array(
// ...
'cry' => array(
'image' => '<img class="smiley" src="/image/emoticon/cry.gif" />',
'emoticons' => array(":'(", ";'(", ":'-(", ";'-(")
),
);
My method to replace all the emoticons is the following:
public function replaceEmoticons($input) {
$output = $input;
foreach ($this->emoticons as $emo_group_name => $emo_group) {
$regex_emo_part = array();
foreach ($emo_group['emoticons'] as $emoticon) {
$regex_emo_part[] = preg_quote($emoticon, '#');
}
$regex_emo_part = implode('|', $regex_emo_part);
$regex = '#(?!<\w)(' . $regex_emo_part .')(?!\w)#';
$output = preg_replace($regex, $emo_group['image'], $output);
}
return $output;
}
But as i said: ' kills it. No replacement there. :-) :-/ and so on are working. Why?
FYI Content of $regex: #(?!<\w)(\:\'\(|;\'\(|\:\'\-\(|;\'\-\()(?!\w)#
What is wrong here, can you help me?
UPDATE:
Thanks # cheery and cychoi. The replacing method is okay, you've got right.
I found the problem. My string gets escaped before it is forwarded to the replaceEmoticons method. I use TWIG templating engine and i use |nl2br filter before my selfmade replace_emoticon filter.
Let me show you. This is the output in the final template. It is a template to show a comment for an blog entry:
{{ comment.content|nl2br|replace_emoticons|raw }}
Problem: nl2br is auto pre-escaping the input string, so ' gets replaced by the escaped one '
I need this nl2br to show linebreakes as <br /> - and i need the escaping too, to disallow html tags in the user's input.
I need replace_emoticons to replace my emoticons (selfmade TWIG extension).
And i need raw here at the end of the filter chain too, otherwise all HTML smiley img tags gets escaped and i will see raw html in the comment's text.
What can i do here? The only problem here seems to be that nl2br escapes ' too. This is no bad idea but in my case it will destroy all sad smileyss containing ' in it.
Still searching for a solution to solve this and i hope you can help me.
Best,
titan
I added an optional parameter to the emoticon method:
public function replaceEmoticons($input, $use_emo_encoding_for_regex = true) {
and i changed the foreach part a lil' bit:
foreach ($emo_group['emoticons'] as $emoticon) {
if ($use_emo_encoding_for_regex === true) {
$emoticon = htmlspecialchars($emoticon, ENT_QUOTES);
}
$regex_emo_part[] = preg_quote($emoticon, '#');
}
It works! All emoticons are replaced!

PHP string replace for single URL

I've been reading various articles and have arrived at some code.
For a single URL on my site http://home.com/example/ (and only that URL - no children) I would like to replace all instances of "<a itemprop="url" with just <a basically stripping out itemprop="url" This is what I have come up with but I'm not sure whether I'm on the right lines and if I am how to 'echo' it on on the basis it's code and not something to be echoed to screen. Also not too sure whether I need to escape the double quotes within the single quotes in $str_replace.
if(preg_match("%/example/$%", $_SERVER['REQUEST_URI'])){
$string = "<a itemprop=\"url\"";
$str_replace = str_replace('<a itemprop="url"','<a',$string);
//something here
}
Please could anyone advise also if I am correct in how I am approaching this what the final part of the code needs to be to run it (I'm assuming not echo $str_replace;. I'll be running it as a function from my Wordpress functions.php file - I'm comfortable with that if it works.
This could be a mess and I apologise if it is.
try strpos()
if(strpos($_SERVER['REQUEST_URI'], "example") !== false){
$string = "<a itemprop=\"url\"";
$str_replace = str_replace('<a itemprop="url"','<a',$string);
}
There must be some kind of template where you get the default html and modify it with the php at some point of your code...
$html_template = file('...adress_of_the_url_template...');
.......
if(strpos($_SERVER['REQUEST_URI'], "example") !== false){
$string = "<a itemprop=\"url\"";
$html_template = str_replace($string,'<a',$html_template);
}
.......
.......
echo $html_template
Then you have replaced the html code as you wanted
It looks like I was over-complicating it because the solution appears to be within Wordpress functions. This is what I've ended up with. Any comments, corrections or recommendations appreciated. I'm not a coder as you may realise...
function schema( $content ) {
if (is_page( 'my-page-slug')) {
return str_replace('<a itemprop="url"', '<a', $content);
}
else return $content;
}
add_filter('the_content', 'schema', 99);

How to use htmlspecialchars only on <code></code> tags.

I'm using WordPress and would like to create a function that applies the PHP function htmlspecialchars only to code contained between <code></code> tags. I appreciate this may be fairly simple but I'm new to PHP and can't find any references on how to do this.
So far I have the following:
function FilterCodeOnSave( $content, $post_id ) {
return htmlspecialchars($content, ENT_NOQUOTES);
}
Obviously the above is very simple and performs htmlspecialchars on the entire content of my page. I need to limit the function to only apply to the HTML between code tags (there may be multiple code tags on each page).
Any help would be appreciated.
Thanks,
James
EDIT: updated to avoid multiple CODE tags
Try this:
<?php
// test data
$textToScan = "Hi <code>test12</code><br>
Line 2 <code><br>
Test <b>Bold</b><br></code><br>
Test
";
// debug:
echo $textToScan . "<hr>";
// the regex pattern (case insensitive & multiline
$search = "~<code>(.*?)</code>~is";
// first look for all CODE tags and their content
preg_match_all($search, $textToScan, $matches);
//print_r($matches);
// now replace all the CODE tags and their content with a htmlspecialchars() content
foreach($matches[1] as $match){
$replace = htmlspecialchars($match);
// now replace the previously found CODE block
$textToScan = str_replace($match, $replace, $textToScan);
}
// output result
echo $textToScan;
?>
Use DOMDocument to get all <code> tags;
// in this example $dom is an instance of DOMDocument
$code_tags = $dom->getElementsByTagName('code');
if ($code_tags) {
foreach ($code_tags as $node) {
// [...]
}
// [...]
}
i know this is a little bit late, but you can call the htmlspecialchars function first and then when outputting call the htmlspecialchars_decode function

Change a relative URL to absolute URL

for example i've got a string like this:
$html = '
test
test
test
hi
';
and i want to append the absolute url to all hrefs where no abolute domain is given.
$html = '
test
test
test
hi
';
whats the best way to do that? i guess something with RegEx, but my RegEx skills are ** ;)
thanks in advance!
found a good way :
$html = preg_replace("#(<\s*a\s+[^>]*href\s*=\s*[\"'])(?!http)([^\"'>]+)([\"'>]+)#", '$1http://mydomain.com/$2$3', $html);
you can use (?!http|mailto) if you have also mailto links in your $html
$domain = 'http://mydomain';
preg_match_all('/href\="(.*?)"/im', $html, $matches);
foreach($matches[1] as $n=>$link) {
if(substr($link, 0, 4) != 'http')
$html = str_replace($matches[1][$n], $domain . $matches[1][$n], $html);
}
The previous answer will cause problems with your first and fourth example because it fails to include a forward slash to separate the page from the page name. Admittedly this can be fixed by simply appending it to the $domain, but if you do that then href="/something.php" will end up with two.
Just to give an alternative Regex solution you could go with something like this...
$pattern = '#'#(?<=href=")(.+?)(?=")#'';
$output = preg_replace_callback($pattern, 'make_absolute', $input);
function make_absolute($link) {
$domain = 'http://domain.com';
if(strpos($link[1], 'http')!==0) {
if(strpos($link[1], '/')!==0) {
return $domain.'/'.$link[1];
} else {
return $domain.$link[1];
}
}
return $link[1];
}
However it is worth noting that with a link such as href="example.html" the link is relative to the current directory neither method shown so far will work correctly for relative links that aren't in the root directory. In order to provide a solution that is though more information would be required about where the information came from.

basic if/then with PHP

Okay so i set up this thing so that I can print out page that people came from, and then put dummy tags on certain pages. Some pages have commented out "linkto" tags with text in between them.
My problem is that some of my pages don't have "linkto" text. When I link to this page from there I want it to grab everything between "title" and "/title". How can I change the eregi so that if it turns up empty, it should then grab the title?
Here is what I have so far, I know I just need some kind of if/then but I'm a rank beginner. Thank you in advance for any help:
<?php
$filesource = $_SERVER['HTTP_REFERER'];
$a = fopen($filesource,"r"); //fopen("html_file.html","r");
$string = fread($a,1024);
?>
<?php
if (eregi("<linkto>(.*)</linkto>", $string, $out)) {
$outdata = $out[1];
}
//echo $outdata;
$outdatapart = explode( " " , $outdata);
echo $part[0];
?>
Here you go: if eregi() fails to match, the $outdata assignment will never happen as the if block will not be executed. If it matches, but there's nothing between the tags, $outdata will be assigned an empty string. In both cases, !$outdata will be true, so we can fallback to a second match on the title tag instead.
if(eregi("<linkto>(.*?)</linkto>", $string, $link_match)) {
$outdata = $link_match[1];
}
if(!$outdata && eregi("<title>(.*?)</title>", $string, $title_match)) {
$outdata = $title_match[1];
}
I also changed the (.*) in the match to (.*?). This means, don't be greedy. In the (.*) form, if you had $string set to
<title>Page Title</title> ...
... <iframe><title>A second title tag!</title></iframe>
The regex would match
Page Title</title> ... ... <iframe><title>A second title tag!
Because it tries to match as much as possible, as long as the text is between any and any other !. In the (.*?) form, the match does what you'd expect - it matches
Page Title
And stops as soon as it is able.
...
As an aside, this thing is an interesting scheme, but why do you need it? Pages can link to other pages and pass parameters via the query string:
...
Then somescript.php can access the prevpage parameter via the $_GET['prevpage'] superglobal variable.
Would that solve your problem?
The POSIX regex extension (ereg etc.) will be deprecated as of PHP 5.3.0 and may be gone completely come PHP 6, you're better off using the PCRE functions (preg_match and friends).
The PCRE functions are also faster, binary safe and support more features like non-greedy matching etc.
Just a pointer.
you need if, else.
if(eregi(...))
{
.
.
.
}
else
{
just grab title;
}
perhaps you should have done a quick google search to find this very simple answer.
Just add another if test before you assign the match to $outdata:
if (eregi("<linkto>(.*)</linkto>", $string, $out)) {
if ($out[1] != "") {
$outdata = $out[1];
} else {
// Look in the title.
}
}

Categories