How to make this linking making script behave with markdown? - php

I'm using PHP markdown but I also need a script to convert plaintext links into clicakable ones. Both work independently, but when I try to run them together, if I run markdown first, the makelinks still processes on the html code and screws things up.. and.. vice versa. Any idea of how to stop it from doing that? I can't figure out regex to ignore the markdown style links
function makeLinks($text) {
$text = preg_replace('%(((f|ht){1}tp://)[-a-zA-^Z0-9#:\%_\+.~#?&//=]+)%i',
'\\1', $text);
$text = preg_replace('%([[:space:]()[{}])(www.[-a-zA-Z0-9#:\%_\+.~#?&//=]+)%i',
'\\1\\2', $text);
return $text;
}
sample text:
###[Title Section](http://domain/folder/page.html)
- Blah blah some text and then a link: www.webpage.org.

The double-linkify problem can be solved best with guesswork and workarounds. (We have some duplicate questions, but I can never find a good one..)
Since already converted http://-urls only occur right after href=" or an >, you can use those for negative assertions.
(?<!href="|>)
Should be written at the start of your first regex:
$text = preg_replace('%(?<!href="|>)(((f|ht){1}tp://)...
Your second regex uses the :space: as anchor, so should be fault tolerant already.

Related

preg_match (I can't do regex)

The company I work for have asked me to give them the ability to place a modal box on the web page from the CMS, but do not want to type HTML. As I cannot for the life of me understand regex I can't get it.
The layout of the code they should type is this:
++modal++
Some paragraph text.
Another paragraph.
++endmodal++
The paragraphs are already converted by markdown into <p>paragraph</p>.
So really the match has to be ++modal++ any number of A-Za-z0-9any symbol excluding + ++endmodal++ then replaced with HTML.
I'm not sure it preg_match or preg_replace should be used.
I got this far:
$string = '++modal++<p>Hello</p>++endmodal++';
$pattern = '/\+\+modal\+\+/';
preg_match($pattern, $string, $matches);
Thank you in advance.
EDIT: A to be a bit more clear, I wish to replace the ++modal++ and ++endmodal++ with HTML and leave the middle bit as is.
I don't really think you need a RegEx here as your delimiters remain always the same and always on the same position of the string. Regular expressions are also expensive on resources and as a third counter argument you said you're not fit with them.
So why not use a simple replacement or string trimming if it comes to that.
$search = array('++modal++', '++endmodal++');
$replacement = array('<tag>', '</tag>');
$str = '++modal++<p>Hello</p>++endmodal++';
$result = str_replace($search, $replacement, $str);
Where, of course, '<tag>' and '</tag>' are just example placeholders for your replacement.
This is what the manual for str_replace() says:
If you don't need fancy replacing rules (like regular expressions),
you should always use this function instead of preg_replace().
I think you should get your desired content using:
preg_match('/\+\+modal\+\+([^\+]+)\+\+endmodal\+\+/', $string, $matches)
$matches[1] = '<p>Hello</p>
You're trying to re-invent the wheel here. You're trying to write a simple template system here, but there are dozens of templating tools for PHP that you could use, ranging from big and complex like Smarty and Twig to really simple ones that aren't much more than you're trying to write.
I haven't used them all, so rather than recommend one I'll point you to a list of template engines you could try. You'll probably find more with a quick bit of googling.
If you do insist on writing your own, it's important to consider security. If you're outputting anything that contains data entered by your users, you must make sure all your output is properly escaped and sanitised for display on a web page; there a numerous common hacks that can take advantage of an insecure templating system to completely compromise a site.
<?php
$string = '++modal++<p>Hello</p>++endmodal++';
$patterns = array();
$patterns[0] = "/\+\+modal\+\+/"; // put '\' just before +
$patterns[1] = "/\+\+endmodal\+\+/";
$replacements = array();
$replacements[1] = '<html>';
$replacements[0] = '</html>';
echo preg_replace($patterns, $replacements, $string);
?>
Very similar to this example

Why does PHP strip_tags("is<isvery interesting <thatthis willbe stripped") return is?

I am about to make a char counting function which counts input from a tinyMce textarea.
Server-side validation with code like this:
$string = "is<isvery interesting <thatthis willbe stripped";
$stripped = strip_tags($string);
$count = strlen($stripped); // This will return 2
You might notice that $string has no tag at all, anyway strip_tags() strips everything from the first less-than sign on.
Is this a bug or a feature?
This has been documented:
Because strip_tags() does not actually validate the HTML, partial or
broken tags can result in the removal of more text/data than expected.
http://php.net/manual/en/function.strip-tags.php
strip_tags is actually quite dumb. It strips everything, that only remotely looks like an HTML tag. That is, starting with < and some alpha-numeric sign until the closing > or as far as it can get.
The observed behavior is in this context a bug. However, strip_tags is then not the tool to do error correction on input HTML. Its purpose is to strip away stuff, so that the remainder is safe to embed in websites. In doubt, it strips more, which is a good thing.

Link converting pregmatch working with markdown

function makeLinks($text) {
$text = preg_replace('%(?<!href=")(((f|ht){1}(tp://|tps://))[-a-zA-^Z0-9#:\%_\+.~#?&//=]+)%i',
'\\1', $text);
$text = preg_replace('%([:space:]()[{}])(www.[-a-zA-Z0-9#:\%_\+.~#?&//=]+)%i',
'\\1\\2', $text);
return $text;
}
It misses if I have something like this: - www.website.org (a hyphen then a space) at the beginning of a line. If I have - www.website.org - www.website.org it catches the second one.
Shouldn't that be covered by the space in the second preg_replace?
I also tried %(\s\n\r(){})
I am running it through markdown, but not till after (markdown(makeLinks($foo))) so I thought that shouldn't interfere, but when I take the markdown off and everything just echos out in one line, it does make links out of them. If i put makeLinks(markdown($foo)) it behaves the same as initially.. not making links out of the ones that begin with www at the beginning of list items.
Thats some pretty dodgy regex work there. Here is a regex I would recommend instead for URL dectection:
%(?<!href="?)(((f|ht)(tp://|tps://))?[a-zA-Z0-9-].[-a-zA-^Z0-9#:\%_\+.~#?&//=]+)%i
Should be a lot more reliable than the two you have now.

PHP - Strip Tags allow "<3" text hearts

Hey Guys, I usually come to StackOverflow for my php coding answers, although this time I could not seem to find one.
I am creating a social networking type website that requires lots of text entering and data checking. Everything seems good and the other day I was testing some characters users may enter and its seems that if I try to insert a '<3' it says its empty.
<?php
$text = "<3";
$clean = strip_tags($text);
echo $clean; // prints out nothing..
?>
Use htmlspecialchars instead strip_tags.
"<3" will be break your HTML when you output it directly.
May you replace it with an (Heart)Image or other HTML- Tag( with css heart property) , like you would do with Smileys.

preg_replace() help in PHP

Consider this string
hello awesome <a href="" rel="external" title="so awesome is cool"> stuff stuff
What regex could I use to match any occurence of awesome which doesn't appear within the title attribute of the anchor?
So far, this is what I've came up with (it doesn't work sadly)
/[^."]*(awesome)[^."]*/i
Edit
I took Alan M's advice and used a regex to capture every word and send it to a callback. Thanks Alan M for your advice. Here is my final code.
$plantDetails = end($this->_model->getPlantById($plantId));
$botany = new Botany_Model();
$this->_botanyWords = $botany->getArray();
foreach($plantDetails as $key=>$detail) {
$detail = preg_replace_callback('/\b[a-z]+\b/iU', array($this, '_processBotanyWords'), $detail);
$plantDetails[$key] = $detail;
}
And the _processBotanyWords()...
private function _processBotanyWords($match) {
$botanyWords = $this->_botanyWords;
$word = $match[0];
if (array_key_exists($word, $botanyWords)) {
return '' . $word . '';
} else {
return $word;
}
}
Hope this well help someone else some day! Thanks again for all your answers.
This subject comes up pretty much every day here and basically the issue is this: you shouldn't be using regular expressions to parse or alter HTML (or XML). That's what HTML/XML parsers are for. The above problem is just one of the issues you'll face. You may get something that mostly works but there'll still be corner cases where it doesn't.
Just use an HTML parser.
Asssuming this is related to the question you posted and deleted a little while ago (that was you, wasn't it?), it's your fundamental approach that's wrong. You said you were generating these HTML links yourself by replacing words from a list of keywords. The trouble is that keywords farther down the list sometimes appear in the generated title attributes and get replaced by mistake--and now you're trying to fix the mistakes.
The underlying problem is that you're replacing each keyword using a separate call to preg_replace, effectively processing the entire text over and over again. What you should do is process the text once, matching every single word and looking it up in your list of keywords; if it's on the list, replace it. I'm not set up to write/test PHP code, but you probably want to use preg_replace_callback:
$text = preg_replace_callback('/\b[A-Za-z]+\b/', "the_callback", $text);
"the_callback" is the name of a function that looks up the word and, if it's in the list, generates the appropriate link; otherwise it returns the matched word. It may sound inefficient, processing every word like this, but in fact it's a great deal more efficient than your original approach.
Sure, using a parsing library is the industrial-strength solution, but we all have times were we just want to write something in 10 seconds and be done. Next time you want to process the meaty text of a page, ignoring tags, try just run your input through strip_tags first. This way you will get only the plain, visible text and your regex powers will again reign supreme.
This is so horrible I hesitate to post it, but if you want a quick hack, reverse the problem--instead of finding the stuff that isn't X, find the stuff that IS, change it, do the thing and change it back.
This is assuming you're trying to change awesome (to "wonderful"). If you're doing something else, adjust accordingly.
$string = 'Awesome is the man who <b>awesome</b> does and awesome is.';
$string = preg_replace('#(title\s*=\s*\"[^"]*?)awesome#is', "$1PIGDOG", $string);
$string = preg_replace('#awesome#is', 'wonderful', $string);
$string = preg_replace('#pigdog#is', 'awesome', $string);
Don't vote me down. I know it's hack.

Categories