I have a blog entry that will sometimes contain a lot of text and images, and I want to cut an excerpt from it. To be more specific, I want to match everything up to and including the second image tag.
Below is some sample text.
I've tried a negative lookahead like
/[\w\r\n;:',."&\s*<>=-_]+(?!<img)/i
but I can't figure out a way to have the lookahead apply to the '+' quantifier. Does anyone have a clue? I'd be really grateful.
*override*
I've been stuck in a room lately, and though it's hard to stay creative all the time, sometimes you need that extra kick. Well for some us we have to throw pictures of true creative genius at ourselves to stimulate us.
So sit back and soak in some inspiration I've come across the past year.
<figure>
<a href="">
<img class="aligncenter" src="http://funnypagenet.com/wp-content/uploads/2011/07/Talesandminimalism_12_www.funnypagenet.com_.jpg" alt="" width="574" height="838" />
</a>
<figcaption></figcaption>
</figure>
<h4 style="text-align: center;">
source
</h4>
Couldn't find who did this, but couldn't explain the movie any simpler
<figure>
<img class="aligncenter" src="http://brickhut.files.wordpress.com/2011/05/theempirestrikesback1.jpg" alt="" width="540" height="800" />
<figcaption></figcaption>
</figure>
Obviously, straightforward string cutting is not suitable for your second image:
...
<figure>
<img class="aligncenter" src="http://brickhut.files.wordpress.com/2011/05/theempirestrikesback1.jpg" alt="" width="540" height="800" />
<figcaption></figcaption>
</figure>
Cutting after the image would leave unclosed elements:
...
<figure>
<img class="aligncenter" src="http://brickhut.files.wordpress.com/2011/05/theempirestrikesback1.jpg" alt="" width="540" height="800" />
That could break the rendering of the page in the browser. It makes no difference whether you use preg_match with a regular expression here or some string functions.
What you need is a DOM parser like DOMDocument that is able to process the HTML.
Given some sample HTML similar to the one in your question:
$html = <<<HTML
dolor sit amet, consectetuer adipiscing elit. <img src="http://example.com/img-a.jpg"> Aenean commodo
ligula eget dolor. Aenean massa. Cum sociis natoque penatibus et magnis dis parturient montes,
nascetur ridiculus mus.
<figure>
<img src="http://example.com/img-b.jpg">
<figcaption>Figure Caption</figcaption>
</figure>
Donec quam felis, ultricies nec, pellentesque eu, pretium quis, sem. Nulla consequat massa quis enim. Donec pede justo, fringilla vel, aliquet nec, vulputate eget, arcu. In enim justo, rhoncus ut.
HTML;
You can now use the DOMDocument class to load the HTML chunk inside a <body> tag, because it is your whole HTML body for the manipulation. As you use HTML5 tags that the underlying libxml parser does not know (<figure> and <figcaption>), you should disable warnings about those when loading the string with libxml_use_internal_errors:
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML(sprintf('<body>%s</body>', $html));
This is the basic setup of the DOM parser; your HTML is now inside the parser. Now comes the interesting part. You want to create the excerpt up to the second image of the document. That means everything after that element should be removed. It sounds as easy as cutting a string, which we know does not work, but this time the DOM parser does all the work for us.
You only need to obtain all nodes (<tag>, text, <!-- comments -->, ...) that come after the second <img> tag in document order and delete them. Such things can be expressed with XPath:
/descendant::img[position()=2]/following::node()
PHP's DOM parser comes with XPath, so let's do it:
$xp = new DOMXPath($doc);
$delete = $xp->query('/descendant::img[position()=2]/following::node()');
foreach ($delete as $node)
{
$node->parentNode->removeChild($node);
}
The only thing left is to obtain the excerpt that is left over (here simply echoed as example output). As we know, it's all inside the <body> tag:
foreach ($doc->getElementsByTagName('body')->item(0)->childNodes as $child)
{
echo $doc->saveHTML($child);
}
Which will give you the following:
dolor sit amet, consectetuer adipiscing elit. <img src="http://example.com/img-a.jpg"> Aenean commodo
ligula eget dolor. Aenean massa. Cum sociis natoque penatibus et magnis dis parturient montes,
nascetur ridiculus mus.
<figure><img src="http://example.com/img-b.jpg"></figure>
As this example shows, the <figure> tag is properly closed now.
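The steps above can be bundled into one function. This is only a sketch: the function name excerptUntilImage and its $n parameter are my own, not part of any library, and it assumes the input is an HTML fragment without its own <body>.

```php
<?php
// Hypothetical helper: keep everything up to and including the n-th <img>
// and let the DOM parser close any elements that are still open.
function excerptUntilImage(string $html, int $n = 2): string
{
    // Silence warnings about tags libxml does not know (e.g. <figure>).
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML('<body>' . $html . '</body>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xp = new DOMXPath($doc);
    // All nodes that follow the n-th image in document order.
    $delete = $xp->query("/descendant::img[position()=$n]/following::node()");
    foreach (iterator_to_array($delete) as $node) {
        if ($node->parentNode !== null) {
            $node->parentNode->removeChild($node);
        }
    }

    // Serialize only the children of <body>, not the wrapper itself.
    $out = '';
    foreach ($doc->getElementsByTagName('body')->item(0)->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

If the document contains fewer than $n images, the XPath query matches nothing and the whole input is kept.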
A similar scenario is to create an excerpt after a specific text-length or word-count: Wordwrap / Cut Text in HTML string
Well, it's not regex, but it should work:
$post = str_ireplace('<img', '!!!<img', $post);
list($p1, $p2) = explode('!!!', $post);
$keep = $p1 . $p2;
This puts a split marker (!!!) before each image tag, splits on the markers and keeps the first two chunks, which should be everything up to the second image tag. No regex required.
Edit: Because this is for an excerpt, you might want to run strip_tags() on the result. If you don't, you may end up with opened HTML tags that never get closed.
If you really want regex based solution then here it is:
// assuming $str is your full HTML text
if ( preg_match_all('~^(.*?<img\s.*?<img\s[^>]*>)~si', $str, $m) )
print_r ( $m[1] );
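For reference, here is the same pattern wrapped in a small function. The name regexExcerpt is made up for this sketch. Since the pattern is anchored with ^ and there is no m modifier, it can match at most once, so preg_match is enough.

```php
<?php
// Hypothetical wrapper around the pattern above.
// Returns the text up to and including the second <img ...> tag,
// or null if the input contains fewer than two image tags.
function regexExcerpt(string $str): ?string
{
    // ^            start of the string
    // .*?<img\s    shortest run up to the first image tag
    // .*?<img\s    shortest run up to the second image tag
    // [^>]*>       the rest of the second tag, up to its closing ">"
    // s: "." also matches newlines, i: case-insensitive
    $pattern = '~^(.*?<img\s.*?<img\s[^>]*>)~si';
    return preg_match($pattern, $str, $m) === 1 ? $m[1] : null;
}
```

Keep in mind that the result can still contain unclosed elements, because the regex knows nothing about the HTML structure.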
Hey, I want to display some HTML/CSS depending on how many rows there are in the database. Is there a way to do this without echo? I get lost when I have to use many ' ' quotes. Here is a code sample:
<?php foreach ($result as $row) {
}?>
<div id="abox">
<div class="abox-top">
Order x
</div>
<div class="abox-panel">
<p>lorem ipsum</p>
</div>
<br>
<div class="abox-top">
lorem</div>
<div class="abox-panel">
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Ut ac convallis diam, vitae rhoncus enim. Proin eu turpis at ligula posuere condimentum nec eu massa. Donec porta tellus ante, non semper risus sagittis at. Pellentesque sollicitudin sodales fringilla. Ut efficitur urna eget arcu luctus lobortis. Proin ut tellus non lacus dapibus vehicula non sit amet ante. Ut nibh justo, posuere sit amet fringilla eget, aliquam mattis urna.</p>
</div>
There's nothing complicated about it:
Simple/ugly:
<?php while($row = fetch()) { ?>
<div>
<?php echo $row['somefield'] ?>
</div>
<?php } ?>
Alternative:
<?php
while ($row = fetch()) {
echo <<<EOL
<div>
{$row['somefield']}
</div>
EOL;
}
and then of course there's any number of templating systems, which claim to separate logic from display, and then litter the display with their OWN logic system anyways.
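A third variant, as a rough sketch: PHP's alternative syntax (foreach: / endforeach;) together with the short echo tag keeps the markup readable without any echo strings. The function renderRows and the column names top and panel are assumptions for the example, and the output buffering is only there so the markup comes back as a string.

```php
<?php
// Hypothetical helper: renders one pair of boxes per row.
// The keys 'top' and 'panel' are assumed column names.
function renderRows(array $result): string
{
    ob_start(); // capture the template output instead of sending it
    ?>
<div id="abox">
<?php foreach ($result as $row): ?>
    <div class="abox-top"><?= htmlspecialchars($row['top']) ?></div>
    <div class="abox-panel"><p><?= htmlspecialchars($row['panel']) ?></p></div>
<?php endforeach; ?>
</div>
<?php
    return ob_get_clean();
}
```

In a plain page template you would drop the function and the buffering and keep only the part between ?> and <?php.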
You can simply use the <?= short echo tag. Before PHP 5.4.0 you had to enable the short_open_tag ini setting for it; since 5.4.0 the <?= tag is always available regardless of that setting.
Here is an example:
<?php $var = 'hello, world'; ?>
<?= $var // outputs: hello, world ?>
Hope it helps.
Template engines make your life easier.
Take Smarty for example; it's a pretty good template library. What a template engine does is pass variables into predefined templates.
Your code in plain PHP:
<?php
echo "My name is " . $name . ", that's why I'm awesome <br>";
foreach ($data as $value) {
echo $value['name'] . ' is awesome too!';
}
?>
Code in Smarty:
My name is {$name}, that's why I'm awesome <br>
{foreach $data as $value}
{$value.name} is awesome too!
{/foreach}
Template engine pros:
Templates are held in separate, custom-named files (e.g. users.tpl, registration.tpl).
Smarty caches your views (templates).
Simple expression syntax, e.g. {$views + ($viewsToday/$ratio)}.
A lot of helpers.
You can create custom plugins/functions.
Easy to use and debug.
Most importantly: it separates your PHP code from your HTML!
Template engine cons:
The concept can be hard to grasp for beginners.
I don't know of any others, actually.
When I don't want to use a template engine (I like Twig, btw), I do something like this:
1) Write a separate file with the html code and some custom tags where data should be presented:
file "row_template.html":
<div class="abox-top">{{ TOP }}</div>
<div class="abox-panel"><p>{{ PANEL }}</p></div>
2) And then, read that file and do the replacements in the loop:
$row_template = file_get_contents('row_template.html');
foreach ($result as $row) {
$replaces = array(
'{{ TOP }}' => $row['top'],
'{{ PANEL }}' => $row['panel']
);
print str_replace(
array_keys($replaces),
array_values($replaces),
$row_template
);
}
In addition, you can change the content of "row_template.html" without touching the php code.
Clean and nice to the eye!
I am programming on WordPress and I want to edit a PHP file. I want the text to be displayed with line breaks and not all on one line.
Here is my code (I want john on one line and travolta on another, but both get displayed on one line):
<div class="slide">
<img class="animated fade_left" src='<?php echo esc_url(onepage_get_option('onepage_testimonial_2_image', ONEPAGE_DIR_URI . "assets/images/team2.jpg")); ?>' onmouseover="javascript: this.title = '';" title="">
<div class="bx-caption animated fade_right"><span><a class="arrow"></a><?php echo esc_attr(onepage_get_option('onepage_testimonial_2_content', __('Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam.','one-page'))); ?><a class="testimonial"><?php echo esc_attr(onepage_get_option('onepage_testimonial_2_name', __('john \n travolta','one-page'))); ?></a></span></div>
Any suggestions?
If you want the link / element with the class testimonial on a new line, I would use css as that keeps it flexible and makes it easy to change if you want to do it differently in the future.
So in your css file:
.testimonial {
display: block;
}
In general, I would try to keep presentational stuff out of the php code.
I created a simple back-end area for my client to post new job openings, and I want to give them the ability to format the text a little by adding line breaks in the job description text that will be visible on the front end.
The job openings are stored in a MySQL database.
Example of what I'm talking about:
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nulla quis quam sollicitudin, bibendum enim a, vulputate turpis.
Nullam urna purus, varius eget purus quis, facilisis lacinia nibh. Ut in blandit erat.
I would like the breaks to happen when my client hits enter / return on the keyboard.
Any help on this matter would be appreciated.
------------UPDATE------------
Okay, so after much trial and error I got it somewhat working.
I added this line in my upload / action code.
$description = nl2br(htmlspecialchars($_POST['description']));
Full upload code is:
<?php
include($_SERVER['DOCUMENT_ROOT'] . "/connections/dbconnect.php");
$date = mysql_real_escape_string($_POST["date"]);
$title = mysql_real_escape_string($_POST["title"]);
$description = mysql_real_escape_string($_POST["description"]);
$description = nl2br(htmlspecialchars($_POST['description']));
// Insert record into database by executing the following query:
$sql="INSERT INTO hire (title, description, date) "."VALUES('$title','$description','$date')";
$retval = mysql_query($sql);
echo "The position was added to employment page.<br />
<a href='employment.php'>Post another position.</a><br />";
?>
Then on my form I added this to the textarea, but I get an error.
FYI, line 80 is the line the error is referring to.
Position Details:<br />
<textarea name="description" rows="8"><?php echo str_replace("<br />","",$description); ?></textarea>
</div>
Here is what the error looks like.
Here is my results page code:
<?php
$images = mysql_query("SELECT * FROM hire ORDER BY ID DESC LIMIT 10");
while ($image=mysql_fetch_array($images))
{
?>
<li data-id="id-<?=$image["id"] ?>">
<div class="box white-bg">
<h2 class="red3-tx"><?=$image["title"] ?> <span class="date-posted blue2-tx"><?=$image["date"] ?></span></h2>
<div class="dotline"></div>
<article class="blue3-tx"><?=$image["description"] ?><br />
<br />
For more information please call ###-###-####.</article>
</div>
</li>
<?php
}
?>
If I delete all that error copy and write out a real position with line breaks it works.
I have no idea how to fix the error though.
Again any help would be appreciated.
Thanks!
You can use str_replace:
$statement = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nulla quis quam sollicitudin, bibendum enim a, vulputate turpis.
Nullam urna purus, varius eget purus quis, facilisis lacinia nibh. Ut in blandit erat.";
$statement = str_replace(chr(13), "<br />", $statement);
Query example: INSERT INTO table (statement) VALUES ('$statement');
Hope this can help you.
EDIT:
If you want to display the result from the database in a textarea, you can use this code:
<?php
// assuming the field name in your MySQL table is description
$des = $row['description'];
?>
Position Details:<br />
<textarea name="description" rows="8"><?php echo str_replace("<br />", chr(13), $des); ?></textarea>
</div>
Hope this edit helps with your second problem.
I would start by answering a couple of questions first
Do I want my database to store html-formatted user input?
Is the data going to be editable afterwards?
Since you seem to want only nl2br, a simple approach would be to save the content as is in the database, then use nl2br() on the output side as Marcin Orlowski suggested.
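A minimal sketch of that approach, assuming the description is stored in the database exactly as the client typed it. The two function names are made up for this example.

```php
<?php
// Hypothetical helper for the results page: escape first, so anything
// the client typed is shown as text, then turn newlines into <br /> tags.
function descriptionForDisplay(string $raw): string
{
    return nl2br(htmlspecialchars($raw, ENT_QUOTES, 'UTF-8'));
}

// Hypothetical helper for the edit form: a textarea keeps newlines as they
// are, so the stored text only needs escaping, no <br /> juggling.
function descriptionForTextarea(string $raw): string
{
    return htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');
}
```

On the results page you would echo descriptionForDisplay($image["description"]), and inside the textarea descriptionForTextarea() of the same stored value. This way the database never contains <br /> tags that have to be stripped again for editing.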
I want to remove only the images that come from a specific URL from my HTML,
for example:
http://pastebin.com/Qaw4dRbT
<p>Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Aenean commodo ligula eget dolor. Aenean massa. Cum sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus.<img src="http://www.another-domain.tld/r/151230695794/32310/s/25e829c1/removeit.img" alt="" width="1" height="1" border="0" /></p>
I want to remove the image from another-domain.tld
and keep the other images.
Thanks
Seek it out using xpath and remove it from its parent:
// Build a new DOMDocument, load it up with your HTML
$doc = new DOMDocument();
$doc->loadHTML($html);
// Reference to our DIV container
$container = $doc->getElementsByTagName("div")->item(0);
// New instance of XPath class based on $doc
$xpath = new DOMXPath($doc);
// Get images that contain 'specific-domain.tld' in their src attribute
$images = $xpath->query("//img[contains(@src,'specific-domain.tld')]");
// For every image found
foreach ($images as $image) {
// Remove that image from its parent
$image->parentNode->removeChild($image);
}
// Output the resulting HTML of our container
echo $doc->saveHTML($container);
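Note that contains() matches the text anywhere in the src value, including the path or query string. If that is too loose, a possible variant compares the parsed host name instead. This is a sketch only: removeImagesFromHost is a made-up name, and it assumes the input is an HTML fragment.

```php
<?php
// Hypothetical helper: remove <img> elements whose src host is $host
// or a subdomain of it (e.g. www.another-domain.tld).
function removeImagesFromHost(string $html, string $host): string
{
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML('<body>' . $html . '</body>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $host = strtolower($host);
    $suffix = '.' . $host;
    // Copy the list first: getElementsByTagName() is live and would
    // shift while nodes are being removed.
    foreach (iterator_to_array($doc->getElementsByTagName('img')) as $img) {
        $srcHost = parse_url($img->getAttribute('src'), PHP_URL_HOST);
        if (!is_string($srcHost)) {
            continue; // relative or unparsable URL, keep the image
        }
        $srcHost = strtolower($srcHost);
        if ($srcHost === $host || substr($srcHost, -strlen($suffix)) === $suffix) {
            $img->parentNode->removeChild($img);
        }
    }

    $out = '';
    foreach ($doc->getElementsByTagName('body')->item(0)->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

Called as removeImagesFromHost($html, 'another-domain.tld'), it would drop the tracking image from the question and keep images hosted elsewhere.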
Executable Demo: http://sandbox.onlinephpfunctions.com/code...6529d025e135013184e
I am working with HTML documents generated by Microsoft Word 2007/2010. Besides generating incredibly dirty HTML, Word also has a tendency to use both block and inline styles. I am looking for a PHP library that would merge the style block into the already existing inline style attributes.
Edit
The goal is to construct an HTML block that preserves the original formatting and is editable in a WYSIWYG editor like TinyMCE.
Example
If the original html is:
<html>
<head>
<style>
.normaltext {color:black;font-weight:normal;font-size:10pt}
.important {color:red;font-weight:bold;font-size:11pt}
</style>
</head>
<body>
<p class="normaltext" style="font-family:arial">
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
In ut erat id dui mollis faucibus. Mauris eu neque et eros tempus placerat.
<span class="important">Nam in purus nisi</span>, vitae dictum ligula.
Morbi mattis eros eget diam vulputate imperdiet.
<span class="important" style="color:green">Integer</span> a metus eros.
Sed iaculis porta imperdiet.
</p>
</body>
</html>
Should become:
<html>
<head>
</head>
<body>
<p style="font-family:arial;color:black;font-weight:normal;font-size:10pt">
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
In ut erat id dui mollis faucibus. Mauris eu neque et eros tempus placerat.
<span style="color:red;font-weight:bold;font-size:11pt">Nam in purus nisi</span>, vitae dictum ligula.
Morbi mattis eros eget diam vulputate imperdiet.
<span style="color:green;font-weight:bold;font-size:11pt">Integer</span> a metus eros.
Sed iaculis porta imperdiet.
</p>
</body>
</html>
Check out:
http://inlinestyler.torchboxapps.com/
http://premailer.dialect.ca/
http://www.pelagodesign.com/sidecar/emogrifier/
http://blog.verkoyen.eu/blog/p/detail/convert-css-to-inline-styles-with-php
http://beaker.mailchimp.com/inline-css
http://burrowscode.wordpress.com/2011/02/19/emailify-internal-stylesheets-to-inline-styles/
https://github.com/crcn/emailify
https://github.com/peterbe/premailer
Porting code from either of the sources to PHP, or using any of the available APIs should do the trick of getting your CSS styling inline.
See the CssToInlineStyles project which does exactly what you want.
No, but try this instead: copying and pasting from Word into http://ckeditor.com/ or TinyMCE etc. does clean it up A LOT. Though it's still not perfect, it will get you much closer.
I finally managed to get it to work. The code is based off of
http://blog.verkoyen.eu/blog/p/detail/convert-css-to-inline-styles-with-php
with one simple change:
Moving the line
// add new properties into the list
foreach($rule['properties'] as $key => $value) $properties[$key] = $value;
up to the beginning of the loop, right after where $properties is declared.
To make this work for WordPress, however, one additional change is needed. DOMDocument replaces &nbsp; in the document with blanks, which breaks the WordPress update statement and leads to content being cut off. Please refer to my other question for the solution:
DOMDocument->saveHTML() converting &nbsp; to space
This problem is detailed in https://wordpress.stackexchange.com/questions/48692/post-content-getting-cut-off-on-blank-space-on-wpdb-update. If you know why this is happening for WordPress, please post your answer there as I would very much like to find out why it is happening.