Symfony DomCrawler parsing not working - PHP

I would like to grab multiple values from the following HTML:
<div class="video">
<a href="https://example.com/23422" class="hRotator">
<div class="thumb_container" data-previewvideo="http://example.com/vid.mp4">
<img src="http://example.com/thumb.jpg" class="thumb" alt="">
<img class="hSprite" src="https://example.com/spacer.gif" sprite="https://example.com/23422.jpg" id="23422">
<video autoplay="autoplay" loop="loop" muted="muted" playsinline="" webkit-playsinline="" poster="https://example.com/poster.jpg" src="https://example.com/23422.mp4"></video>
</div>
</div>
</a>
</div>
I'm using Symfony DomCrawler but can't seem to get it right:
$crawler->filter('div .videoList')->first()->filter('div .video')->each(function($video) {
    $link = $video->filter("a");
    $href = $link->attr("href");
    $thumb_container = $link->filter("div .thumb_container");
    $preview_video = $thumb_container->attr("data-previewvideo");
    $thumbnail_image = $thumb_container->filter("img .thumb")->attr("src");
    $hSprite = $thumb_container->filter("img .hSprite")->first();
    $image_sprite = $hSprite->attr("sprite");
    $id = $hSprite->attr("id");
});
How should I parse the HTML?
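One likely cause: selectors such as "img .thumb" contain a space, which in CSS means "an element with class thumb inside an img", so they match nothing in the markup shown; the compound form "img.thumb" (and likewise "div.thumb_container", "img.hSprite") is what describes these elements. The sketch below performs the same extraction with PHP's built-in DOMDocument and DOMXPath instead of DomCrawler. The wrapping videoList div, the hasClass() helper and the $videos array are assumptions made for illustration, not part of the question.

```php
<?php
// Sample markup modelled on the question; the outer "videoList" div is assumed.
$html = <<<'HTML'
<div class="videoList">
  <div class="video">
    <a href="https://example.com/23422" class="hRotator">
      <div class="thumb_container" data-previewvideo="http://example.com/vid.mp4">
        <img src="http://example.com/thumb.jpg" class="thumb" alt="">
        <img class="hSprite" src="https://example.com/spacer.gif" sprite="https://example.com/23422.jpg" id="23422">
      </div>
    </a>
  </div>
</div>
HTML;

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

// Hypothetical helper: XPath predicate for "class attribute contains this token".
function hasClass(string $class): string
{
    return "contains(concat(' ', normalize-space(@class), ' '), ' $class ')";
}

$videos = [];
foreach ($xpath->query('//div[' . hasClass('video') . ']') as $video) {
    // All lookups are relative to the current div.video node.
    $link      = $xpath->query('.//a', $video)->item(0);
    $container = $xpath->query('.//div[' . hasClass('thumb_container') . ']', $video)->item(0);
    $thumb     = $xpath->query('.//img[' . hasClass('thumb') . ']', $video)->item(0);
    $sprite    = $xpath->query('.//img[' . hasClass('hSprite') . ']', $video)->item(0);

    if (!$link || !$container || !$thumb || !$sprite) {
        continue; // skip entries that lack one of the expected elements
    }

    $videos[] = [
        'href'          => $link->getAttribute('href'),
        'preview_video' => $container->getAttribute('data-previewvideo'),
        'thumbnail'     => $thumb->getAttribute('src'),
        'sprite'        => $sprite->getAttribute('sprite'),
        'id'            => $sprite->getAttribute('id'),
    ];
}

print_r($videos);
```

The same selector correction should carry over to the DomCrawler version: drop the space between the tag name and the class.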

Related

How to get data attribute value?

I have a url within a data-attribute and I need to get the first one:
<div class="carousel-cell">
<img onerror="this.parentNode.removeChild(this)"; class="carousel-cell-image" data-flickity-lazyload="http://esportareinsvizzera.com/site/wp-content/uploads/8.jpg">
</div>
<div class="carousel-cell">
<img onerror="this.parentNode.removeChild(this);" class="carousel-cell-image" data-flickity-lazyload="http://www.finanziamentiprestitimutui.com/wp-content/uploads/2014/09/esportazioni-finanziamento-credito.jpg">
</div>
<div class="carousel-cell">
<img onerror="this.parentNode.removeChild(this);" class="carousel-cell-image" data-flickity-lazyload="http://www.infologis.biz/wp-content/uploads/2013/09/Export.jpg">
</div>
<div class="carousel-cell">
<img onerror="this.parentNode.removeChild(this);" class="carousel-cell-image" data-flickity-lazyload="http://www.cigarettespedia.com/images/2/25/Esportazione_horizontal_name_ks_20_s_green_italy.jpg">
</div>
I have read many answers on this topic, but I am not a PHP developer.
I was using this to get the first img, but now I need the actual data attribute value instead:
<?php
$custom_image = usp_get_meta(false, 'usp-custom-4');
$custom_image = htmlspecialchars_decode($custom_image);
$custom_image = nl2br($custom_image);
$custom_image = preg_replace('/<br \/>/iU', '', $custom_image);
preg_match('/<img.+src=[\'"](?P<src>.+?)[\'"].*>/i',$custom_image, $image);
?>
<img src="<?php echo $image['src']; ?>" alt="<?php the_title(); ?>">
Use DOMDocument to parse the HTML, get the elements corresponding to img tags and get the data-flickity-lazyload attribute of the first img tag:
...
$DOM = new DOMDocument;
$DOM->loadHTML($custom_image);
$items = $DOM->getElementsByTagName('img');
$mySrc = $items->item(0)->getAttribute('data-flickity-lazyload');
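The snippet above assumes the markup always contains at least one img; if it does not, item(0) returns null and the getAttribute() call fails. A slightly more defensive sketch follows. The function name firstImgAttribute() is hypothetical, and it returns the attribute from the first img that actually carries it.

```php
<?php
// Hypothetical helper: value of $attribute on the first <img> that has it, or null.
function firstImgAttribute(string $html, string $attribute): ?string
{
    if (trim($html) === '') {
        return null; // nothing to parse
    }

    // Collect parser warnings (e.g. for non-standard markup) instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($dom->getElementsByTagName('img') as $img) {
        if ($img->hasAttribute($attribute)) {
            return $img->getAttribute($attribute);
        }
    }

    return null; // no <img> with that attribute
}
```

With the variables from the question, usage would look like $mySrc = firstImgAttribute($custom_image, 'data-flickity-lazyload'); followed by a null check before printing the img tag.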

Regular Expression or other way to get string in right format

I have the following string:
<p>this is text before first image</p>
<p><img class="size-full wp-image-2178636" src="image1.jpg" alt="first" /> this is first caption</p>
<p>this is text before second image.</p>
<p><img src="image2.jpg" alt="second" class="size-full wp-image-2178838" /> this is second caption</p>
<p>there may be many more images</p>
and I need the above string formatted as follows:
<p>this is text before first image</p>
<a href="">
<figure>
<img class="size-full wp-image-2178636" src="image1.jpg" alt="first" />
<figcaption class="newcaption">
<h1>this is first caption</h1>
</figcaption>
</figure>
</a>
<p>this is text before second image.</p>
<a href="">
<figure>
<img src="image2.jpg" alt="second" class="size-full wp-image-2178838" />
<figcaption class="newcaption">
<h1>this is second caption</h1>
</figcaption>
</figure>
</a>
<p>there may be many more images</p>
Kindly help me: how can this be done, either with regular expressions or some other way? I am doing it in PHP.
Regards,
Sachin.
Although SO is not supposed to be a code-writing service, here is a quick n' dirty solution that uses the DOMDocument approach:
$html = '...'; // your input data
$input = new DOMDocument();
$input->loadHTML($html);
$ps = $input->getElementsByTagName('p');
$output = new DOMDocument();
$counter = 0;
foreach ($ps as $p) {
    if ($counter % 2 === 0) {
        // text before image (assumes text and image paragraphs strictly alternate)
        $p_before_image = $output->createElement("p", $p->nodeValue);
        $output->appendChild($p_before_image);
    } elseif ($p->hasChildNodes()) {
        // image output routine
        // the input paragraphs contain no <a>, so create the wrapper link here
        $a_output = $output->createElement("a");
        $a_output->setAttribute("href", "");
        $figure = $output->createElement("figure");
        $imgs_input = $p->getElementsByTagName("img");
        $img_output = $output->importNode($imgs_input->item(0));
        $figure->appendChild($img_output);
        $figcaption = $output->createElement("figcaption");
        $figcaption->setAttribute("class", "newcaption");
        // the paragraph text is the caption; trim the space left after the <img>
        $h1 = $output->createElement("h1", trim($p->nodeValue));
        $figcaption->appendChild($h1);
        $figure->appendChild($figcaption);
        $a_output->appendChild($figure);
        $output->appendChild($a_output);
    } else {
        // Document malformed
    }
    $counter++;
}
print $output->saveHTML();
Note that saveHTML() will output plain old HTML. Thus, imgs won't be turned into self-closing tags. You may want to look into saveXML() if this is important to you.

Get img src inside an a href html dom parser

I am using the code below to get some data from an HTML page with PHP Simple HTML DOM Parser.
Almost everything works; the issue I am facing is that I can't grab the img src. My code is:
foreach($html->find('article') as $article) {
    $item['title'] = $article->find('.post-title', 0)->plaintext;
    $item['thumb'] = $article->find('.post-thumbnail', 0)->plaintext;
    $item['details'] = $article->find('.entry p', 0)->plaintext;
    echo "<strong>img url:</strong> " . $item['thumb'];
    echo "<br>";
}
My Posts structure:
<article class="item-list item_1">
<h2 class="post-title">my demo post 1</h2>
<p class="post-meta">
<span class="tie-date">2 mins ago</span>
<span class="post-comments">
</span>
</p>
<div class="post-thumbnail">
<a href="http://localhost/mydemosite/category/sports/demo-post/" title="my demo post 1" rel="bookmark">
<img width="300" height="160" src="http://localhost/mydemosite/wp-content/uploads/demo-post-300x160.jpg" class="attachment-tie-large wp-post-image" alt="my demo post 1">
</a>
</div>
<!-- post-thumbnail /-->
<div class="entry">
<p>Hello world... this is a demo post description, so if you want to read more...</p>
<a class="more-link" href="http://localhost/mydemosite/category/sports/demo-post">Read More »</a>
</div>
<div class="clear"></div>
</article>
When you use .post-thumbnail you are getting the div element.
To get the src of the img element, use this:
$item['imgurl'] = $article->find('.post-thumbnail img', 0)->src;
I added the img selector and output the src directly into the variable.

Regarding PHP cut string pattern

Below is my string:
</div>
<div class="centered">Thanks for visiting</div>
<div id="related-videos">
<div class="generic-video-item">
<div class="thumb"><img src="http://watsite.yt/thumbnail.php?id=648p14jpgkgj" alt="" class="bg-image" /><span class="border"></span><span class="now-playing"></span><span class="video-subbed">subbed</span> <img src="http://static.cdn.animeultima.tv/images/star-trusted.png" alt="Trusted uploader" title="Trusted uploader" class="trusted" /></div>
watsite video by Argro<br /><span class="time">1 hour ago</span>
</div>
<div class="generic-video-item">
<div class="thumb"><a rel="nofollow" href="/Seitokai-Yakuindomo-2-episode-7-english-subbed-video-mirror-725129-watsite/"><img src="http://watsite.yt/thumbnail.php?id=4055g2gpbt2i" alt="" class="bg-image" /><span class="border"></span><span class="play"></span><span class="video-subbed">subbed</span> <img src="http://static.cdn.animeultima.tv/images/star-trusted.png" alt="Trusted uploader" title="Trusted uploader" class="trusted" /></a></div>
watsite video by Argro<br /><span class="time">1 hour ago</span>
</div>
<div class="generic-video-item">
<div class="thumb"><a rel="nofollow" href="/Seitokai-Yakuindomo-2-episode-7-english-subbed-video-mirror-725130-FLVUpload/"><img src="http://www.ragnaultima.com/mp4up.php?id=c56vy8likuy8" alt="" class="bg-image" /><span class="border"></span><span class="play"></span><span class="video-subbed">subbed</span> <img src="http://static.cdn.animeultima.tv/images/star-trusted.png" alt="Trusted uploader" title="Trusted uploader" class="trusted" /></a></div>
FLVUpload video by Argro<br /><span class="time">1 hour ago</span>
</div>
<div class="clear"></div>
</div>
<div class="centered">
<script language="JavaScript" type="text/javascript">
I am trying to cut out this URL:
/Seitokai-Yakuindomo-2-episode-7-english-subbed-video-mirror-725130-FLVUpload/
Currently I am using the following:
$url = inbtwn($newData,'rel="nofollow" href="','-FLVUpload/">');
function inbtwn($input, $startcut, $finishcut){
    $a1 = split($startcut, $input);
    $a2 = split($finishcut, $a1[1]);
    $output = $a2[0];
    return $output;
}
But it returns the watsite result. How do I obtain /Seitokai-Yakuindomo-2-episode-7-english-subbed-video-mirror-725130-FLVUpload/ from the chunk of string above?
Thanks for helping
parse_url() may be useful here:
$url ='http://google.com/wrwetfrtgertger/';
$tmp = parse_url($url);
echo $tmp['path'];
Or, if the code above does not work:
$url ='http://google.com/wrwetfrtgertger/';
$tmp = parse_url($url);
echo $url = str_replace('http://'.$tmp['host'] ,'',$url);
Try a regex for a quick and dirty way:
$regex = '/href\\s*=\\s*"([^"]*-FLVUpload\/)/s';
if (preg_match_all($regex, $newData, $matches_out)) {
    $url = $matches_out[1][0];
    print($url);
} else {
    print('URL not found');
}
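As for why inbtwn() returns the watsite link: it splits on the start marker and keeps $a1[1], the text between the first and second occurrence of that marker, which is the watsite entry; the end marker never occurs in that piece, so the whole piece comes back. Note also that split() was removed in PHP 7, with explode() as the replacement for literal delimiters. The sketch below is a plain string alternative that locates the end marker first and then searches backwards for the nearest start marker. The name inBetweenLast() is hypothetical, and it assumes the first occurrence of the end marker is the one wanted.

```php
<?php
// Hypothetical helper: text between the LAST $start that precedes the first $finish.
function inBetweenLast(string $input, string $start, string $finish): ?string
{
    $end = strpos($input, $finish);
    if ($end === false) {
        return null; // end marker not present
    }

    // Only look at the text before the end marker.
    $head = substr($input, 0, $end);

    $begin = strrpos($head, $start);
    if ($begin === false) {
        return null; // no start marker before the end marker
    }

    return substr($head, $begin + strlen($start));
}

// Shortened sample input; the real $newData would be the HTML chunk from the question.
$newData = '<a rel="nofollow" href="/episode-7-mirror-725129-watsite/"><img></a>'
         . '<a rel="nofollow" href="/episode-7-mirror-725130-FLVUpload/"><img></a>';

$prefix = inBetweenLast($newData, 'rel="nofollow" href="', '-FLVUpload/">');
if ($prefix !== null) {
    // The end marker is cut off, so append the suffix again.
    $url = $prefix . '-FLVUpload/';
    echo $url, PHP_EOL;
}
```

Like the original helper, this excludes the end marker from the result, which is why the suffix is appended afterwards.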

How to change the image URL for each of the 4 product images on each row?

I have a dynamic product table with 4 products on each row. I'm using CSS rather than an HTML table.
I'm looking for a way to point the 4 images on each row at 4 different URLs, and to do the same on all the other rows.
The reason for this is to use 4 subdomains as a CDN to allow faster downloads.
Is this possible? I'm still very junior, so I need some assistance.
Below is my code; the image section is <img class="lazy" src="/images/loading.gif" data-original="<?=resize($i['image'],$settings)?>" width="170" height="250" alt="" />
You will notice that I'm using data-original because I'm using lazyload; $settings is used for creating a cached version of the image.
Here's my code:
if($viewing=='retailer'){
if($i['category']!=$categoryCheck){?>
<div id="sub-sub"><?=$i['category_name']?></div>
<?
$categoryCheck = $i['category']; $y=1;
}?><? } ?>
<div class="package"<?=$y==4?' style="margin-right:0;"':''?><?=$y==1?' style="clear:left;"':''?>>
<div class="package-img"><a rel="nofollow" target="_blank" href="<?=$buyLink?>">
<?php $settings = array('w'=>170,'h'=>250,'canvas-color'=>'#ffffff'); ?>
<img class="lazy" src="/images/loading.gif" data-original="<?=resize($i['image'],$settings)?>" width="170" height="250" alt="" />
<noscript><img src="<?=resize($i['image'],$settings)?>" width="640" height="480"></noscript>
<? /* <img src="<?=$i['image']?>" width="640" heigh="480"> */ ?>
</a></div>
<div class="name"><a rel="nofollow" target="_blank" href="<?=$buyLink?>"><?=$i['item_name']?></a></div>
<div class="price"><p>£<?=$i['price']?></div>
<div class="mrtl rtl<?=$i['retailer']?>"></div>
<div class="retailer-image"><img src="/images/retailers/<?=$i['retailer_logo']?>" width="140" height="46" /></div>
</div>
<?
$y = $y==4 ? 1 : $y+1;
}
You need to change your function which returns the path to the image.
<?php
function resize(..., ...) {
    // a static variable keeps its value between calls
    static $i = 0;
    $i++;
    // after the 4th image, start again at cdn1
    if ($i == 5) {
        $i = 1;
    }
    return "http://cdn{$i}.domain/images/blah.gif";
}
?>
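A runnable variant of the same round-robin idea is sketched below as a separate wrapper, so that the existing resize() function can stay unchanged. The name cdnRotate() and the cdnN.example.com hosts are assumptions for illustration; substitute the real subdomains.

```php
<?php
// Hypothetical helper: prefixes a path with cdn1..cdnN in turn, one host per call.
// "example.com" is a placeholder domain, not taken from the question.
function cdnRotate(string $path, int $hosts = 4): string
{
    static $calls = 0; // persists between calls within one request

    $index = ($calls % $hosts) + 1; // 1, 2, 3, 4, 1, 2, ...
    $calls++;

    return "http://cdn{$index}.example.com/" . ltrim($path, '/');
}
```

In the template this would wrap the existing call, for example data-original="<?=cdnRotate(resize($i['image'],$settings))?>", assuming resize() returns a site-relative path. Because the host depends on call order, a given image keeps the same subdomain only while the product order stays the same, which may matter for browser caching.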
