Using PHP to scrape image url from twitter page - php

I'm trying to scrape an image URL from Twitter, e.g. 'https://pbs.twimg.com/media/BGZHCHwCEAACJ19.jpg:large', using PHP. I found the following PHP code; file_get_contents is working, but I don't think the regular expression is matching the URL. Can you help debug this code? Thanks in advance.
Here is a snippet from twitter which contains the image:
<div class="media-gallery-image-wrapper">
<img class="large media-slideshow-image" alt="" src="https://pbs.twimg.com/media/BGZHCHwCEAACJ19.jpg:large" height="480" width="358">
</div>
Here is the php code:
<?php
$url = 'http://t.co/s54fJgrzrG';
$twitter_page = file_get_contents($url);
preg_match('/(http:\/\/p.twimg.com\/[^:]+):/i', $twitter_page, $matches);
$imgURL = array_pop($matches);
echo $imgURL;
?>

Something like this should provide a URL.
<?php
$url = 'http://t.co/s54fJgrzrG';
$twitter_page = file_get_contents($url);
preg_match_all('!http[s]?:\/\/pbs\.twimg\.com\/[^:]+\.(jpg|png|gif)!i', $twitter_page,$matches);
echo $img_url=$matches[0][0];
?>
Response is
https://pbs.twimg.com/media/BGZHCHwCEAACJ19.jpg

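It appears that your regular expression does not match the beginning of the URI. The host is pbs.twimg.com, not p.twimg.com, and the pattern only accepts http, while the image is served over https. Writing the scheme as https? (rather than as a second capture group) keeps the URL as the last element of $matches, so array_pop still returns it.
preg_match('/(https?:\/\/pbs\.twimg\.com\/[^:]+):/i', $twitter_page, $matches);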
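The matching step can be tried without fetching the page. The following is only a sketch: the helper name find_twimg_url is made up for illustration, and it assumes the image URL ends in .jpg, .png or .gif as in the snippet from the question.

```php
<?php
// Hypothetical helper (not from the answers above): returns the first
// pbs.twimg.com image URL found in $html, or null when nothing matches.
function find_twimg_url(string $html): ?string
{
    // https? accepts both schemes; the dots are escaped so they match literally;
    // [^":\s]+ cannot run past the ":large" suffix or the closing quote.
    $pattern = '~https?://pbs\.twimg\.com/[^":\s]+\.(?:jpg|png|gif)~i';
    if (preg_match($pattern, $html, $m) === 1) {
        return $m[0];
    }
    return null;
}

// The <img> tag quoted in the question.
$snippet = '<img class="large media-slideshow-image" alt="" '
    . 'src="https://pbs.twimg.com/media/BGZHCHwCEAACJ19.jpg:large" height="480" width="358">';

echo find_twimg_url($snippet), "\n"; // https://pbs.twimg.com/media/BGZHCHwCEAACJ19.jpg
```

Returning null instead of reading $matches directly avoids an undefined-offset warning when the page contains no such image.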

Related

Retrieve data attribute value using preg_match

I have the following HTML:
<div class="video cover" data-thumb="https://i.vimeocdn.com/video/1234567.webp?mw=700&mh=393" style="background-image: url(https://i.vimeocdn.com/video/525930392.webp?mw=700&mh=393);">
I would like to retrieve the data-thumb URL value.
I have attempted to retrieve the value using the following:
$iframe = '<div class="video cover" data-thumb="https://i.vimeocdn.com/video/1234567.webp?mw=700&mh=393" style="background-image: url(https://i.vimeocdn.com/video/525930392.webp?mw=700&mh=393);">';
preg_match('/data-thumb="(.*?)"/', $iframe, $matches);
echo $matches[0];
However, this is not retrieving any matches.
EDIT: Thank you for your help and answers. It appears I made an error with the output of $iframe, which was displaying content from an iframe (doh). So preg_match couldn't target it.
preg_match('/data-thumb="([^"]*)"/', $iframe, $matches);
if (isset($matches[1]))
echo $matches[1]; // echo the value of the data-thumb attribute
This works for me. And indeed, original code works fine too.
Try this regex:
/data-thumb="[\s\S]*?"/
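A common source of confusion with preg_match is the difference between $matches[0] (the whole match, attribute name included) and $matches[1] (only the captured value). A small sketch, using a made-up helper name get_attr_value and assuming the attribute is double-quoted:

```php
<?php
// Hypothetical helper: returns the value of a double-quoted attribute, or null.
function get_attr_value(string $html, string $attr): ?string
{
    // preg_quote() keeps characters in the attribute name from being
    // interpreted as regex syntax.
    $pattern = '/\b' . preg_quote($attr, '/') . '="([^"]*)"/';
    if (preg_match($pattern, $html, $m) === 1) {
        // $m[0] would be the full text data-thumb="...";
        // $m[1] is only what the parentheses captured.
        return $m[1];
    }
    return null;
}

$div = '<div class="video cover" '
    . 'data-thumb="https://i.vimeocdn.com/video/1234567.webp?mw=700&mh=393">';

echo get_attr_value($div, 'data-thumb'), "\n";
```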

Get .mp4 source and poster image from Vine Id (PHP)

How can I get the .mp4 video from a Vine URL?
Example:
from https://vine.co/v/hnVVW2uQ1Z9
I need http://.../*.mp4 and http://.../*.jpg
The site vinebed.com does what I need.
(In PHP)
Thanks very much.
It's very simple. If you check the source of a Vine video page on vine.co, you'll see the meta tags, including twitter:player:stream. With PHP you can extract that value and use it like any other variable.
<?php
function vine( $id )
{
    // Download the public page for the given Vine video ID.
    $vine = file_get_contents("http://vine.co/v/{$id}");
    if ($vine === false) {
        return false;
    }
    // The stream URL is the content of the twitter:player:stream meta tag.
    preg_match('/property="twitter:player:stream" content="(.*?)"/', $vine, $matches);
    return !empty($matches[1]) ? $matches[1] : false;
}
?>
To supply the $id, either A) write a function that reads the Vine video ID from a URL automatically, or B) set the ID by hand. In both cases you display the result like this: <?php echo vine('bv5ZeQjY35'); ?>
Hope this helps as it's worked for me just fine.
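The question also asks for the poster image. One way to sketch this is to separate parsing from downloading, so the same helper can read more than one meta tag. The helper name meta_content, the sample markup, and the use of twitter:image as the poster property are assumptions made for illustration, not something confirmed by the answer above.

```php
<?php
// Hypothetical helper: pulls the content of a <meta property="..."> tag out of
// already-downloaded HTML, so parsing can be tried without a network call.
function meta_content(string $html, string $property): ?string
{
    $pattern = '/property="' . preg_quote($property, '/') . '"\s+content="([^"]*)"/i';
    return preg_match($pattern, $html, $m) === 1 ? $m[1] : null;
}

// Made-up markup standing in for a downloaded page.
$page = '<meta property="twitter:player:stream" content="https://example.com/video.mp4">'
      . '<meta property="twitter:image" content="https://example.com/poster.jpg">';

echo meta_content($page, 'twitter:player:stream'), "\n";
echo meta_content($page, 'twitter:image'), "\n";
```

In real use, $page would be the string returned by file_get_contents for the video page.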

How do I call an iframe or something similar in PHP?

How do I call an iframe or something similar in PHP?
I found some code, but I might be setting it up wrong. This is the code that I found:
<iframe id="frame" src="load.php?sinput="<?php echo $_GET["sinput"]; ?> > </iframe>
Does anybody know any iframe code or something similar for PHP?
Some people say not to use iframes; what does PHP offer instead?
There is no function to generate an iframe in PHP.
What you're doing is fine, but allow me to make a suggestion:
<?php
$input = "";
if (isset($_GET['sinput'])) {
    // Encode the value so it is safe inside a query string.
    $input = urlencode($_GET['sinput']);
}
?>
<iframe id="frame" src="load.php?sinput=<?php echo $input; ?>">Your browser does not support iframes</iframe>
EDIT: actually
<?php
$url = "load.php";
// Query Building Logic
$queries = array();
if (isset($_GET['sinput'])) {
    // urlencode() keeps spaces and & in the value from breaking the query string.
    $queries[] = "sinput=" . urlencode($_GET['sinput']);
}
// Generate full URL
if (count($queries) > 0) {
    $url .= "?" . implode("&", $queries);
}
?>
<iframe id="frame" src="<?php echo htmlspecialchars($url); ?>">Your browser does not support iframes</iframe>
I think this is better quality overall, but I'll leave that up to my peers to judge. It is just another suggestion: build the full, usable URL in one logic block before the HTML, rather than relying on the information being present and usable in the template. If the element ['sinput'] in the $_GET array is not set for whatever reason, reading it directly in the template makes PHP report an undefined index in the middle of your markup.
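PHP's built-in http_build_query can take over the encoding and joining of parameters. A minimal sketch, where build_url is a made-up helper name and load.php is the target from the question:

```php
<?php
// Hypothetical helper: appends only the parameters that are actually set.
function build_url(string $base, array $params): string
{
    // Drop null entries so unset inputs never reach the URL.
    $params = array_filter($params, fn($v) => $v !== null);
    if ($params === []) {
        return $base;
    }
    // http_build_query() URL-encodes each value and joins the pairs.
    return $base . '?' . http_build_query($params);
}

// ?? null avoids an undefined-index report when sinput is missing.
$src = build_url('load.php', ['sinput' => $_GET['sinput'] ?? null]);

// Escape for the HTML attribute at output time.
echo '<iframe id="frame" src="' . htmlspecialchars($src) . '"></iframe>', "\n";
```

Encoding for the URL and escaping for HTML are kept as two separate steps here, since each protects against a different kind of breakage.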

Getting the first image in string with php

I'm trying to get the first image from each of my posts. The code below works great if I only have one image, but if I have more than one it gives me an image that is not always the first.
I really only want the first image. A lot of the time the second image is a "next" button.
$texthtml = 'Who is Sara Bareilles on Sing Off<br>
<img alt="Sara" title="Sara" src="475993565.jpg"/><br>
<img alt="Sara" title="Sara two" src="475993434343434.jpg"/><br>';
preg_match_all('/<img.+src=[\'"]([^\'"]+)[\'"].*>/i', $texthtml, $matches);
$first_img = $matches [1] [0];
Now I can take $first_img and put it in front of the short description:
<img alt="Sara" title="Sara" src="<?php echo $first_img;?>"/>
If you only need the first src attribute, preg_match should do instead of preg_match_all. Does this work for you?
<?php
$texthtml = 'Who is Sara Bareilles on Sing Off<br>
<img alt="Sara" title="Sara" src="475993565.jpg"/><br>
<img alt="Sara" title="Sara two" src="475993434343434.jpg"/><br>';
preg_match('/<img.+src=[\'"](?P<src>.+?)[\'"].*>/i', $texthtml, $image);
echo $image['src'];
?>
Don't use regex to parse HTML.
Use an HTML-parsing library or class, such as phpQuery:
require 'phpQuery-onefile.php';
$texthtml = 'Who is Sara Bareilles on Sing Off<br>
<img alt="Sarahehe" title="Saraxd" src="475993565.jpg"/><br>
<img alt="Sara" title="Sara two" src="475993434343434.jpg"/><br>';
$pq = phpQuery::newDocumentHTML($texthtml);
$img = $pq->find('img:first');
$src = $img->attr('src');
echo "<img alt='foo' title='baa' src='{$src}'>";
Download: http://code.google.com/p/phpquery/
After testing an answer from "Using regular expressions to extract the first image source from html codes?", I got better results, with fewer broken image links, than with the answer provided here.
While regular expressions are good for a large variety of tasks, I find they usually fall short when parsing an HTML DOM. The problem with HTML is that the structure of your document is so variable that it is hard to extract a tag accurately (and by accurately I mean a 100% success rate with no false positives).
For more consistent results use this object http://simplehtmldom.sourceforge.net/ which allows you to manipulate html.
An example is provided in the response in the first link I posted.
function get_first_image($html){
    require_once('SimpleHTML.class.php');
    $post_html = str_get_html($html);
    // find('img', 0) returns the first <img> element, or null if there is none.
    $first_img = $post_html->find('img', 0);
    if($first_img !== null) {
        return $first_img->src;
    }
    return null;
}
Enjoy
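If adding a third-party library is not an option, the same idea can be sketched with PHP's bundled DOM extension. This assumes the extension is enabled; first_img_src is a made-up name, and the sample text is the one from the question.

```php
<?php
// Hypothetical helper: returns the src of the first <img> in document order,
// or null when the fragment has no image.
function first_img_src(string $html): ?string
{
    if (trim($html) === '') {
        return null; // loadHTML() does not accept an empty string
    }
    // Collect parser warnings internally instead of printing them;
    // real-world post HTML is rarely perfectly valid.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $imgs = $doc->getElementsByTagName('img');
    if ($imgs->length === 0) {
        return null;
    }
    $src = $imgs->item(0)->getAttribute('src');
    return $src !== '' ? $src : null;
}

$texthtml = 'Who is Sara Bareilles on Sing Off<br>
<img alt="Sara" title="Sara" src="475993565.jpg"/><br>
<img alt="Sara" title="Sara two" src="475993434343434.jpg"/><br>';

echo first_img_src($texthtml), "\n"; // 475993565.jpg
```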

problem with string manipulation function retrieving URL

I built a simple scraper to get links from another website.
My problem now is getting the link itself rather than all of the content:
<a onclick="javascript:_gaq.push(['_trackEvent','outbound-article','namobile.naughtyamerica.com']);" href="http://www.wwww.com/track/MTA3ODQxLjEyLjQwLjQwLjAuMC4wLjAuMA/freeporn3/lisa_ann6/7535/"><img class="aligncenter size-full" title="Lisa Ann" src="http://www.www.com/upload/source/mfhm/lisawill/lisawillhor_gmna_big_img3.jpg" alt="Lisa Ann" width="313" height="223" /></a>
This is the image and its link. I need to get only the link into a variable, like this:
$url = "http://www.wwww.com/track/MTA3ODQxLjEyLjQwLjQwLjAuMC4wLjAuMA/freeporn3/lisa_ann6/7535/";
That's it, thank you.
Use QueryPath, Simple HTML DOM Parser, or another PHP library for navigating the DOM.
If you are familiar with CSS selectors, you can use the phpQuery library and its attr method.
<?php
echo pq('a')->attr('href');
$html = <<< EOF
<a onclick="javascript:_gaq.push(['_trackEvent','outbound-article','namobile.naughtyamerica.com']);" href="http://www.wwww.com/track/MTA3ODQxLjEyLjQwLjQwLjAuMC4wLjAuMA/freeporn3/lisa_ann6/7535/"><img class="aligncenter size-full" title="Lisa Ann" src="http://www.www.com/upload/source/mfhm/lisawill/lisawillhor_gmna_big_img3.jpg" alt="Lisa Ann" width="313" height="223" /></a>
EOF;
preg_match_all('/<a onclick.*?href="(.*?)"/im', $html, $url, PREG_PATTERN_ORDER);
$url = $url[1][0];
echo $url; // echoes "http://www.wwww.com/track/MTA3ODQxLjEyLjQwLjQwLjAuMC4wLjAuMA/freeporn3/lisa_ann6/7535/"
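The DOM-based route can also be sketched with PHP's bundled DOMXPath, which selects the link by structure rather than by attribute order. The helper name first_image_link and the sample markup are made up for illustration, and the DOM extension is assumed to be enabled.

```php
<?php
// Hypothetical helper: returns the href of the first <a> that directly wraps
// an <img>, or null when there is no such link.
function first_image_link(string $html): ?string
{
    if (trim($html) === '') {
        return null; // loadHTML() does not accept an empty string
    }
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // a[@href][img]: anchors that have an href attribute and an <img> child.
    $nodes = $xpath->query('//a[@href][img]');
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return $nodes->item(0)->getAttribute('href');
}

// Made-up markup with the same shape as the scraped snippet.
$html = '<p><a onclick="track();" href="http://example.com/track/123/">'
    . '<img src="thumb.jpg" alt="" width="313" height="223" /></a></p>';

$url = first_image_link($html);
echo $url, "\n"; // http://example.com/track/123/
```

Unlike the regex, this does not depend on onclick appearing before href inside the tag.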
