Scrape link in web page with specific class php [duplicate] - php

This question already has answers here:
How do you parse and process HTML/XML in PHP?
(31 answers)
Closed 3 years ago.
As the title suggests, I would like to retrieve links that have a specific class.
I already have the code that connects to the pages, and I would like to use preg_match to extract only the URL inside href="url".
The link I want looks like the one below; the URL I need is the value of href. The link sits inside a table and may have other attributes, but it has no id, only the class view.
<a title="viwe" class="view" href="link">blablabla</a>
This is the code I have written so far:
$curl = curl_init('http://prove/prove/pag/test.php');
curl_setopt($curl, CURLOPT_RETURNTRANSFER, TRUE);
$page = curl_exec($curl);
if (curl_errno($curl)) // check for execution errors
{
    echo 'Scraper error: ' . curl_error($curl);
    exit;
}
curl_close($curl);
$regex = '/<a.*?>(.*?)<\/a>/';
if ( preg_match($regex, $page, $list) )
    echo $list[0];
else
    print "Not found";

Parsing HTML with a regular expression is not recommended, but if you have to, this expression might be closer to what I'm guessing you have in mind:
<a\s.*?\sclass="\s*view\s*"[^>]*>.*?<\/a>
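Since the linked duplicate recommends an HTML parser over regular expressions, here is a minimal sketch of that route using PHP's bundled DOM extension. The sample markup in $page is a hypothetical stand-in for the fetched page, and the second and third links are invented for illustration.

```php
<?php
// Hypothetical sample standing in for the $page string returned by curl_exec().
$page = '<table><tr>'
      . '<td><a title="viwe" class="view" href="link">blablabla</a></td>'
      . '<td><a class="other" href="skip">no</a></td>'
      . '<td><a class="btn view" href="/second">more</a></td>'
      . '</tr></table>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // real-world HTML is rarely valid; collect warnings quietly
$doc->loadHTML($page);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Match "view" as a whole class token, so classes such as "preview" are not picked up.
$query = '//a[contains(concat(" ", normalize-space(@class), " "), " view ")]';

$links = [];
foreach ($xpath->query($query) as $a) {
    $links[] = $a->getAttribute('href');
}

print_r($links);
```

With a real page, $page would simply be the string from the cURL call in the question; the XPath query stays the same regardless of attribute order or extra attributes on the link.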

Related

How to get a object from JSON to PHP [duplicate]

This question already has answers here:
How to access object properties with names like integers or invalid property names?
(7 answers)
Closed 2 years ago.
I need to get an image link (https://upload.wikimedia.org/wikipedia/en/5/51/Minecraft_cover.png) from this API: https://en.wikipedia.org/w/api.php?action=query&prop=pageimages&format=json&piprop=original&titles=Minecraft&pilicense=any. How can I do it?
I wrote the code below, but I can only print the whole response:
$img_url = "https://en.wikipedia.org/w/api.php?action=query&prop=pageimages&format=json&piprop=original&titles=Minecraft&pilicense=any";
$img_url = str_replace(" ", "%20", $img_url);
$img = json_decode(file_get_contents($img_url));
print_r ($img);
But how do I print only the image source?
The simplest way would be to use the following:
echo $img->query->pages->{'27815578'}->original->source;
Where 27815578 is the Page ID
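If the page ID is not known in advance, one option is to decode the response into arrays and loop over the pages. The sketch below assumes the response has the shape implied by the answer above (query, pages, page ID, original, source); the $json string is a trimmed, hypothetical copy used so the example runs without a network call.

```php
<?php
// Trimmed, hypothetical copy of the API response shape used in the answer above.
$json = '{"query":{"pages":{"27815578":{"pageid":27815578,"title":"Minecraft",'
      . '"original":{"source":"https://upload.wikimedia.org/wikipedia/en/5/51/Minecraft_cover.png"}}}}}';

// Passing true returns associative arrays, which avoids the {'27815578'} property syntax.
$response = json_decode($json, true);

$sources = [];
foreach ($response['query']['pages'] ?? [] as $pageId => $page) {
    // Pages without an image have no "original" entry, so check before reading it.
    if (isset($page['original']['source'])) {
        $sources[$pageId] = $page['original']['source'];
    }
}

// Print the first image found, or a fallback message.
echo reset($sources) ?: 'No image found', PHP_EOL;
```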

PHP cURL same HTML element [duplicate]

This question already has answers here:
How do you parse and process HTML/XML in PHP?
(31 answers)
Closed 8 years ago.
I want to output one specific element out of several identical HTML elements. I know how to output a div or span that has an id or a class, but I have no idea how to pick one of several identical elements. I thought it was something like p[1], but that doesn't work.
I know there are many answered questions about this, but none of them explained how to output one of several identical HTML elements that have no class or id.
website : http://localhost/
<p>example</p>
<p>example1</p> <!-- i want to take this one -->
<p>example2</p>
-------------------
php script :
<?php $curl = curl_init('http://localhost/');
curl_setopt($curl, CURLOPT_RETURNTRANSFER, TRUE);
$code3 = curl_exec($curl);
curl_close($curl);
$code = '/<p>(.*?)<\/p>/s';
$code6= preg_match($code, $code3, $code4);
echo $code4[1]
?>
/* It doesn't work.
php.net doesn't give a good example of this either, so I hope someone can help me here.
Thanks in advance!
*/
Try:
<?php
$curl = curl_init('http://justpaste.it/8s5v');
curl_setopt($curl, CURLOPT_RETURNTRANSFER, TRUE);
$content = curl_exec($curl);
curl_close($curl);
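// Lazy quantifier, so that several <p> elements on one line are matched separately.
preg_match_all('/<p>(.+?)<\/p>/im', $content, $result);
// $result[0] holds the full matches (with tags), $result[1] the captured text only.
print_r($result);
// Index 1 is the second <p>, because the arrays are zero-based.
print "Selected: " . $result[0][1] . "\n";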
?>
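As an alternative to the regular expression, the same element can be picked by position with PHP's DOM extension. This is only a sketch: the $html string reuses the three paragraphs from the question in place of the cURL result.

```php
<?php
// The three paragraphs from the question, standing in for the fetched page.
$html = "<p>example</p>\n"
      . "<p>example1</p> <!-- i want to take this one -->\n"
      . "<p>example2</p>";

$doc = new DOMDocument();
libxml_use_internal_errors(true); // tolerate imperfect markup
$doc->loadHTML($html);
libxml_clear_errors();

// All <p> elements in document order.
$paragraphs = $doc->getElementsByTagName('p');

// item() is zero-based, so index 1 is the second paragraph; it returns null when out of range.
$second = $paragraphs->item(1);

echo $second !== null ? $second->textContent : 'Not found', PHP_EOL;
```

Changing the index passed to item() selects a different paragraph, for example item(2) for the third one.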

preg_match_all matches unexpectedly [duplicate]

This question already has answers here:
How do you parse and process HTML/XML in PHP?
(31 answers)
Closed 9 years ago.
I've just started with PHP and I'm trying to scrape a small page, without success. I tried preg_match_all, but it doesn't return the result I want. Basically, I want to scrape only the YouTube video links from https://gdata.youtube.com/feeds/api/standardfeeds/most_shared, all of them, so I can use them later.
I tried the following code, which failed:
<?php
$data = file_get_contents('https://gdata.youtube.com/feeds/api/standardfeeds/most_shared');
preg_match_all("/src='(.+?)'>/", $data, $links);
$link_out = $links[0][0];
echo $link_out;
?>
I'm new to PHP, so a little help would be appreciated.
Thanks
As the feed is XML, you can use PHP's SimpleXMLElement to obtain the data.
<?php
$xml = new SimpleXMLElement(
'https://gdata.youtube.com/feeds/api/standardfeeds/most_shared',
null,
true
);
foreach ($xml->entry as $entry) {
    echo $entry->content['src'], PHP_EOL;
}
/*
https://www.youtube.com/v/IjWc43FCYlg?version=3&f=standard&app=youtube_gdata
https://www.youtube.com/v/Xw1C5T-fH2Y?version=3&f=standard&app=youtube_gdata
https://www.youtube.com/v/Kq0_dGKx4Os?version=3&f=standard&app=youtube_gdata
https://www.youtube.com/v/gbcBYs0ljI0?version=3&f=standard&app=youtube_gdata
https://www.youtube.com/v/78juOpTM3tE?version=3&f=standard&app=youtube_gdata
https://www.youtube.com/v/OOiZ-5DqwYI?version=3&f=standard&app=youtube_gdata
https://www.youtube.com/v/zjz614QVyfQ?version=3&f=standard&app=youtube_gdata
https://www.youtube.com/v/h15m87WsCHQ?version=3&f=standard&app=youtube_gdata
https://www.youtube.com/v/SXKOTdyOUBg?version=3&f=standard&app=youtube_gdata
https://www.youtube.com/v/BRAM8MpqIeA?version=3&f=standard&app=youtube_gdata
https://www.youtube.com/v/5yB3n9fu-rM?version=3&f=standard&app=youtube_gdata
https://www.youtube.com/v/NAOo9SnzRH8?version=3&f=standard&app=youtube_gdata
https://www.youtube.com/v/0KtILkzC-1g?version=3&f=standard&app=youtube_gdata
https://www.youtube.com/v/kWSIFh8ICaA?version=3&f=standard&app=youtube_gdata
https://www.youtube.com/v/Mi6AhogZCeg?version=3&f=standard&app=youtube_gdata
https://www.youtube.com/v/kWuIGAZ1x2I?version=3&f=standard&app=youtube_gdata
https://www.youtube.com/v/lKY5fmDGVLs?version=3&f=standard&app=youtube_gdata
https://www.youtube.com/v/C94PaCtqOk4?version=3&f=standard&app=youtube_gdata
https://www.youtube.com/v/V-fL8zopddI?version=3&f=standard&app=youtube_gdata
https://www.youtube.com/v/UWlzMIl7E48?version=3&f=standard&app=youtube_gdata
https://www.youtube.com/v/mcw6j-QWGMo?version=3&f=standard&app=youtube_gdata
https://www.youtube.com/v/-RSDaRttpzk?version=3&f=standard&app=youtube_gdata
https://www.youtube.com/v/8_RDx4skTp4?version=3&f=standard&app=youtube_gdata
https://www.youtube.com/v/7YDWdv9kR0M?version=3&f=standard&app=youtube_gdata
https://www.youtube.com/v/m96tYpEk1Ao?version=3&f=standard&app=youtube_gdata
*/
Anthony.
Try this pattern with preg_match_all:
preg_match_all("/src='([^']+)'/si", $data, $links);
and show results:
echo "<pre>";
print_r($links);
<?php
$data = file_get_contents('https://gdata.youtube.com/feeds/api/standardfeeds/most_shared');
preg_match_all("/src='(.+?)'\/>/", $data, $links);
print_r($links[1]);
You forgot to match the closing / of the self-closing tags: the pattern expected '> after the URL, but the tags carrying src end with '/>.
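To see why the original pattern matches unexpectedly, the sketch below runs it and the character-class pattern from the answer above against a small sample. The $data string is hypothetical: it only imitates the single-quoted attributes of the feed, and the video IDs are invented.

```php
<?php
// Hypothetical two-entry sample imitating the feed's single-quoted attributes.
$data = "<entry><title type='text'>First</title>"
      . "<content type='application/x-shockwave-flash' src='https://www.youtube.com/v/AAA?version=3'/></entry>"
      . "<entry><title type='text'>Second</title>"
      . "<content type='application/x-shockwave-flash' src='https://www.youtube.com/v/BBB?version=3'/></entry>";

// Pattern from the question: it insists on '> after the URL, but the tag ends with '/>,
// so the lazy .+? keeps expanding until some later tag happens to end with '>.
preg_match_all("/src='(.+?)'>/", $data, $loose);

// Pattern from the answer: [^']+ cannot cross the closing quote, so each URL is captured cleanly.
preg_match_all("/src='([^']+)'/", $data, $strict);

print_r($loose[1]);  // one over-long capture that runs into the next entry
print_r($strict[1]); // the two URLs
```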

how to extract a url's title, images, keywords and description using php? [duplicate]

This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
Fastest way to retrieve a <title> in PHP
Suppose there is a website http://www.example.com with title = "Example", description = "it is an example", and keywords = "example, question, love php".
What PHP code (or any other code) can fetch these values when the link is submitted?
To fetch data from an external link, you can retrieve the page with cURL.
Here's an example:
<?php
$ch = curl_init("http://www.google.nl");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$result = curl_exec($ch);
curl_close($ch);
$title = preg_match('!<title>(.*?)</title>!i', $result, $matches) ? $matches[1] : 'No title found';
$description = preg_match('!<meta name="description" content="(.*?)">!i', $result, $matches) ? $matches[1] : 'No meta description found';
$keywords = preg_match('!<meta name="keywords" content="(.*?)">!i', $result, $matches) ? $matches[1] : 'No meta keywords found';
echo $title . '<br>';
echo $description . '<br>';
echo $keywords . '<br>';
?>
For google.nl this returns:
Google
No meta description found
No meta keywords found
The meta description and keywords might need more tweaking for your use.
A small side note: the cURL extension is not always enabled by default in PHP, so you might need to install or enable it first.
Here's the cURL php function page: http://nl3.php.net/manual/en/ref.curl.php
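The regular expressions above depend on the exact attribute order and casing of each meta tag. As a hedged alternative, the sketch below reads the same three values with PHP's DOM extension; the $html string is a hypothetical page built from the values in the question and stands in for $result.

```php
<?php
// Hypothetical sample page built from the values in the question; stands in for $result.
$html = '<html><head><title>Example</title>'
      . '<meta name="Description" content="it is an example">'
      . '<meta content="example, question, love php" name="keywords">'
      . '</head><body></body></html>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // tolerate imperfect markup
$doc->loadHTML($html);
libxml_clear_errors();

$titleNode = $doc->getElementsByTagName('title')->item(0);
$title = $titleNode !== null ? trim($titleNode->textContent) : 'No title found';

// Index every named <meta> tag by its lower-cased name, whatever the attribute order.
$meta = [];
foreach ($doc->getElementsByTagName('meta') as $tag) {
    $name = strtolower($tag->getAttribute('name'));
    if ($name !== '') {
        $meta[$name] = $tag->getAttribute('content');
    }
}

$description = $meta['description'] ?? 'No meta description found';
$keywords    = $meta['keywords'] ?? 'No meta keywords found';

echo $title, PHP_EOL, $description, PHP_EOL, $keywords, PHP_EOL;
```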
Your question is too general, but the tools you will want are file_get_contents to get the page and then a regex to find the data. You could use this to find the title easily, but exactly what you mean by keywords and description is unclear.

PHP Get URL Contents And Search For String [duplicate]

This question already has answers here:
Get content from a url using php
(3 answers)
Closed 7 years ago.
In PHP, I need to get the contents of a URL (its source), search it for the string "maybe baby love you", and do x if the page does not contain it.
Just read the contents of the page as you would read a file. PHP does the connection stuff for you. Then just look for the string via regex or simple string comparison.
$url = 'http://my.url.com/';
$data = file_get_contents( $url );
if ( strpos( 'maybe baby love you', $data ) === false )
{
// do something
}
Answer no. 3 is good, but it has a small mistake in the strpos() call: the haystack ($data) must come first and the needle second. Here is the corrected code:
$url = 'http://my.url.com/';
$data = file_get_contents( $url );
if ( strpos($data,'maybe baby love you' ) === false )
{
// do something
}
Assuming fopen URL Wrappers are on ...
$string = file_get_contents('http://example.com/file.html');
// strpos() takes the haystack first and the needle second.
if (strpos($string, 'maybe baby love you') === false) {
    // do X
}
If fopen URL wrappers are not enabled, you may be able to use the curl module (see http://www.php.net/curl )
Curl also gives you the ability to deal with authenticated pages, redirects, etc.
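Because the argument order of strpos() is easy to get wrong, a small wrapper can make the intent explicit. The function name pageContains is hypothetical, and the sample string stands in for the downloaded page.

```php
<?php
// Hypothetical helper: true when $needle occurs anywhere in $haystack.
function pageContains(string $haystack, string $needle): bool
{
    // strpos() returns the match position (possibly 0) or false,
    // so the comparison has to be strict.
    return strpos($haystack, $needle) !== false;
}

// Sample text standing in for the result of file_get_contents($url).
$data = 'maybe baby love you, and some more page content';

if (!pageContains($data, 'maybe baby love you')) {
    echo "String not found, do x\n";
} else {
    echo "String found\n";
}
```

The sample deliberately starts with the search string: the match is at position 0, which a loose comparison such as == false would mistake for "not found".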
