PHPQuery - get all links containing a specific URL path - php

I am trying to get all links on a given page whose URL contains a specific path, using PHPQuery. I am using PHPQuery's PHP-style syntax.
include_once 'phpQuery.php';
$url = 'http://www.phonearena.com/phones/manufacturer/';
$doc = phpQuery::newDocumentFile($url);
$urls = $doc['a'];
foreach ($urls as $url) {
echo pq($url)->attr('href') . '<br>';
}
The code above works, but it shows all the links.
I want to show only those containing "/phones/manufacturer/".
I tried this, but it shows nothing:
include_once 'phpQuery.php';
$url = 'http://www.phonearena.com/phones/manufacturer/';
$doc = phpQuery::newDocumentFile($url);
$urls = $doc['a'];
foreach ($urls as $url) {
echo pq($url)->attr('href:contains("/phones/manufacturer/")') . '<br>';
}

Use the code below to get all URLs from that site:
$doc = new DOMDocument();
@$doc->loadHTML(file_get_contents('http://www.phonearena.com/phones/manufacturer/')); // @ mutes warnings from malformed HTML
$ahreftags = $doc->getElementsByTagName('a');
foreach ($ahreftags as $tag) {
echo "<br/>";
echo $tag->getAttribute('href');
echo "<br/>";
}
exit;

Try this, using the attribute-contains selector ([href*="..."]) described in the jQuery documentation:
include_once 'phpQuery.php';
$url = 'http://www.phonearena.com/phones/manufacturer/';
$doc = phpQuery::newDocumentFile($url);
$urls = $doc['a[href*="/phones/manufacturer/"]'];
foreach ($urls as $url) {
echo pq($url)->attr('href') . '<br>';
}
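For comparison, the same filter can be sketched with PHP's bundled DOM extension instead of phpQuery. This is only an illustration under stated assumptions: the sample markup and the helper name linksContaining are hypothetical, and with a live page the HTML string would come from file_get_contents().

```php
<?php
// Hypothetical sample markup standing in for the fetched page.
$html = '<html><body>'
    . '<a href="/phones/manufacturer/acme">Acme</a>'
    . '<a href="/news/today">News</a>'
    . '<a href="http://www.phonearena.com/phones/manufacturer/globex">Globex</a>'
    . '<a>No href</a>'
    . '</body></html>';

/**
 * Return the href of every anchor whose href contains $needle.
 * (Hypothetical helper, not part of phpQuery.)
 */
function linksContaining(string $html, string $needle): array
{
    $dom = new DOMDocument();
    // Suppress the warnings that real-world, non-valid HTML tends to trigger.
    @$dom->loadHTML($html);

    $hrefs = [];
    foreach ($dom->getElementsByTagName('a') as $a) {
        $href = $a->getAttribute('href'); // empty string when the attribute is missing
        if ($href !== '' && strpos($href, $needle) !== false) {
            $hrefs[] = $href;
        }
    }
    return $hrefs;
}

foreach (linksContaining($html, '/phones/manufacturer/') as $href) {
    echo $href, PHP_EOL;
}
```

The filtering happens on the attribute value after it has been read, which is the step the attr('href:contains(...)') attempt in the question was missing.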

Related

Use 'foreach' in script

I'm trying to write a script that lists all the image URLs from a specific URL. I used foreach in order to scan several pages, but I think it's not working well.
This is my code:
<?php
include('simple_html_dom.php');
$array = array($page, $page2);
$page = "https://www.dllo.dev";
$page2 = "https://www.dllo2.dev";
$html = new simple_html_dom();
$html->load_file($array);
$images = array();
foreach($html->find('img') as $element) {
$images[] = $element->src;
}
reset($images);
echo "URL $array:<br /><br />";
foreach ($images as $out) {
$url = "$base$out";
echo "$url,&nbsp";
}
It's partially working, but only with the first URL ($page)... Any idea?
:D
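One likely cause, offered as a guess, is that the loader is handed the whole array at once instead of one URL per iteration (and that $array is built before $page and $page2 are assigned). A minimal sketch of the per-page loop follows. It uses PHP's built-in DOMDocument rather than simple_html_dom, and the inline $pages map is a hypothetical stand-in for downloaded HTML so the example runs offline.

```php
<?php
// Hypothetical stand-in for downloaded pages, keyed by URL.
// With real pages, each value would come from file_get_contents($pageUrl).
$pages = [
    'https://www.dllo.dev'  => '<html><body><img src="/a.png"><img src="/b.png"></body></html>',
    'https://www.dllo2.dev' => '<html><body><img src="/c.png"></body></html>',
];

$images = [];
foreach ($pages as $pageUrl => $html) {
    // Parse each page separately; one document per URL.
    $dom = new DOMDocument();
    @$dom->loadHTML($html); // mute warnings from imperfect markup

    foreach ($dom->getElementsByTagName('img') as $img) {
        // Prefix the page URL, assuming every src is root-relative.
        $images[] = $pageUrl . $img->getAttribute('src');
    }
}

echo implode(', ', $images), PHP_EOL;
```

The key point is that the outer foreach walks the pages and the inner foreach walks the images of the page currently loaded.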

How to find URLs inside double quotes

Let's say we load the source code of this question and want to find the URL next to "childUrl"
(or go to this site's source code and search for "childUrl").
<?php
$sites_html = file_get_contents("https://stackoverflow.com/questions/46272862/how-to-find-urls-under-double-quote");
$html = new DOMDocument();
@$html->loadHTML($sites_html);
foreach() {
# now i want here to echo the link alongside "childUrl"
}
?>
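Try this:
<?php
// Named extractUrl because extract() is already a PHP built-in.
function extractUrl($url){
$sites_html = file_get_contents($url);
$html = new DOMDocument();
@$html->loadHTML($sites_html); // mute warnings from malformed HTML
// loadHTML() only returns a bool, so iterate over the anchors instead
foreach ($html->getElementsByTagName('a') as $row)
{
$href = $row->getAttribute('href');
if($href=="wanted_url")
{
echo $href;
}
}
}
?>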
You can use a regex.
Try this code:
$matches = [[],[]];
preg_match_all('/\"wanted_url\": \"([^\"]*?)\"/', $sites_html, $matches);
foreach($matches[1] as $match) {
echo $match;
}
This prints every URL that follows a "wanted_url" key.
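A self-contained variant of the regex approach is sketched below. The page fragment and the helper name valuesForKey are hypothetical; "childUrl" is the key named in the question. It also tolerates optional whitespace around the colon, which the pattern above does not.

```php
<?php
// Hypothetical page fragment containing the key from the question twice.
$sites_html = '<script>var data = {"childUrl": "https://example.com/a", '
    . '"other": "x", "childUrl":"https://example.com/b"};</script>';

/**
 * Collect every double-quoted value that follows "<key>": in $source.
 * (Hypothetical helper for illustration.)
 */
function valuesForKey(string $source, string $key): array
{
    // preg_quote() escapes any regex metacharacters in the key.
    $pattern = '/"' . preg_quote($key, '/') . '"\s*:\s*"([^"]*)"/';
    preg_match_all($pattern, $source, $matches);
    return $matches[1]; // capture group 1 holds the quoted values
}

$urls = valuesForKey($sites_html, 'childUrl');
foreach ($urls as $u) {
    echo $u, PHP_EOL;
}
```

If the page embeds JSON with escaped slashes (\/), the captured values would still contain the backslashes and would need unescaping afterwards.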

How to find this url in the html with php?

I want to find a specific URL in an HTML page and get its parts.
The URL is in this page:
http://site1.com/games/arcade/139173-angry-birds-friends-1-7-0.html
and looks like
http://download.site2.org/?server=2&apkid=com.rovio.angrybirdsfriends&ver=1.7.0
I want 3 parts of it:
2
com.rovio.angrybirdsfriends
1.7.0
My code:
$html = file_get_contents("http://site1.com/games/name/139173-angry-birds-friends-1-7-0.html");
preg_match("/download(.*)/", $html, $results);
echo $results[0];
Is this what you are looking for?
$url = 'http://download.site2.org/?server=2&apkid=com.rovio.angrybirdsfriends&ver=1.7.0';
$query = parse_url($url, PHP_URL_QUERY);
parse_str($query, $params);
echo $params['server'], PHP_EOL;
echo $params['apkid'], PHP_EOL;
echo $params['ver'], PHP_EOL;
Output:
2
com.rovio.angrybirdsfriends
1.7.0
Update
// Read HTML
$html = file_get_contents(
'http://getandroidapp.org/games/arcade/'
. '139173-angry-birds-friends-1-7-0.html'
);
// Turn HTML into a DOM document
$dom = new DOMDocument();
@$dom->loadHTML($html); // Mute warnings
// Find anchor ...
foreach ($dom->getElementsByTagName('a') as $link) {
$href = $link->getAttribute('href');
// ... having a query part that starts with 'server='
if (preg_match('#\?server=#', $href)) {
$url = $href;
// Parse query string from href
$query = parse_url($url, PHP_URL_QUERY);
parse_str($query, $params);
// Display values
echo $params['server'], PHP_EOL;
echo $params['apkid'], PHP_EOL;
echo $params['ver'], PHP_EOL;
// One is enough
break;
}
}
Output:
2
com.rovio.angrybirdsfriends
1.7.0
It is not totally fool-proof but maybe good enough in your case.
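A slightly stricter variant is sketched here: instead of matching '?server=' with a regex, it parses each href's query string and accepts the first link that carries all three parameters. The inline HTML and the helper name downloadParams are hypothetical, used so the example runs without fetching the page.

```php
<?php
// Hypothetical page fragment containing the download link from the question.
$html = '<html><body>'
    . '<a href="/about">About</a>'
    . '<a href="http://download.site2.org/?server=2&amp;apkid=com.rovio.angrybirdsfriends&amp;ver=1.7.0">Download</a>'
    . '</body></html>';

/**
 * Return the query parameters of the first anchor that has
 * server, apkid and ver, or null when no such anchor exists.
 */
function downloadParams(string $html): ?array
{
    $dom = new DOMDocument();
    @$dom->loadHTML($html); // mute warnings from imperfect markup

    foreach ($dom->getElementsByTagName('a') as $link) {
        // parse_url() yields null when the href has no query part.
        $query = parse_url($link->getAttribute('href'), PHP_URL_QUERY);
        if (!is_string($query)) {
            continue;
        }
        parse_str($query, $params);
        if (isset($params['server'], $params['apkid'], $params['ver'])) {
            return $params;
        }
    }
    return null;
}

$params = downloadParams($html);
if ($params !== null) {
    echo $params['server'], PHP_EOL;
    echo $params['apkid'], PHP_EOL;
    echo $params['ver'], PHP_EOL;
}
```

Checking for the parameter names avoids depending on 'server' being the first parameter in the query string.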

PHP multiple file_get_contents on data of previous file_get_contents

I found this code to check for links on a URL.
<?php
$url = "http://example.com";
$input = @file_get_contents($url);
$dom = new DOMDocument();
$dom->strictErrorChecking = false;
@$dom->loadHTML($input);
$links = $dom->getElementsByTagName('a');
foreach($links as $link) {
if ($link->hasAttribute('href')) {
$href = $link->getAttribute('href');
if (stripos($href, 'shows') !== false) {
echo "<p>http://example.com" . $href . "</p>\n";
}
}
}
?>
It works well: it shows all the links that contain 'shows'.
For example, the script above finds 3 links, so I get:
<p>http://example.com/shows/Link1</p>
<p>http://example.com/shows/Link2</p>
<p>http://example.com/shows/Link3</p>
Now what I am trying to do is check the URLs I just fetched for links that also contain 'shows'.
To be honest, I'm a PHP noob, so I don't know where to start :(
Regards,
Bart
Something like:
function checklinks($url){
$input = @file_get_contents($url);
$dom = new DOMDocument();
$dom->strictErrorChecking = false;
@$dom->loadHTML($input);
$links = $dom->getElementsByTagName('a');
foreach($links as $link) {
if ($link->hasAttribute('href')) {
$href = $link->getAttribute('href');
if (stripos($href, 'shows') !== false) {
echo "<p>" . $url . "/" . $href . "</p>\n";
checklinks($url . "/" . $href);
}
}
}
}
$url = "http://example.com";
checklinks($url);
Make it recursive - call the function again in the function itself.
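Pages that link back to each other would make a plain recursive call loop forever, so a visited set and a depth limit are worth adding. The sketch below is one way to do that, not a definitive crawler. The names collectLinks, $fetch and $pages are hypothetical, the hrefs are assumed to be site-relative (such as /shows/Link1), and the fetcher is injected so the example runs against an in-memory map instead of the network.

```php
<?php
/**
 * Recursively collect links containing $needle, reporting each URL once.
 * $fetch is any callable returning the HTML for a URL, or false on failure.
 */
function collectLinks(string $base, string $url, string $needle, callable $fetch, int $depth, array &$seen = []): array
{
    $seen[$url] = true;
    if ($depth <= 0) {
        return []; // depth limit reached
    }
    $html = $fetch($url);
    if ($html === false || $html === '') {
        return [];
    }

    $dom = new DOMDocument();
    @$dom->loadHTML($html); // mute warnings from imperfect markup

    $found = [];
    foreach ($dom->getElementsByTagName('a') as $link) {
        $href = $link->getAttribute('href');
        if (stripos($href, $needle) === false) {
            continue;
        }
        $next = $base . $href; // assumes a site-relative href
        if (isset($seen[$next])) {
            continue; // already reported, do not follow again
        }
        $seen[$next] = true;
        $found[] = $next;
        $found = array_merge($found, collectLinks($base, $next, $needle, $fetch, $depth - 1, $seen));
    }
    return $found;
}

// Hypothetical in-memory site; Link1 and Link2 link to each other.
$pages = [
    'http://example.com' => '<a href="/shows/Link1">1</a><a href="/about">About</a>',
    'http://example.com/shows/Link1' => '<a href="/shows/Link2">2</a><a href="/shows/Link1">self</a>',
    'http://example.com/shows/Link2' => '<a href="/shows/Link1">back</a>',
];
$fetch = function (string $url) use ($pages) {
    return $pages[$url] ?? false;
};

$links = collectLinks('http://example.com', 'http://example.com', 'shows', $fetch, 3);
foreach ($links as $l) {
    echo "<p>$l</p>\n";
}
```

For live pages, $fetch could be a closure wrapping @file_get_contents($url).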

Adding referrals to links

I'm trying to dynamically add a link to the beginning of all the links in an RSS feed.
So far I have this which looks to me like it should work. What am I missing here?
<?php
$id = $_GET['id'];
$url = $_GET['url'];
$xml = new DOMDocument();
$xml->load("$url");
foreach($xml->getElementsByTagName('a') as $link) {
$link->setAttribute('href', 'http://$id.refsite/url/' . $link->getAttribute('href'));
}
echo $xml->saveXML();
?>
Edit: this section doesn't appear to be doing anything:
foreach($xml->getElementsByTagName('a') as $link) {
$link->setAttribute('href', 'http://$id.refsite/url/' . $link->getAttribute('href'));
}
Try calling removeAttribute and then setAttribute on the href, like this:
$get_url = $link->getAttribute('href');
$newURL= "http://$id.refsite/url/".$get_url;
//remove and set href attribute
$link->removeAttribute('href');
$link->setAttribute("href", $newURL);
Just answered my own question.
This is what I was trying to do
<?php
$id = $_GET['id'];
$url = $_GET['url'];
$page = file_get_contents("$url");
$pagefixed = str_replace("http://","http://$id.refsite/url/","$page");
echo $pagefixed;
?>
sometimes you just have a moment, lol
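For a DOM-based alternative to str_replace, two details may explain why the original loop appeared to do nothing: single-quoted PHP strings do not expand $id, and RSS 2.0 feeds carry item URLs as the text of <link> elements rather than in <a href> attributes. The sketch below rewrites those <link> elements. It assumes an RSS 2.0 feed (Atom feeds put the URL in a link href attribute instead), and the sample feed, the id value and the helper name prefixFeedLinks are hypothetical.

```php
<?php
// Hypothetical minimal RSS feed, inlined so the sketch runs without network access.
$rss = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
<channel>
<title>Sample</title>
<link>http://example.com/</link>
<item><title>One</title><link>http://example.com/one</link></item>
<item><title>Two</title><link>http://example.com/two</link></item>
</channel>
</rss>
XML;

/**
 * Prepend $prefix to the text of every <link> element in an RSS 2.0 feed.
 */
function prefixFeedLinks(string $rss, string $prefix): string
{
    $xml = new DOMDocument();
    $xml->loadXML($rss);

    foreach ($xml->getElementsByTagName('link') as $link) {
        $newUrl = $prefix . $link->textContent;
        // Replace the element's children with a single text node;
        // createTextNode() escapes characters such as & on output.
        while ($link->firstChild) {
            $link->removeChild($link->firstChild);
        }
        $link->appendChild($xml->createTextNode($newUrl));
    }
    return $xml->saveXML();
}

$id = 'abc'; // hypothetical referral id; the question reads it from $_GET['id']
$out = prefixFeedLinks($rss, "http://$id.refsite/url/"); // double quotes expand $id
echo $out;
```

Unlike a blanket str_replace on "http://", this only touches the link elements and leaves other URLs in the feed, such as image or enclosure addresses, unchanged.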
