I have a function, listarUrls(), that returns all the URLs it finds on a web page.
For each URL the function returns, I need it to list the URLs of that page in turn, as many levels deep as the user requests. That is:
If the user asks for 1 iteration of the URL www.a.com, return:
- $arry[0] www.1.com
- $arry[1] www.2.com
- ... and so on for every URL found on www.a.com
If the user asks for 2 iterations of the URL www.a.com, return:
- $arry[0] www.1.com
- $arry[0][0] www.1-1.com
- $arry[0][1] www.1-2.com
- ... and so on for every URL found on www.1.com
- $arry[1] www.2.com
- $arry[1][0] www.2-1.com
- $arry[1][1] www.2-2.com
- ... and so on for every URL found on www.2.com
- ...
If the user asks for 3 iterations of the URL www.a.com, return:
- $arry[0] www.1.com
- $arry[0][0] www.1-1.com
- $arry[0][0][0] www.1-1-1.com
- $arry[0][0][1] www.1-1-2.com
- ... and so on for every URL found on www.1-1.com
- $arry[0][1] www.1-2.com
- $arry[0][1][0] www.1-2-1.com
- $arry[0][1][1] www.1-2-2.com
- ... and so on for every URL found on www.1-2.com
- $arry[1] www.2.com
- $arry[1][0] www.2-1.com
- $arry[1][0][0] www.2-1-1.com
- $arry[1][0][1] www.2-1-2.com
- ... and so on for every URL found on www.2-1.com
- $arry[1][1] www.2-2.com
- $arry[1][1][0] www.2-2-1.com
- $arry[1][1][1] www.2-2-2.com
- ... and so on for every URL found on www.2-2.com
- ...
Could someone shed some light on the subject please?
This is web scraping with an option for how deep to investigate. We can define the function like this:
function scrapeURLs($url, $steps, &$visited_urls = []);
Here, $url is the URL we are currently scraping and $steps is the step we are on: if $steps == 1 at any point in the recursion, we stop scraping further. $visited_urls makes sure we don't visit the same URL twice.
Snippet:
<?php
ini_set('max_execution_time', '500');
libxml_use_internal_errors(true); // suppresses DOM parse warnings; fine for debugging, but make sure the HTML of the URL meets DOMDocument's requirements

function scrapeURLs($url, $steps, &$visited_urls = []) {
    $result = [];
    if (preg_match('/^http(s)?:\/\/.+/', $url) === 0) {
        // Not an absolute URL, so we stop here. You would have to double check
        // whether it's a relative URL and adapt the current script for that.
        return $result;
    }
    $dom = new DOMDocument();
    $dom->loadHTMLFile($url);

    // Get all script tags
    foreach ($dom->getElementsByTagName('script') as $script_tag) {
        $script_url = $script_tag->getAttribute('src');
        if (!isset($visited_urls[$script_url])) {
            $visited_urls[$script_url] = true;
            $result[$script_url] = $steps === 1 ? [] : scrapeURLs($script_url, $steps - 1, $visited_urls); // stop or recurse further
        }
    }

    // Get all anchor tags
    foreach ($dom->getElementsByTagName('a') as $anchor_tag) {
        $anchor_url = $anchor_tag->getAttribute('href');
        if (!isset($visited_urls[$anchor_url])) {
            $visited_urls[$anchor_url] = true;
            $result[$anchor_url] = $steps === 1 ? [] : scrapeURLs($anchor_url, $steps - 1, $visited_urls); // stop or recurse further
        }
    }

    /* Likewise, you can capture several other URLs, like CSS stylesheets, image URLs, etc. */
    return $result;
}
print_r(scrapeURLs('http://yoursite.com/',2));
See also array_walk_recursive — apply a user function recursively to every member of an array: https://www.php.net/manual/en/function.array-walk-recursive.php
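One caveat: array_walk_recursive() only visits leaf values, and in the structure scrapeURLs() returns the URLs are keys whose values are (possibly empty) sub-arrays, so its callback would never see them. A small custom walker fits this shape better; a minimal sketch:

// Print every URL in the nested result, indented by depth.
function printUrls(array $urls, $depth = 0) {
    foreach ($urls as $url => $children) {
        echo str_repeat('  ', $depth), $url, "\n";
        printUrls($children, $depth + 1);
    }
}

printUrls(scrapeURLs('http://yoursite.com/', 2));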
Hello everyone again.
I want to create a PHP script for my software that generates and returns a code from one $_GET request carrying a string, and verifies that code via another request, after which the same code can never be used again.
Something that should work like this:
The 1st user's software calls "http://example.com/codes.php?create=" with a string like "abc", and the script returns a code based on "abc", e.g. "4aO45k", "12sdF4", etc.
The 2nd user's software calls "http://example.com/codes.php?verify=" with this code.
If this code exists, return true and remove it FOREVER, meaning this code will never be generated again. If this code doesn't exist, return false.
If the 1st user's software runs "http://example.com/codes.php?create=abc" again, another code will be generated.
In simple words:
if $_GET is create, then
    generate a random alphanumeric string, save it, and return it
if $_GET is verify, then
    check if the string exists; if so, then
        return true and remove it from storage
    otherwise
        return false
Is this possible without databases (SQL, MySQL, Firebird...)?
How do I make it using .ini files as storage?
Thanks.
It's possible with files. You can do something like the simple solution below.
A couple of notes:
I don't know exactly what you intend by based on, so this just uses the input as a prefix.
This stores every code in a file indefinitely; with heavy use the file will grow very large, and checking whether codes exist (and ensuring new codes are unique) can become very slow.
The same code can be verified multiple times, but will never be recreated. Marking codes as used after verification is of course possible as well.
As a general rule, don't go creating global functions and shoving everything into one file like this. It's really just a proof of concept of what was asked.
<?php
if (!empty($_GET['create'])) {
    $seed = $_GET['create'];
    // Keep generating until we get a code that isn't stored yet
    do {
        $code = uniqid($seed);
    } while (codeExists($code));
    // Append the new code to the storage file
    $file = fopen('codes', 'a');
    fwrite($file, $code . "\n");
    fclose($file);
    echo $code;
}
else if (!empty($_GET['verify'])) {
    echo codeExists($_GET['verify']) ? 'found' : 'not found';
}

function codeExists($verification) {
    if (!is_readable('codes')) {
        return false; // no codes stored yet
    }
    $file = fopen('codes', 'r');
    $found = false;
    // Read line by line; checking fgets() against false avoids stopping
    // early on a blank line the way while (trim(fgets($file))) would
    while (($line = fgets($file)) !== false) {
        if (trim($line) === $verification) {
            $found = true;
            break;
        }
    }
    fclose($file);
    return $found;
}
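Since the question asks specifically about .ini files: the same idea works with parse_ini_file() for reading, though PHP has no built-in ini writer, so you rewrite the file yourself. A minimal sketch; the filename codes.ini and the 1/0 used-flag convention are my assumptions. Marking codes as used instead of deleting them keeps them from ever being issued again:

<?php
$iniFile = 'codes.ini';
// Each code is stored as "code = 1" (unused) or "code = 0" (used)
$codes = is_readable($iniFile) ? parse_ini_file($iniFile) : [];

if (!empty($_GET['create'])) {
    do {
        $code = uniqid($_GET['create']);
    } while (isset($codes[$code])); // never re-issue a known code
    $codes[$code] = 1;
    saveCodes($iniFile, $codes);
    echo $code;
} elseif (!empty($_GET['verify'])) {
    $code = $_GET['verify'];
    if (!empty($codes[$code])) {
        $codes[$code] = 0; // mark as used rather than delete, so it can never come back
        saveCodes($iniFile, $codes);
        echo 'true';
    } else {
        echo 'false';
    }
}

// Rewrite the whole file on every change; fine for small numbers of codes.
function saveCodes($iniFile, array $codes) {
    $data = '';
    foreach ($codes as $code => $flag) {
        $data .= "$code = $flag\n";
    }
    file_put_contents($iniFile, $data);
}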
I'm trying to access the URL of the video element from an external URL.
Here's an example of the URL I'm trying to access:
https://www.musical.ly/v/MzA4NTExODI2MDI0MjMzNDgxOTEyMzI.html
file_get_contents and cURL return the HTML without the video in it. What am I doing wrong?
Any PHP/jQuery solution would be great!
It seems like the MzA4NTExODI2MDI0MjMzNDgxOTEyMzI part of the URL is the video key.
They are calling: https://www.musical.ly/rest/v2/musicals/shareInfo?key=MzA4NTExODI2MDI0MjMzNDgxOTEyMzI to fetch the video information in json format.
You could do the same and just use the videoUri from the json response?
Example
Just for fun, I created an example of how to fetch it from the initial URL. It would of course need a bit of validation and such, but it is a working example:
<?php
$url = 'https://www.musical.ly/v/MzA4NTExODI2MDI0MjMzNDgxOTEyMzI.html';

// Extract the URL path and explode the segments
$segments = explode('/', parse_url($url, PHP_URL_PATH));

if (isset($segments[2])) {
    // We have the key segment, so let's build the URL to fetch the info.
    // basename() with a suffix strips exactly ".html", whereas rtrim($segments[2], '.html')
    // would also eat any trailing '.', 'h', 't', 'm' or 'l' characters from the key itself.
    $key = basename($segments[2], '.html');
    $infoUrl = 'https://www.musical.ly/rest/v2/musicals/shareInfo?key=' . $key;
    $info = file_get_contents($infoUrl);
    $info = $info ? json_decode($info, true) : null;
}

if (isset($info['result']['videoUri'])) {
    // We have all we need, let's get the video URI
    echo $info['result']['videoUri'];
} else {
    die('No video URI found');
}
URL : http://www.sayuri.co.jp/used-cars
Example : http://www.sayuri.co.jp/used-cars/B37753-Toyota-Wish-japanese-used-cars
Hey guys, I need some help with one of my personal projects. I've already written the code to fetch data from each single-car URL (example above) and post it on my site.
Now I need to go through the main URL, sayuri.co.jp/used-cars, and:
1) Build an array/list of the URLs of all the single cars in it, then run my internal code on each one to fetch data before moving on to the next one.
I already have code to save each URL into a log file when completed (probably not necessary if it goes link by link without starting from the top, but it will ensure no repetition).
2) When all links on a page are done, it should move to the next page and do the same thing until the end (there are 5-6 pages at most).
I've been stuck on this part since last night and would really appreciate any help. Thanks.
My code to get data from the main url :
$content = file_get_contents('http://www.sayuri.co.jp/used-cars/');
// echo $content;
and
$dom = new DOMDocument;
$dom->loadHTML($content);
//echo $dom;
I'm guessing you already know this since you say you've gotten data from the car entries themselves, but a good point to start is by dissecting the page's DOM and seeing if there are any elements you can use to jump around quickly. Most browsers have page inspection tools to help with this.
In this case, <div id="content"> serves nicely. You'll note it contains a collection of tables with the required links and a <div> that contains the text telling us how many pages there are.
Disclaimer: it's been years since I've done PHP and I have not tested this, so it is probably neither correct nor optimal, but it should get you started. You'll need to tie the functions together to achieve what you want (a rough driver sketch follows the three functions below), but these should grab the data required.
You'll be working with the DOM on each page, so a convenience to grab the DOMDocument:
function get_page_document($index) {
    $content = file_get_contents("http://www.sayuri.co.jp/used-cars/page:{$index}");
    $document = new DOMDocument;
    $document->loadHTML($content);
    return $document;
}
You need to know how many pages there are in total in order to iterate over them, so grab it:
function get_page_count($document) {
    $content = $document->getElementById('content');
    $count_div = $content->childNodes->item($content->childNodes->length - 4);
    $count_text = $count_div->firstChild->textContent;

    if (preg_match('/Page \d+ of (\d+)/', $count_text, $matches) === 1) {
        return $matches[1];
    }
    return -1;
}
It's a bit ugly, but the links are available inside each <table> in the contents container. Rip 'em out and push them in an array. If you use the link itself as the key, there is no concern for duplicates as they'll just rewrite over the same key-value.
function get_page_links($document) {
    $content = $document->getElementById('content');
    $tables = $content->getElementsByTagName('table');
    $links = array();

    foreach ($tables as $table) {
        if ($table->getAttribute('class') === 'itemlist-table') {
            // table > tbody > tr > td > a
            $link = $table->firstChild->firstChild->firstChild->firstChild->getAttribute('href');
            // No duplicates because they just overwrite the same entry.
            $links[$link] = "http://www.sayuri.co.jp{$link}";
        }
    }
    return $links;
}
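To tie them together, a driver along these lines (again untested, and assuming the page:N URL scheme above also works for page 1) would collect every car link across all pages:

function get_all_links() {
    $first = get_page_document(1);
    $pages = get_page_count($first);
    $links = array();
    for ($i = 1; $i <= $pages; $i++) {
        $document = ($i === 1) ? $first : get_page_document($i);
        // String keys make array_merge overwrite duplicates instead of appending
        $links = array_merge($links, get_page_links($document));
    }
    return $links;
}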
Perhaps also obvious, but these will break if the site changes its formatting. You'd be better off asking whether they have a REST API or some such available for long-term use, though I'm guessing you don't care as much since it's just a personal project for tinkering.
Hope it helps prod you in the right direction.
while ($enreg = mysql_fetch_array($res)) {
    $link_d .= "<font color=\"red\">click here to download</font></td>";
}
I want the href to lead to the download link, and also to send the id to a PHP file so I can count how many times the file has been downloaded.
How can one href point to multiple links?
You can't. A link can only point to one resource.
Instead, what you should do is have your PHP script redirect to the file: point the link at your PHP script with the counter, and have the script set a Location: header (which automatically sets a 302 status code for redirection) whose value is the URL you want to redirect to.
Also, you should really use htmlspecialchars() around any variable data you use in an HTML context, to ensure you are generating valid HTML.
Ideally you would have some checks to see whether it's a human downloading (web crawlers may trigger it; putting rel="nofollow" on the link will help, though). You could also use a database, but that gets more complicated. My preferred way would be to use Google Analytics Events. But here is a simple PHP script that might fulfill your needs without the complexity of the other solutions.
First, modify your links to point at a tracker script, urlencoding the target:
$link_d .= '<a style="color:red" href="tracker.php?url=' . urlencode($enreg['link']) . '" target="_blank">click here to download</a>';
}
Then create a script that will record downloads (tracker.php)
<?php
// Keep stats in a file. You can change the path so it sits below the
// server root, or just use a secret name; it must be writeable by the server.
$statsfile = 'stats.txt';

// Only do something if there is a url
if (isset($_GET['url'])) {
    // $_GET values are already urldecoded by PHP, so no urldecode() needed
    $url = $_GET['url'];

    // Do whatever check you want here to see if it's a valid link; you could
    // add a URL regex, for example. Note that strpos() returns false when
    // there is no match, never -1, so compare against the expected position.
    if (strpos($url, 'http') === 0) {
        // Load current data into an array (empty if the file doesn't exist yet)
        $lines = is_readable($statsfile) ? file($statsfile, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) : array();

        // Parse the lines into something usable by PHP
        $stats = array();
        foreach ($lines as $line) {
            $bits = explode('|', $line);
            $stats[(string)$bits[0]] = (int)$bits[1];
        }

        // See if there is an entry already
        if (!isset($stats[$url])) {
            // No, so let's add it with a count of 1
            $stats[$url] = 1;
        } else {
            // Yes, let's increment
            $stats[$url]++;
        }

        // Build the file contents in a human-readable format.
        // A separate loop variable keeps $url intact for the redirect below.
        $data = '';
        foreach ($stats as $statUrl => $count) {
            $data .= $statUrl . '|' . $count . "\n";
        }

        // And write to file
        file_put_contents($statsfile, $data);

        // Now redirect to the file
        header('Location: ' . $url);
    }
}
You can't.
Anchors are meant to lead to one resource.
What you want to do is typically addressed by using an intermediate script that counts the hit and redirects to the resource.
e.g.
<a href="redirect.php?id=42">Click here to download</a>
redirect.php:
// Increment, for example, a database counter:
//   UPDATE downloads SET hits = (hits + 1) WHERE id=42
// Get the URI:
//   SELECT uri FROM downloads WHERE id=42
// Redirect to the URI
// (You may also need to set a Content-Type header for file downloads)
header("Location: $uri");
You may optimize this by passing the URI as a second parameter so that you won't need to fetch it at redirect time:
<a href="redirect.php?id=42&uri=...">Click here to download</a>
Another way of collecting this kind of statistic is to use the JavaScript tools provided by your statistics provider, like Google Analytics or Piwik, adding a listener to the link's click event.
It is less invasive for your code base but won't let you easily reuse the collected data on your site (for example, to show a "top downloads" list).
Create a file with a download script, for example download.php, and route all your downloads through it. Update your counter in this script and send the appropriate headers for the download.
e.g. the URL may be download.php?id=1&file=yourfile
In download.php:
// get the id and file
// database operation to update your count
// headers for the download
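A minimal sketch of such a download.php (the files/ directory, the downloads table, and the PDO credentials are all placeholders for whatever you actually use):

<?php
// download.php?id=1&file=yourfile
$id   = isset($_GET['id']) ? (int) $_GET['id'] : 0;
$file = isset($_GET['file']) ? basename($_GET['file']) : ''; // basename() blocks path traversal
$path = __DIR__ . '/files/' . $file; // assumed storage directory

if ($id <= 0 || $file === '' || !is_file($path)) {
    header('HTTP/1.0 404 Not Found');
    exit('Not found');
}

// Update the counter (assumed table: downloads(id, hits))
$pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');
$stmt = $pdo->prepare('UPDATE downloads SET hits = hits + 1 WHERE id = ?');
$stmt->execute(array($id));

// Headers that trigger a download instead of inline display
header('Content-Type: application/octet-stream');
header('Content-Disposition: attachment; filename="' . $file . '"');
header('Content-Length: ' . filesize($path));
readfile($path);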
I am searching for a way to download YouTube videos using PHP. I have searched for hours, but unfortunately all the Google results I find are years old and no longer work.
I would appreciate it if someone could explain how to do this, or give a link to an up-to-date article that explains it in detail.
Thanks very much.
The first thing you should do is get a tool like Fiddler and visit a YouTube video page. In Fiddler, you will see all of the files that make up that page, including the FLV itself. Now, you know that the video isn't one of the CSS files, nor is it the image files. You can ignore those. Look for a big file. If you look at the URL, it begins with /videoplayback.
Now, once you've found it, figure out how the browser knew to get that file. Do a search through the sessions (Ctrl+F) and look for "videoplayback". You will see a hit on the first page you went to, like http://www.youtube.com/watch?v=123asdf. If you dig through that file, you'll see a DIV tag with the ID of "watch-player". Within that there is a script tag to setup the flash player, and within that are all of the flash parameters. Within those is the URL to the video.
So now you know how to use your tools to figure out how the browser got to it. How do you duplicate this behavior in PHP?
Do a file_get_contents() on the page that references the video. Ignore everything not in that watch-player div. Parse through the code until you find the variable that contains the URL. You will probably have to unescape that URL. Once you have it, you can do another file_get_contents() (or some other download method, depending on what you are trying to do) on that URL. It is that simple; your HTML-parsing code will be the most complex part.
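In outline, and purely as a sketch (the page markup and the fmt_url_map parameter name from that era have changed repeatedly, so treat every pattern here as a placeholder):

<?php
// Fetch the watch page
$html = file_get_contents('http://www.youtube.com/watch?v=123asdf');

// Look inside the player setup for the escaped URL map
if (preg_match('/"fmt_url_map":\s*"([^"]+)"/', $html, $matches)) {
    // Unescape and take the first "itag|url" entry
    $map = stripslashes(urldecode($matches[1]));
    $pairs = explode(',', $map);
    list($itag, $videoUrl) = explode('|', $pairs[0]);
    // Download the video itself
    file_put_contents('video.flv', file_get_contents($videoUrl));
}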
Finally, keep in mind what you are about to do may be illegal. Check the EULA.
Nobody writes manuals/howtos that become outdated every four weeks. The closest you can get is inspecting the actual extraction methods in a contemporary implementation. Quite readable:
http://bitbucket.org/rg3/youtube-dl/raw/2010.08.04/youtube-dl
If you don't want to read through or reimplement it (it's obviously not simple), you could just run it as-is from PHP:
system('youtube-dl ' . escapeshellarg($url)); // escapeshellarg() guards against shell injection
The last time I worked on fixing a broken Chrome extension for downloading YouTube videos, I fixed it by altering the script part (JavaScript):
var links = new String();
var downlink = new String();
var has22 = new Boolean();
has22 = false;
var Marked = false;
var FMT_DATA = fmt_url_map; // HTML text you have to grab; in the extension it was readily available through document.getElementsByTagName('script')
var StrSplitter1 = '%2C', StrSplitter2 = '%26', StrSplitter3 = '%3D';
if (FMT_DATA.indexOf(',') > -1) { // found ','
    StrSplitter1 = ',';
    StrSplitter2 = (FMT_DATA.indexOf('&') > -1) ? '&' : '\\u0026';
    StrSplitter3 = '=';
}
var videoURL = new Array();
var FMT_DATA_PACKET = FMT_DATA.split(StrSplitter1);
for (var i = 0; i < FMT_DATA_PACKET.length; i++) {
    var FMT_DATA_FRAME = FMT_DATA_PACKET[i].split(StrSplitter2);
    var FMT_DATA_DUEO = new Array();
    for (var j = 0; j < FMT_DATA_FRAME.length; j++) {
        var pair = FMT_DATA_FRAME[j].split(StrSplitter3);
        if (pair.length == 2) {
            FMT_DATA_DUEO[pair[0]] = pair[1];
        }
    }
    var url = (FMT_DATA_DUEO['url']) ? FMT_DATA_DUEO['url'] : null;
    if (url == null) continue;
    url = unescape(unescape(url)).replace(/\\\//g, '/').replace(/\\u0026/g, '&');
    var itag = (FMT_DATA_DUEO['itag']) ? FMT_DATA_DUEO['itag'] : null;
    if (itag == null) continue;
    var signature = (FMT_DATA_DUEO['sig']) ? FMT_DATA_DUEO['sig'] : null;
    if (signature != null) {
        url = url + "&signature=" + signature;
    }
    if (url.toLowerCase().indexOf('http') == 0) { // validate URL
        if (itag == '5') {
            links += '<span class="yt-uix-button-menu-item" id="v240p">FLV (240p)</span>';
        }
        if (itag == '18') {
            links += '<span class="yt-uix-button-menu-item" id="v360p">MP4 (360p)</span>';
        }
        if (itag == '35') {
            links += '<span class="yt-uix-button-menu-item" id="v480p">FLV (480p)</span>';
        }
        if (itag == '22') {
            links += '<span class="yt-uix-button-menu-item" id="v720p">MP4 HD (720p)</span>';
        }
        if (itag == '37') {
            links += '<span class="yt-uix-button-menu-item" id="v1080p">MP4 HD (1080p)</span>';
        }
        if (itag == '38') {
            links += '<span class="yt-uix-button-menu-item" id="v4k">MP4 HD (4K)</span>';
        }
        FavVideo();
        videoURL[itag] = url;
        console.log(itag);
    }
}
You can get each video's link from the videoURL[itag] array.
The above logic can be converted to PHP easily; see the sketch below.
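For instance, the splitting logic translates to PHP along these lines (an untested sketch; $fmt_data is the fmt_url_map string you would first have to grab from the page source):

$videoURL = array();
foreach (explode(',', $fmt_data) as $packet) {
    $fields = array();
    foreach (explode('&', $packet) as $pair) {
        $parts = explode('=', $pair, 2);
        if (count($parts) == 2) {
            $fields[$parts[0]] = $parts[1];
        }
    }
    if (isset($fields['url'], $fields['itag'])) {
        // Double-decode, mirroring the unescape(unescape(url)) above
        $url = urldecode(urldecode($fields['url']));
        if (isset($fields['sig'])) {
            $url .= '&signature=' . $fields['sig'];
        }
        $videoURL[$fields['itag']] = $url;
    }
}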
The extension can be downloaded from location http://www.figmentsol.com/chrome/ytdw/
I hope this helps someone. This was a working solution as of 2013-04-06.