Count Hyperlinks of a Website [duplicate] - php

This question already exists:
Closed 11 years ago.
Possible Duplicate:
How to parse HTML with PHP?
I want to write a PHP program that counts all the hyperlinks of a website the user can enter.
How do I do this? Is there a library or something with which I can parse and analyze the HTML for hyperlinks?
Thanks for your help

Like this:
<?php
// Naive approach: count literal occurrences of '<a href=' in the page source.
$site  = file_get_contents("someurl");
$links = substr_count($site, "<a href=");
print "There are {$links} links on that page.";
?>
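Note that substr_count only matches that exact string, so it misses links written with single quotes, extra attributes, or different spacing. As a rough sketch (my addition, not from the original answer), DOMDocument counts anchor tags more robustly; the URL here is a placeholder:
<?php
// Hedged sketch: count <a> elements that actually carry an href attribute.
// "http://example.com" is a placeholder URL, not from the original post.
$html = file_get_contents("http://example.com");

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // real-world HTML is rarely valid XML
$dom->loadHTML($html);
libxml_clear_errors();

$count = 0;
foreach ($dom->getElementsByTagName('a') as $anchor) {
    if ($anchor->hasAttribute('href')) {
        $count++;
    }
}
echo "There are {$count} links on that page.";
?>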

Well, we won't be able to give you a definitive answer, only pointers. I once built a search engine in PHP, so the principle is the same:
First of all, you should write your script as a console script; a web script is not really appropriate, but it's all a question of taste.
You need to understand how to work with sockets in PHP and make requests; look at the PHP socket library at: http://www.php.net/manual/ref.network.php
You will need to get versed in the world of HTTP requests: learn how to make your own GET/POST requests and split the headers from the returned content.
The last part will be easy with a regexp: just preg_match_all the content with something like "#<a[^>]+href=["\']([^"\']+)["\']#i" (that expression might be wrong, I didn't test it at all, ok?).
Loop over the list of found hrefs, compare them to the already-visited hrefs (remember to take wildcard GET params into account), and then repeat the process to load all the pages of a site; see the sketch below.
It IS HARD WORK... good luck
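A rough sketch of that crawl loop (my illustration, not the answerer's code; file_get_contents stands in for the raw socket work described above, and http://example.com is a placeholder):
<?php
// Hedged sketch of a single-site crawl that counts links.
// Assumes absolute URLs for simplicity; a real crawler must also
// resolve relative hrefs and respect robots.txt.
$queue   = array("http://example.com/");  // placeholder start page
$visited = array();
$total   = 0;

while (!empty($queue)) {
    $url = array_shift($queue);
    if (isset($visited[$url])) {
        continue;
    }
    $visited[$url] = true;

    $html = @file_get_contents($url);
    if ($html === false) {
        continue;
    }

    preg_match_all('#<a[^>]+href=["\']([^"\']+)["\']#i', $html, $matches);
    $total += count($matches[1]);

    foreach ($matches[1] as $href) {
        // Only follow links that stay on the same site.
        if (strpos($href, "http://example.com/") === 0 && !isset($visited[$href])) {
            $queue[] = $href;
        }
    }
}

echo "Found {$total} links across " . count($visited) . " pages.\n";
?>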

You may have to use cURL to fetch the contents of the web page. Store that in a variable, then parse it for hyperlinks. You might need regular expressions for that.
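A minimal sketch of that approach (my illustration; the URL is a placeholder, and the pattern is a rough href matcher, not a full HTML parser):
<?php
// Hedged sketch: fetch a page with cURL and count href attributes.
$ch = curl_init("http://example.com");          // placeholder URL
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
$html = curl_exec($ch);
curl_close($ch);

preg_match_all('#<a[^>]+href=#i', $html, $matches);
echo "Found " . count($matches[0]) . " hyperlinks.\n";
?>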

Related

Taking information from one site and using it on my analyser/script made with PHP

I need to extract the "Toner Cartridges" levels from This site and "send it" to the one I'm working on. I'm guessing I can use GET or something similar, but I'm new to this, so I don't know how it could be done.
Then the information needs to be run through an if/else sequence which checks for 4 possible states: 100%->50%, 50%->25%, 25%->5%, 5%->0%.
I have the if/else written down, but I can't seem to find any good command for extracting the information from the index.php file.
EDIT: just need someone to point me in the right direction
To read the page you can use file_get_contents
$page = file_get_contents("http://example.com");
But in order to make the function work with URLs, allow_url_fopen must be set to true in your PHP configuration file (php.ini).
Then you can use a regular expression to filter the text and get the data.
The PHP function to perform a regular expression match is preg_match.
Example (the pattern below expects a host name, so define $host first):
$host = "www.php.net";
preg_match('/[^.]+\.[^.]+$/', $host, $matches);
echo "domain name is: {$matches[0]}\n";
will output
domain name is: php.net
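Applied to the toner question, the same idea might look like this hedged sketch (the printer URL and the "Toner ... %" text pattern are guesses, since the real page markup isn't shown here):
<?php
// Hypothetical example: the URL and the toner-level text format are assumptions.
$page = file_get_contents("http://printer.example.com/index.php");

if (preg_match('/Toner[^0-9]*(\d+)\s*%/i', $page, $m)) {
    $level = (int) $m[1];

    // The four states from the question: 100-50, 50-25, 25-5, 5-0.
    if ($level > 50) {
        echo "Toner OK ({$level}%)\n";
    } elseif ($level > 25) {
        echo "Toner below half ({$level}%)\n";
    } elseif ($level > 5) {
        echo "Toner low ({$level}%)\n";
    } else {
        echo "Replace toner ({$level}%)\n";
    }
} else {
    echo "Could not find a toner level on the page.\n";
}
?>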
I imagine you are reading from a printer status page. In that case, to give yourself the flexibility to use sessions and log in, I would look into cURL. The nice thing about cURL is that you can use the PHP library in your code, but you can also test at the command line rather quickly.
After you have retrieved the HTML contents, I would look into using an XML parser, like SimpleXML or DOMDocument. Either one will get you to the information you need. SimpleXML is a little easier to use for people new to traversing XML (it is, at the same time, like and very much unlike jQuery).
That said, you could also hack at the data just as quickly (if you are just now jumping in) with regular expressions (it seriously is that quick once you get the hang of it).
Best of luck!
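For the parser route, a sketch with DOMDocument and DOMXPath might look like this (the id "toner-level" is invented; inspect the real status page for its actual markup):
<?php
// Hedged sketch: pull one value out of fetched HTML with DOMXPath.
// The id "toner-level" is hypothetical, for illustration only.
$html = file_get_contents("http://printer.example.com/index.php");

$dom = new DOMDocument();
libxml_use_internal_errors(true); // tolerate the tag soup most status pages emit
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$nodes = $xpath->query('//*[@id="toner-level"]');

if ($nodes->length > 0) {
    echo "Toner level: " . trim($nodes->item(0)->textContent) . "\n";
}
?>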

Php copy website table [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
HTML Scraping in Php
Far from being a web developer expert, so sorry in advance if I'm missing something basic:
I need to copy a table into a MySQL database using PHP; the table resides in a website which I don't own, but I have permission to copy and publish it.
Manually, when I view this website in my web browser, I need to click on a link in the main website URL (I can't reach the final destination page link since it changes all the time; however, the main page link is static, and the link to click is also static).
Example of such content I need to copy from (just an example, this is not the real content):
http://www.flightstats.com/go/FlightStatus/flightStatusByAirport.do?airportCode=JFK&airportQueryType=0
Most people are going to ask what you have tried. Since you mentioned that you don't have much development experience, here are some tips on how to go about it (I have to put this as an answer so it is easier to read).
What you're going to need to do is scraping.
Using PHP, you'd use the following functions at the very least:
file_get_contents() - this function will read the data at the URL
preg_match_all() - regular expressions will let you get the data you are looking for, though some/many people will say that you should go through the DOM instead
The data returned by preg_match_all() can be stored in your MySQL table; see the sketch below. Because the data changes so frequently, though, you might be better off just scraping that section and storing the entire table as a cache (though I have to say I have no idea what you are trying to do on your site, so I could well be wrong).
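A hedged sketch of those two steps together (the URL, regex, table, and column names are all invented for illustration; the real pattern depends on the page's markup):
<?php
// Hypothetical example: scrape table rows and store them in MySQL.
$html = file_get_contents("http://www.example.com/flight-status"); // placeholder URL

// Invented pattern: two <td> cells per row (e.g. flight number and status).
preg_match_all('#<tr[^>]*>\s*<td[^>]*>(.*?)</td>\s*<td[^>]*>(.*?)</td>#si', $html, $rows, PREG_SET_ORDER);

$pdo  = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'password');
$stmt = $pdo->prepare('INSERT INTO flights (flight, status) VALUES (?, ?)');

foreach ($rows as $row) {
    $stmt->execute(array(strip_tags($row[1]), strip_tags($row[2])));
}
?>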

What is the best way to create a sitemap.xml? [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
How to create a sitemap using PHP & MySQL
So I'm kind of stuck here. I have a website with a pretty big database that constantly changes. Now I want to help the search engines by supplying a sitemap.xml file. Normally I would use a web service that would do this, but that's not really possible in this case.
To be honest, I have no clue where to start. How would I go about doing this? Sorry if this is too basic a question, but Google couldn't help me.
Edit: Some more info. The DB currently holds 1k pages; I want to go up to around 10k. I use MySQL to echo this from my database, and then .htaccess to rewrite the URLs
(PHP gets the ID, etc.)
You would need to install a crawler to do it the way a web service does. The easier way is to write a PHP script and generate the sitemap XML file yourself.
Write a query to get the links from your database and then iterate over it to create a sitemap, as sketched below.
See this post for an example PHP script: How to create a sitemap using PHP & MySQL
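A minimal sketch of such a generator (my illustration; the table and column names are invented, and the URL scheme should match your rewrite rules):
<?php
// Hypothetical example: build sitemap.xml from a "pages" table.
$pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'password');

$xml  = '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
$xml .= '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";

foreach ($pdo->query('SELECT id, updated_at FROM pages') as $row) {
    $xml .= "  <url>\n";
    $xml .= "    <loc>http://www.example.com/page/{$row['id']}</loc>\n";
    $xml .= "    <lastmod>" . date('Y-m-d', strtotime($row['updated_at'])) . "</lastmod>\n";
    $xml .= "  </url>\n";
}

$xml .= '</urlset>' . "\n";
file_put_contents('sitemap.xml', $xml);
?>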

append script to existing page

Good day everyone!
I am trying to append a script to a remote page (not mine; it is a form page) that would hide some of its content (some parts in particular) before showing it. I am using cURL, but the only thing I could do is retrieve its HTML code.
Is there any way of doing what I want to happen?
I'm assuming that the user asks your server for content, and your server needs to fetch that content from another server and process it before sending it back to the user.
Query the other script using cURL, then run your script on that HTML to remove the pieces you don't want to keep (I hope for your sake that they are reasonably easy to find and eliminate), and finally output the resulting HTML to the user.
To remove some parts of the HTML, you can preg_replace() them using regular expressions.
Googling for an online regexp tester might be of some help if you have no experience with regular expressions.
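A hedged sketch of that flow (the URL is a placeholder, and the pattern removes an invented <div id="hide-me"> block; the real page needs its own pattern):
<?php
// Fetch the remote form page with cURL.
$ch = curl_init("http://www.example.com/form.php"); // placeholder URL
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$html = curl_exec($ch);
curl_close($ch);

// Hypothetical: strip one div before echoing the page to the user.
// Note: this regex only handles a div with no nested divs inside it.
$html = preg_replace('#<div id="hide-me">.*?</div>#si', '', $html);

echo $html;
?>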

Grab content from another website [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
How to parse and process HTML/XML with PHP?
I want to grab anything that's inside the element below on a website.
<table class="eventResults byEvent vevent"></table>
How can I accomplish this?
Thanks
If you want to grab HTML from a site, you will want to use a DOM parser. PHP has several XML processing packages to help you with this, be it DOM, SimpleXML or XMLReader. An often-suggested alternative on SO is SimpleHtmlDom.
Since one of the classes on the table is vevent, the content inside the table could be an hCalendar microformat (can't tell for sure without seeing the content). If so, you can also use a microformat parser, preferably Transformr, to save you the work of manually parsing the event data.
You can use the file_get_contents function or the cURL extension for that.
First fetch the content of the whole page from that website as a string, then use a regular expression to extract the substring inside the table tags.
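A sketch of the DOM route for this particular table (my illustration; the URL is a placeholder, and the class value comes straight from the question):
<?php
// Hedged sketch: extract the table with class "eventResults byEvent vevent".
$html = file_get_contents("http://www.example.com/events"); // placeholder URL

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$xpath  = new DOMXPath($dom);
$tables = $xpath->query('//table[@class="eventResults byEvent vevent"]');

if ($tables->length > 0) {
    // saveHTML() with a node argument serializes just that element.
    echo $dom->saveHTML($tables->item(0));
}
?>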
