Parsing multiple URLs from a form - PHP

I'm trying to write a script that searches a list of URLs submitted through a form for email addresses. Could anyone advise me how to do it? Is there an alternative to cURL?
I tried file_get_contents, but the script only analyzes the last URL given in the form: when I enter, for example, two URLs, the first print_r("show current_url:". $current_url); prints nothing, and the second shows the page content (without pictures).
I asked on other forums but received no answer. I would really appreciate your help.
Thank you.
<?php
$urls = explode("\n", $_POST['urls']);
$db = new mysqli('localhost', 'root', 'root', 'urls');
if (mysqli_connect_errno()) {
echo 'Error: ' . mysqli_connect_error();
exit;
}
for ($i=0; $i<count($urls); $i++){
print_r("show link:". $urls[$i]."<br>");
$current_url = file_get_contents($urls[$i]);
print_r("show current_url:". $current_url);
preg_match( "/[\._a-zA-Z0-9-]+@[\._a-zA-Z0-9-]+/i", $current_url, $email);//email
print_r ("show email:".$email[0]);
$query = "INSERT INTO urle set adres = '$email[0]' ";
$result = $db->query($query);
}
if ($result) {
echo $db->affected_rows . " rows added.";
} else {
echo $db->errno . ": " . $db->error . " An error occurred while adding the URLs.";
}
$db->close();
?>
EDIT:
I have tried with cURL. var_dump($email); shows: array(0) { }
The script now displays all of the URLs given in the form in the browser, but preg_match doesn't match anything, so no email addresses are extracted.
<?php
$urls = explode("\n", $_POST['urls']);
$db = new mysqli('localhost', 'root', 'root', 'linki');
if (mysqli_connect_errno()) {
echo 'Error: ' . mysqli_connect_error();
exit;
}
for ($i=0; $i<count($urls); $i++){
$url = $urls[$i];
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_URL, $url);
$output = curl_exec($ch);
preg_match( "/[\._a-zA-Z0-9-]+@[\._a-zA-Z0-9-]+/i", $output, $email);//email
var_dump($email);
$query = "INSERT INTO urle set adres = '$email[0]' ";
$result = $db->query($query);
curl_close($ch);
}//
if ($result) {
echo $db->affected_rows . " rows added.";
} else {
echo $db->errno . ": " . $db->error . " An error occurred while adding the URLs.";
}
$db->close();
?>

Is there some alternative to cURL?
Yes: file_get_contents. Note that it gives you no error messages unless error_reporting is raised, and that many servers block its requests unless a user agent has been set with ini_set("user_agent", ...).
Alternatively, HttpRequest (from the pecl_http extension), where it is installed.
Still, cURL is not difficult to use, and the manual is full of examples.
the first "print_r("show current_url:". $current_url); is empty
Nobody can tell. It's your duty to debug that (especially since you haven't mentioned the affected URL in your question). Use cURL or HttpRequest.
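One possible cause, offered as a guess since the affected URL isn't given: a textarea submitted from a browser usually separates lines with "\r\n", so explode("\n", ...) leaves a trailing "\r" on every URL except the last one, and the request for those URLs fails. A minimal sketch under that assumption; parse_url_list is a hypothetical helper name, not something from the question:

```php
<?php
// Hypothetical helper: split a textarea value into clean URLs.
function parse_url_list(string $raw): array
{
    // Split on any line-ending style (\r\n, \n or \r).
    $lines = preg_split('/\R/', $raw);
    // Strip surrounding whitespace from each line.
    $lines = array_map('trim', $lines);
    // Drop empty lines and reindex from 0.
    return array_values(array_filter($lines, 'strlen'));
}

// Sample input standing in for $_POST['urls'].
$urls = parse_url_list("http://example.com/a\r\nhttp://example.com/b\r\n");
print_r($urls);
```

If the first URL starts working after this change, the stray "\r" was the culprit; if not, raising error_reporting is still the next thing to try.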

OK, I've fixed it! :)
Here is the code:
for ($i=0; $i<count($linki); $i++){
$url = $linki[$i];
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$result =curl_exec($ch);
curl_close($ch);
preg_match("/[-a-z0-9\._]+@[-a-z0-9\._]+\.[a-z]{2,4}/", $result, $email);//email
print_r($email);
$zapytanie = "INSERT INTO urle set adres = '$email[0]' ";
$wynik = $db->query($zapytanie);
}
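The line that makes the difference is CURLOPT_RETURNTRANSFER: without it, curl_exec() prints the page and returns true, so preg_match has nothing to search. Since preg_match stops at the first hit, a page with several addresses still yields only one. A rough sketch of collecting all of them, assuming a pattern modelled on the one in the fix (the i flag and the extract_emails name are additions for illustration):

```php
<?php
// Hypothetical helper: return every address-like string found in $html.
function extract_emails(string $html): array
{
    // The i flag lets upper-case letters match as well.
    preg_match_all('/[-a-z0-9._]+@[-a-z0-9._]+\.[a-z]{2,4}/i', $html, $matches);
    // $matches[0] holds the full matches; remove duplicates and reindex.
    return array_values(array_unique($matches[0]));
}

// Sample markup, made up for illustration.
$html = '<a href="mailto:Info@Example.com">Info@Example.com</a> or sales@example.org';
$emails = extract_emails($html);
print_r($emails);
```

An empty array means no address was found, which is worth checking before building the INSERT, since $email[0] does not exist in that case.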

unlink not working when PHP script is run by cron

I have a PHP script that checks the result of a VirusTotal scan. If the scan returns positive for malicious code, it changes the value to 0 in the DB. Another PHP script checks that value and, if it is 0, removes the entry from the DB and then removes the file from the directory. When I run this from the command line it works perfectly; when cron runs it, it removes the DB entry as it should but does not delete the file from the directory.
Any help would be much appreciated.
Here is the end of the php file with the unlink:
else{
// if not it deletes the image
$hash = $row['hash'];
$result2 = mysqli_query($connection, "DELETE FROM userUploads WHERE hash = '$hash' ");
// need due to dir structure
$pat = str_replace('../','', $row['fileName']);
unlink ($pat);
if (! $result2){
echo('Database error: ' . mysqli_error($connection));
}
For reference, here is the full file:
<?php
function VirusTotalCheckHashOfSubmitedFile($hash){
$debug = false;
$post = array('apikey' => '','resource'=>$hash);
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'https://www.virustotal.com/vtapi/v2/file/report');
curl_setopt($ch, CURLOPT_POST,1);
curl_setopt($ch, CURLOPT_ENCODING, 'gzip,deflate'); // please compress data
curl_setopt($ch, CURLOPT_USERAGENT, "gzip, My php curl client");
curl_setopt($ch, CURLOPT_VERBOSE, 1); // remove this if your not debugging
curl_setopt($ch, CURLOPT_RETURNTRANSFER ,true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $post);
$result = curl_exec ($ch);
$status_code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
curl_close ($ch); // close the handle before any return
if ($status_code == 200) { // OK
$js = json_decode($result, true);
if($debug){echo "<pre>";}
if($debug){print_r($js);}
if($js["positives"]==0){
// clean
return true;
}else{
// malware
return false;
}
} else { // Error occurred
print($result);
// falls through and returns null, which the caller treats like "malware"
}
}
$connection = mysqli_connect("");
if (!$connection) {
trigger_error("Could not reach database!<br/>");
}
$db_selected = mysqli_select_db($connection, "seclog");
if (!$db_selected) {
trigger_error("Could not reach database!<br/>");
}
// Selects images that have not been marked as clean by VirusTotal
$result = mysqli_query($connection, "Select hash, fileName FROM userUploads WHERE scanResult = 0");
if (! $result){
echo('Database error: ' . mysqli_error($connection));
}
while ($row = mysqli_fetch_assoc($result)) {
// checks for results on scanned images
if(VirusTotalCheckHashOfSubmitedFile($row['hash'])){
// if report returns image is malware free we update its virusFree attribute to true
$hash = $row['hash'];
$result2 = mysqli_query($connection, "UPDATE userUploads SET scanResult = 1 WHERE hash = '$hash'");
if (! $result2){
echo('Database error: ' . mysqli_error($connection));
}
}else{
// if not it deletes the image
$hash = $row['hash'];
$result2 = mysqli_query($connection, "DELETE FROM userUploads WHERE hash = '$hash' ");
// need due to dir structure
$pat = str_replace('../','', $row['fileName']);
unlink ($pat);
if (! $result2){
echo('Database error: ' . mysqli_error($connection));
}
}
}
?>
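The problem is almost certainly the relative path in $pat = str_replace('../','', $row['fileName']);. Cron runs the PHP CLI, which is not the same PHP that Apache runs, and it runs in a different context: in particular, the working directory is usually not the script's directory, so a relative path resolves somewhere else. Try using an absolute path: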
$pat = "/var/www/myfolder/myscript.some";
If you need a variable because the folder structure depends on the context (e.g. development or production), you can pass it as a parameter when cron executes PHP:
# crontab entry (the script path is a placeholder)
30 17 * * 1 php /path/to/myscript.php myvar
Inside myscript.php, $argv[1] is then myvar.
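Another way to get an absolute path without hard-coding it is to anchor it to the script's own directory with __DIR__, which does not depend on the working directory cron happens to use. A minimal sketch; absolute_upload_path is a hypothetical helper, and the assumption that the files live below the script's directory may not match your layout:

```php
<?php
// Hypothetical helper: turn a stored relative file name into an absolute path.
function absolute_upload_path(string $baseDir, string $fileName): string
{
    // Same normalisation as in the question: drop any "../" segments.
    $relative = str_replace('../', '', $fileName);
    return rtrim($baseDir, '/') . '/' . ltrim($relative, '/');
}

// __DIR__ is the directory of this script, whatever the current working directory is.
$path = absolute_upload_path(__DIR__, '../uploads/sample.png');
echo $path, PHP_EOL;

// Delete only if the file is really there, and log a failure instead of ignoring it.
if (is_file($path) && !unlink($path)) {
    error_log("Could not delete $path");
}
```

Logging the failure also helps with the other usual suspect under cron: the job running as a user that lacks permission to delete the file.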

How to scrape dates from a website and store them in a database using PHP and MySQL?

I have been looking around the internet for ways to get the dates of events and store them in a database, but I wasn't able to find much.
I was able to get the dates from the website, but I don't know how to store them.
I want to get only the dates from the website and store them in Y-m-d format. If you know a way to do this, please tell me.
Link: https://www.brent.gov.uk/events-and-whats-on-calendar/?eventCat=Wembley+Stadium+events
<?php
$curl = curl_init();
$all_data = array();
$url = "https://www.brent.gov.uk/events-and-whats-on-calendar/?eventCat=Wembley+Stadium+events";
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
$result = curl_exec($curl);
$event = array();
preg_match_all('/<h3 style="margin:0px!important;">(.*?)<\/h3>/si',$result,$match);
$event['title'] = $match[1];
print_r($event['title']);
echo $all_data;
?>
Don't use regex to parse HTML; use a proper HTML parser, for example DOMDocument.
A quick inspection of the site reveals that all the dates are in h3 children of the only article element on the page, so you can use that to identify them. After extracting the dates, use strtotime() to convert each one to a Unix timestamp, then date() to convert it to the Y-m-d format, e.g.:
$result = curl_exec($curl);
$domd = new DOMDocument();
@$domd->loadHTML($result);
$dateElements=$domd->getElementsByTagName("article")->item(0)->getElementsByTagName("h3");
foreach($dateElements as $ele){
var_dump(date("Y-m-d",strtotime($ele->textContent)));
}
As for how to store the results in a MySQL database, try searching Google for php mysql tutorial -w3schools, or read the PDO section here: http://www.phptherightway.com/#pdo_extension
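To try the parsing step without hitting the site, the same approach can be run against a small inline document. This is only a sketch: the sample markup and its date wording are assumptions, and the real page may phrase its dates in a way strtotime() does not understand, which is why the result is checked.

```php
<?php
// Sample markup standing in for the real page (an assumption for illustration).
$html = '<html><body><article>'
      . '<h3>12 August 2017</h3>'
      . '<h3>3 September 2017</h3>'
      . '</article></body></html>';

// Collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);

$dates = [];
$article = $dom->getElementsByTagName('article')->item(0);
if ($article !== null) {
    foreach ($article->getElementsByTagName('h3') as $h3) {
        $timestamp = strtotime(trim($h3->textContent));
        // strtotime() returns false for text it cannot parse; skip those.
        if ($timestamp !== false) {
            $dates[] = date('Y-m-d', $timestamp);
        }
    }
}
print_r($dates);
```

Each entry of $dates is then a plain Y-m-d string, ready to be bound to a prepared statement.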
<?php
$db_host = "localhost";
$db_username = "username";
$db_pass = "password";
$db_name = "name";
// Run the actual connection here
$con = mysqli_connect($db_host, $db_username, $db_pass, $db_name);
if (!$con) {
die("Failed to connect to MySQL: (" . mysqli_connect_errno() . ") " . mysqli_connect_error());
}
$curl = curl_init();
//The Website you want to get data from
$url = "https://www.brent.gov.uk/events-and-whats-on-calendar/?eventCat=Wembley+Stadium+events";
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
$result = curl_exec($curl);
libxml_use_internal_errors(true);
$domd = new DOMDocument();
$domd->loadHTML($result);
//Getting the date from the site
$dateElements=$domd->getElementsByTagName("article")->item(0)->getElementsByTagName("h3");
foreach($dateElements as $ele){
$data = (date("Y-m-d",strtotime($ele->textContent)));
// echo "<br>".$data;
//checking if the date match with database date
$sql = "SELECT * FROM event_table WHERE date = '$data'";
$result = $con->query($sql);
if ($result->num_rows > 0) {
// output data of each row, if date match echo "Data is there";
while($row = $result->fetch_assoc()) {
echo "Data is there";
}
}
//if the date is not there, insert it into the database
else {
$results = mysqli_query($con, "INSERT INTO event_table (id, date) VALUES ('','$data')");
echo "data uploaded";
}
}
?>

Google API search result is not working

To get search results from Google, I used the code below. Sometimes it works perfectly, but sometimes it returns nothing, and I don't know what the problem is. I need the results for research purposes, so I have to run many different queries.
if (isset($_GET['content'])) {
// echo $_GET['content'];
$url_all=NULL;
$visibleurl=NULL;
$title_all=NULL;
$content_all=NULL;
$mainstring=NULL;
$searchTerm=$_GET['content'];
$endpoint = 'web';
$key= 'angelic-bazaar-111103';
$url = "http://ajax.googleapis.com/ajax/services/search/".$endpoint;
$args['q'] = $searchTerm;
$args['v'] = '1.0';
$url .= '?'.http_build_query($args, '', '&');
$url .="&rsz=". 8;
$ch = curl_init()or die("Cannot init");
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$body = curl_exec($ch)or die("cannot execute");
curl_close($ch);
$json = json_decode($body);
for($x=0;$x<count($json->responseData->results);$x++){
$url_all .="#$*##" . $json->responseData->results[$x]->url;
$visibleurl .="#$*##" . $json->responseData->results[$x]->visibleUrl;
$title_all .="#$*##" . $json->responseData->results[$x]->title;
$content_all .="#$*##" . $json->responseData->results[$x]->content;
}
}
EDIT:
This code works well sometimes and fails at other times. Is it a problem on Google's side or something else? When it fails, I get this error for $json->responseData->results: "Trying to get property of non-object".
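That notice means either $json or $json->responseData was not an object for that request, for example because the body was not valid JSON or because the service answered with a null responseData. Why the service does that for some queries can't be told from the code alone, but the loop can at least be guarded. A minimal sketch; extract_results is a hypothetical helper and the sample bodies are made up for illustration:

```php
<?php
// Hypothetical helper: return the result list, or an empty array on any failure.
function extract_results($body): array
{
    if (!is_string($body) || $body === '') {
        return []; // curl_exec() failed or returned nothing
    }
    $json = json_decode($body);
    // json_decode() returns null for invalid JSON; responseData may itself be null.
    if (!is_object($json) || !isset($json->responseData->results)) {
        return [];
    }
    return is_array($json->responseData->results) ? $json->responseData->results : [];
}

// Sample bodies, made up for illustration.
$ok      = '{"responseData":{"results":[{"url":"http://example.com","title":"Example"}]},"responseStatus":200}';
$refused = '{"responseData":null,"responseDetails":"request refused","responseStatus":403}';

echo count(extract_results($ok)), PHP_EOL;
echo count(extract_results($refused)), PHP_EOL;
echo count(extract_results(false)), PHP_EOL;
```

Printing the raw $body whenever the list comes back empty should show what the service actually replied for the failing queries.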

How to check if a link is a downloadable file in PHP? [duplicate]

This question already has answers here:
What is the best way to check if a URL exists in PHP?
(5 answers)
Closed 7 years ago.
I'm trying to make a broken-link checker with PHP.
I modified some PHP code I found online; I'm not a PHP programmer.
It flags some working links too, but that's OK.
However, I have a problem with presentations, zips and so on.
Basically, if the link is a download, the algorithm thinks it's a dead link.
<?php
set_time_limit(0);
//ini_set('memory_limit','512M');
$servername = "localhost";
$username = "";
$password = "";
try {
$conn = new PDO("mysql:host=$servername;dbname=test", $username, $password);
// set the PDO error mode to exception
$conn->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
echo "Connected successfully" . "<br />";
echo "----------------------------------------------------<br />";
}
catch (PDOException $e) {
echo "Connection failed: " . $e->getMessage();
}
$sql = "SELECT object,value FROM metadata where xpath = 'lom/technical/location'";
$result = $conn->query($sql)->fetchAll(PDO::FETCH_ASSOC);
//print_r($result);
$array_length = sizeof($result); //26373
//$array_length = 26373;
$i = 0;
$myfile = fopen("Lom_Link_patikra1.csv", "w") or die("Unable to open file!");
$menu_juosta = "Object;Link;Error code;\n";
//fwrite($myfile,$menu_juosta);
for ($i; $i < $array_length; $i++) {
$new_id = $result[$i]["object"];
$sql1 = "SELECT published from objects where id ='$new_id'";
$result_published = $conn->query($sql1)->fetchAll(PDO::FETCH_ASSOC);
//print_r ($result_published);
if ($result_published[0]["published"] != 0) {
$var1 = $result[$i]["value"];
$var1 = str_replace('|experience|902', '', $var1);
$var1 = str_replace('|packed_in|897', '', $var1);
$var1 = str_replace('|packed_in|911', '', $var1);
$var1 = str_replace('|packed_in|895', '', $var1);
$request_response = check_url($var1); // HTTP status returned for the page
if ($request_response != 200) {
$my_object = $result[$i]["object"] . ";" . $var1 . ";" . $request_response . ";\n";
fwrite($myfile, $my_object);
}
}
}
fclose($myfile);
$conn = null;
function check_url($url)
{
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$data = curl_exec($ch);
$headers = curl_getinfo($ch);
curl_close($ch);
return $headers['http_code'];
}
Link example : http://lom.emokykla.lt/MO/Matematika/pazintis_su_erdviniais%20_kunais_1.doc
Any solutions, advice?
Thank you all for the help. Now it works much faster. There also seems to be a problem with blank spaces, but that's even intriguing.
It seems the problem I had was in understanding how HTTP status codes work: what they return and why. The links that I had marked as bad but that were working returned 301 or 302, i.e. redirects.
https://en.wikipedia.org/wiki/List_of_HTTP_status_codes
Thank you all for help.
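Both observations can be folded into the checker: count redirects as live, and percent-encode blanks before requesting the URL. A small sketch; the helper names are hypothetical, and treating every 2xx and 3xx code as "alive" is a simplification you may want to tighten:

```php
<?php
// Hypothetical helper: treat 2xx and 3xx responses as "link is alive".
function is_live_status(int $code): bool
{
    return $code >= 200 && $code < 400;
}

// Hypothetical helper: percent-encode blanks, which are not valid inside a URL.
function encode_spaces(string $url): string
{
    return str_replace(' ', '%20', trim($url));
}

var_dump(is_live_status(200));
var_dump(is_live_status(302));
var_dump(is_live_status(404));
echo encode_spaces(' http://example.com/my file.doc '), PHP_EOL;
```

In the loop above, the test `$request_response != 200` would then become `!is_live_status($request_response)`.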
Using cURL for a remote file:
function checkRemoteFile($url)
{
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL,$url);
// don't download content
curl_setopt($ch, CURLOPT_NOBODY, 1);
curl_setopt($ch, CURLOPT_FAILONERROR, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
if(curl_exec($ch)!==FALSE)
{
return true;
}
else
{
return false;
}
}
EDIT: I may have misunderstood you, but if you just want to check whether the URL actually exists, then the code below is all you need.
function url_exists($url) {
if(@file_get_contents($url,0,NULL,0,1))
{return 1;}
else
{return 0;}
}
Setting CURLOPT_NOBODY to TRUE makes cURL send an HTTP HEAD request instead of a GET request, so try using curl_setopt($ch, CURLOPT_NOBODY, true);
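Put together with redirect following, a HEAD-based check could look like the sketch below. The function name, the redirect limit and the timeout are assumptions chosen for illustration, and some servers answer HEAD differently from GET, so a fallback to GET may still be needed for odd cases.

```php
<?php
// Sketch: return the final HTTP status code for $url, or 0 if the request failed.
function head_status_code(string $url): int
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_NOBODY, true);          // send HEAD, skip the body
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow 301/302 redirects
    curl_setopt($ch, CURLOPT_MAXREDIRS, 5);          // but not forever
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // do not echo anything
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);           // seconds
    curl_exec($ch);
    // The handle is released when $ch goes out of scope.
    return (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
}

// Usage (performs a network request, so it is left commented out here):
// echo head_status_code('http://example.com/file.doc'), PHP_EOL;
```

Because the body is never downloaded, large documents and archives no longer slow the check down.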
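Try the file_exists function: http://php.net/manual/fr/function.file-exists.php (note that it relies on stat(), which the http:// wrapper does not support, so it works for local paths rather than remote URLs).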

Execute multiple url

I have a little problem. I need to run a script that requests 5000 URLs in PHP.
$con = mysql_connect("localhost","user","psswd");
if (!$con) {
die('Could not connect: ' . mysql_error());
}
mysql_select_db('db_name', $con);
print "connected";
$result = mysql_query ("SELECT name, uid FROM obinndocusers");
// I need to execute that url for each user
while ($row = mysql_fetch_array($result)) {
header("Location: http:xxxxxxxx?q=user/" . $row['uid'] . "/edit&destination=admin/user/user");
}
Any idea??
Thx.
Use cURL, like this:
$ch = curl_init();
while ($row = mysql_fetch_array($result)) {
// set URL and other appropriate options
curl_setopt($ch, CURLOPT_URL, "http://www.example.com?q=user/" . $row['uid'] . "/edit&destination=admin/user/user");
curl_setopt($ch, CURLOPT_HEADER, 0);
// grab URL and pass it to the browser
curl_exec($ch);
}
// close cURL resource, and free up system resources
curl_close($ch);
Use cURL
<?php
// create a new cURL resource
$ch = curl_init();
// set URL and other appropriate options
curl_setopt($ch, CURLOPT_URL, "http:xxxxxxxx?q=user/" . $row['uid'] . "/edit&destination=admin/user/user");
curl_setopt($ch, CURLOPT_HEADER, 0);
// grab URL and pass it to the browser
curl_exec($ch);
// close cURL resource, and free up system resources
curl_close($ch);
?>
First: header() is a primitive for sending HTTP headers to the browser. It must be called before any output is sent (e.g. by print or echo).
Second: a "Location: " header tells your browser to redirect to that URL. You cannot specify more than one URL.
If you need your script to make HTTP requests itself, use cURL or fopen, and do not call your script from your browser.
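For illustration, here is one way the server-side loop might be laid out: build each URL from the user id, then request them one after another on a single cURL handle. The helper names and the example.com base are placeholders (the real host is elided in the question), so treat this as a sketch rather than a drop-in script.

```php
<?php
// Hypothetical helper: build the edit URL for one user id.
function build_edit_url(string $base, $uid): string
{
    // http_build_query() takes care of escaping the values.
    $query = http_build_query([
        'q'           => 'user/' . $uid . '/edit',
        'destination' => 'admin/user/user',
    ], '', '&');
    return $base . '?' . $query;
}

// Sketch: request each URL in turn, reusing one cURL handle.
function request_all(array $urls): array
{
    $codes = [];
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // keep responses out of the output
    foreach ($urls as $url) {
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_exec($ch);
        $codes[$url] = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    }
    return $codes;
}

echo build_edit_url('http://www.example.com/', 42), PHP_EOL;
```

With the rows from the query, the list would be built with build_edit_url() inside the while loop and passed to request_all() once; for 5000 URLs, running it from the command line avoids browser and web-server timeouts.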
The best way would be with CURL (see other answer by Haim Evgi), but if the server doesn't have the curl extension, then this will also work.
<?php
$con = mysql_connect("localhost","user","psswd");
if (!$con) {
die('Could not connect: ' . mysql_error());
}
mysql_select_db('db_name', $con);
print "connected";
$result = mysql_query ("SELECT name, uid FROM obinndocusers");
// I need to execute that url for each user
while ($row = mysql_fetch_array($result)) {
file_get_contents("http:xxxxxxxx?q=user/" . $row['uid'] . "/edit&destination=admin/user/user");
}
