Loop preg_match until no more matches - PHP

How can I preg_match until no more results are found?
I'm using cURL to log in to a page and then delete posts from it.
But to delete those posts I need to preg_match the content to filter out the IDs, and if IDs are found my script runs the delete command.
So, basically:
$pattern = '/(?<=list_id=).*?(?=&cmd=edit)/s';
preg_match($pattern, $LoginResult, $id); // This preg_match is working: it gets the first result on the page (what I need), but I need to loop this until nothing more is found.
$idpagina = $id[0];
In words, it should do something like this:
If preg_match is true, run the delete command.
Repeat until preg_match is false.
With this code I can find everything between list_id= and &cmd=edit. If the script finds something between these two strings, it needs to perform a cURL request to delete that ID:
//THIS IS WORKING
$paginadelete = "https://example/list/folder/0?list_id=".$idpagina."&cmd=delete&type=AD_DELETE";
curl_setopt($login, CURLOPT_URL, $paginadelete);
curl_setopt($login, CURLOPT_POST, 1);
curl_setopt($login, CURLOPT_FOLLOWLOCATION, 1);
$step1 = curl_exec($login);
echo $step1;
What this basically does (or should do) is:
1. Run preg_match; if it finds a match, go to #2.
2. Run the delete cURL request.
3. Return to #1 until preg_match finds nothing more.
But this script runs 3 cURL processes:
1. Login
2. Go to the delete page (the one above)
3. Confirm the delete
So this loop should run between steps #2 and #3 until nothing more is found.
My #3 step (confirm delete) is this one:
$url = curl_getinfo($login, CURLINFO_EFFECTIVE_URL);
$url = parse_url($url, PHP_URL_PATH);
$url = substr($url, 9);
$url = "http://example.com/cmd/act/".$url;
$post_data = array(
'1' => 'delete',
'2' => '1',
'3' => '2',
'4' => '10',
'5' => '',
'6' => 'continue',
);
//traverse array and prepare data for posting (key1=value1)
foreach ( $post_data as $key => $value) {
$post_items[] = $key . '=' . $value;
}
//create the final string to be posted using implode()
$post_string = implode ('&', $post_items);
curl_setopt($login, CURLOPT_URL, $url);
curl_setopt($login, CURLOPT_POST, 1);
curl_setopt($login, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($login, CURLOPT_POSTFIELDS, $post_string);
$step2 = curl_exec($login);
//echo $step2;
////////////////////// EDIT
I was trying:
if (preg_match('/(?<=list_id=).*?(?=&cmd=edit)/s', $LoginResult, $id)){
}
else {
}
But this will only work for the first result. After that, the script stops. I need to re-run the if until preg_match is false and then end up in the else.
I thought about using do/while, but I don't know how, nor whether it would work.
////////////////// EDIT 2
I'm now trying to use a goto until preg_match returns false, and then close the connection:
verification:
if (preg_match('/(?<=list_id=).*?(?=&cmd=edit)/s', $LoginResult, $id)){
[..........]
} else {
//close the connection
curl_close($login);
}
goto verification;
But it doesn't seem to work, lol.

Your question isn't clear, but I guess what you need is preg_match_all, i.e.:
preg_match_all('/((?<=list_id=).*?(?=&cmd=edit))/im', $html, $matches, PREG_PATTERN_ORDER);
for ($i = 0; $i < count($matches[1]); $i++) {
//here you can implement an if/else to check if the ID exists
echo $matches[1][$i];
}
http://php.net/manual/en/function.preg-match-all.php
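For instance, that full match list could drive the delete loop from the question directly. A rough sketch, reusing $login, $LoginResult and the delete URL from the question (so those names are assumed to already exist):
preg_match_all('/(?<=list_id=).*?(?=&cmd=edit)/s', $LoginResult, $matches);
foreach ($matches[0] as $idpagina) {
    // step #2 from the question: request the delete page for this ID
    $paginadelete = "https://example/list/folder/0?list_id=" . $idpagina . "&cmd=delete&type=AD_DELETE";
    curl_setopt($login, CURLOPT_URL, $paginadelete);
    curl_setopt($login, CURLOPT_POST, 1);
    curl_setopt($login, CURLOPT_FOLLOWLOCATION, 1);
    $step1 = curl_exec($login);
    // step #3 (the confirm-delete request from the question) would go here before the next ID
}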

Based on your first edit, it seems that what you are trying to achieve is the following:
while(preg_match('/(?<=list_id=).*?(?=&cmd=edit)/s', $LoginResult, $id)) {
// do stuff
}
// do stuff after the preg_match is false
Edit:
Based on your description in the comments, maybe this code will satisfy your needs.
while(true) {
$result = preg_match('/(?<=list_id=).*?(?=&cmd=edit)/s', $LoginResult, $id);
if($result) {
// Run Delete Curl
} else {
curl_close($login);
break;
}
}
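One thing to watch out for: $LoginResult never changes inside that loop, so the same ID would keep matching forever. The listing has to be refetched after each delete. A sketch of that idea, assuming $listurl is the URL of the listing page originally fetched into $LoginResult (not shown in the question) and that CURLOPT_RETURNTRANSFER is enabled on $login:
while (preg_match('/(?<=list_id=).*?(?=&cmd=edit)/s', $LoginResult, $id)) {
    // run the delete (and confirm-delete) requests for $id[0] here

    // refetch the listing so the next iteration sees the page without the deleted post
    curl_setopt($login, CURLOPT_URL, $listurl);
    curl_setopt($login, CURLOPT_HTTPGET, 1); // back to a plain GET after the POSTs above
    $LoginResult = curl_exec($login);
}
curl_close($login); // nothing left to match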

Related

PHP API request by GET details

I'm trying to get the details from this example (I created the code just now).
But I'm very confused: how can I get the details from the link, then separate them and send them to my MySQL database?
<?php
$ch = curl_init();
$url = "https://reqres.in/api/users?page=2";
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$resp = curl_exec($ch);
if($e = curl_error($ch)) {
echo $e;
}
else {
$decoded = json_decode($resp, true);
//print_r($decoded);
foreach($decoded as $key => $item) {
$array = array(
'id' => ,
'email' => ,
'first_name' => ,
'last_name' => ,
);
print_r($array);
}
}
curl_close($ch);
?>
If you call the URL in your browser, you will see that the result array is in the data field.
You may check this by printing the whole result:
print_r($decoded);
So if you'd like to print_r the results, it is simply:
print_r($decoded['data']);
If you'd like to store it in your database, you can walk through the array and store each item:
foreach($decoded['data'] as $item) {
storeItem($item);
}
To make this work you should implement the storeItem function which accepts the array $item and stores it into your database. There are various tutorials about doing that.
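A minimal sketch of such a storeItem function using PDO; the DSN, credentials and the users table are placeholders you would replace with your own:
function storeItem(array $item)
{
    // placeholder connection details - replace with your own
    $pdo = new PDO('mysql:host=localhost;dbname=mydb;charset=utf8mb4', 'dbuser', 'dbpass');
    $stmt = $pdo->prepare('INSERT INTO users (id, email, first_name, last_name) VALUES (?, ?, ?, ?)');
    $stmt->execute(array($item['id'], $item['email'], $item['first_name'], $item['last_name']));
}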

PHP file_get_contents error, wouldn't populate from an array?

I've been trying to write a simple script in PHP to pull data from an ISBN database site, and for some reason I've had nothing but issues using the file_get_contents function. I've managed to get something working for this now, but I would just like to see if anyone knows why this wasn't working.
The code below would not populate $page with any information, so the preg_match calls below it failed to get anything. If anyone knows what the hell was stopping this, that would be great.
$links = array ('
http://www.isbndb.com/book/2009_cfa_exam_level_2_schweser_practice_exams_volume_2','
http://www.isbndb.com/book/uniform_investment_adviser_law_exam_series_65','
http://www.isbndb.com/book/waterworks_a02','
http://www.isbndb.com/book/winning_the_toughest_customer_the_essential_guide_to_selling','
http://www.isbndb.com/book/yale_daily_news_guide_to_fellowships_and_grants'
); // array of URLs
foreach ($links as $link)
{
$page = file_get_contents($link);
#print $page;
preg_match("#<h1 itemprop='name'>(.*?)</h1>#is",$page,$title);
preg_match("#<a itemprop='publisher' href='http://isbndb.com/publisher/(.*?)'>(.*?)</a>#is",$page,$publisher);
preg_match("#<span>ISBN10: <span itemprop='isbn'>(.*?)</span>#is",$page,$isbn10);
preg_match("#<span>ISBN13: <span itemprop='isbn'>(.*?)</span>#is",$page,$isbn13);
echo '<tr>
<td>'.$title[1].'</td>
<td>'.$publisher[2].'</td>
<td>'.$isbn10[1].'</td>
<td>'.$isbn13[1].'</td>
</tr>';
#exit();
}
My guess is that you have the wrong (not direct) URLs. The proper ones are without the www. part - if you request any of them and inspect the returned headers, you'll see that you're redirected (HTTP 301) to another URL.
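For example, a quick way to inspect that (just a sketch, using one of the URLs from the question):
$headers = get_headers('http://www.isbndb.com/book/waterworks_a02');
echo $headers[0]; // should show something like "HTTP/1.1 301 Moved Permanently" if the www. URL redirects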
The best way to do it, in my opinion, is to use cURL with curl_setopt and the options CURLOPT_FOLLOWLOCATION and CURLOPT_MAXREDIRS.
Of course you should trim your URLs beforehand just to be sure that isn't the problem.
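For the stray whitespace in the question's array, that could be as simple as:
$links = array_map('trim', $links); // strip the leading newlines/spaces from each URL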
Example here:
$curl = curl_init();
foreach ($links as $link) {
curl_setopt($curl, CURLOPT_URL, $link);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, TRUE);
curl_setopt($curl, CURLOPT_MAXREDIRS, 5); // max 5 redirects
$result = curl_exec($curl);
if (! $result) {
continue; // if $result is empty or false - ignore and continue;
}
// do what you need to do here
}
curl_close($curl);

Preg_match_all not stopping where it should be

Update Yahoo error
OK, so I got it all working, but the preg_match_all won't work against Yahoo.
If you take a look at:
http://se.search.yahoo.com/search?p=random&toggle=1&cop=mss&ei=UTF-8&fr=yfp-t
then you can see that in their HTML they have
<span class="url" id="something random"> the actual link </span>
But when I try to preg_match_all, I don't get any results.
preg_match_all('#<span class="url" id="(.*)">(.+?)</span>#si', $urlContents[2], $yahoo);
Anyone got an idea?
End of update
I'm trying to preg_match_all the results I get from Google, using a cURL curl_multi_getcontent method.
I have succeeded in fetching the site, but when I try to pull the links out of the result, the match just grabs too much.
I'm currently using:
preg_match_all('#<cite>(.+)</cite>#si', $urlContents[0], $links);
And that starts where it should, but it doesn't stop where it should; it just keeps going.
Check the HTML at www.google.com/search?q=random for example and you will see that all the links start with <cite> and end with </cite>.
Could someone possibly help me with how I should retrieve this information?
I only need the actual link address to each result.
Update Entire PHP Script
public function multiSearch($question)
{
$sites['google'] = "http://www.google.com/search?q={$question}&gl=sv";
$sites['bing'] = "http://www.bing.com/search?q={$question}";
$sites['yahoo'] = "http://se.search.yahoo.com/search?p={$question}";
$urlHandler = array();
foreach($sites as $site)
{
$handler = curl_init();
curl_setopt($handler, CURLOPT_URL, $site);
curl_setopt($handler, CURLOPT_HEADER, 0);
curl_setopt($handler, CURLOPT_RETURNTRANSFER, 1);
array_push($urlHandler, $handler);
}
$multiHandler = curl_multi_init();
foreach($urlHandler as $key => $url)
{
curl_multi_add_handle($multiHandler, $url);
}
$running = null;
do
{
curl_multi_exec($multiHandler, $running);
}
while($running > 0);
$urlContents = array();
foreach($urlHandler as $key => $url)
{
$urlContents[$key] = curl_multi_getcontent($url);
}
foreach($urlHandler as $key => $url)
{
curl_multi_remove_handle($multiHandler, $url);
}
foreach($urlContents as $urlContent)
{
preg_match_all('/<li class="g">(.*?)<\/li>/si', $urlContent, $matches);
//$this->view_data['results'][] = "Random";
}
preg_match_all('#<div id="search"(.*)</ol></div>#i', $urlContents[0], $match);
preg_match_all('#<cite>(.+)</cite>#si', $urlContents[0], $links);
var_dump($links);
}
Run the regular expression in U (ungreedy) mode:
preg_match_all('#<cite>(.+)</cite>#siU', $urlContents[0], $links);
As in Darhazer's answer you can turn on ungreedy mode for the whole regex using the U pattern modifier, or just make the pattern itself ungreedy (or lazy) by following it with a ?:
preg_match_all('#<cite>(.+?)</cite>#si', ...
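To see the difference on a small standalone example (made-up HTML, just to illustrate greedy vs. lazy matching):
$html = '<cite>first.com</cite> some text <cite>second.com</cite>';
preg_match_all('#<cite>(.+)</cite>#si', $html, $greedy);  // greedy: (.+) runs on to the last </cite>
preg_match_all('#<cite>(.+?)</cite>#si', $html, $lazy);   // lazy: (.+?) stops at the first </cite>
print_r($greedy[1]); // Array ( [0] => first.com</cite> some text <cite>second.com )
print_r($lazy[1]);   // Array ( [0] => first.com [1] => second.com )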

writing cURL like function in a rails app

I'm trying to convert this PHP cURL function to work with my Rails app. The piece of code is from an SMS payment gateway that needs to verify the POST parameters. Since I'm a big PHP noob, I have no idea how to handle this problem.
$verify_url = 'http://smsgatewayadress';
$fields = '';
$d = array(
'merchant_ID' => $_POST['merchant_ID'],
'local_ID' => $_POST['local_ID'],
'total' => $_POST['total'],
'ipn_verify' => $_POST['ipn_verify'],
'timeout' => 10,
);
foreach ($d as $k => $v)
{
$fields .= $k . "=" . urlencode($v) . "&";
}
$fields = substr($fields, 0, strlen($fields)-1);
$ch = curl_init($verify_url); //this creates a cURL handle for $verify_url; the actual HTTP request happens at curl_exec below
curl_setopt($ch, CURLOPT_POST, 1); //sets the delivery method as POST
curl_setopt($ch, CURLOPT_POSTFIELDS, $fields); //The data that is being sent via POST. From what I can see the cURL lib sends them as a string that is built in the foreach loop above
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); //This verifies if the target url sends a redirect header and if it does cURL follows that link
curl_setopt($ch, CURLOPT_HEADER, 0); //This leaves the response headers out of the result
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); //This makes curl_exec below return the response as a string instead of printing it
$result = curl_exec($ch); //This transfers the data via POST to the URL and returns the response
if ($result == true)
{
//confirmed
$can_download = true;
}
else
{
//failed
$can_download = false;
}
}
if (strpos($_SERVER['REQUEST_URI'], 'ipn.php'))
echo $can_download ? '1' : '0'; //we tell the sms sever that we processed the request
I've googled for a cURL library counterpart in Rails and found a ton of options, but none that I could understand and use the way this script does.
If anyone could give me a hand with converting this script from php to ruby it would be greatly appreciated.
The most direct approach might be to use the Ruby curb library, which is the most straightforward wrapper for cURL. A lot of the options in Curl::Easy map directly to what you have here. A basis might be:
url = "http://smsgatewayadress/"
Curl::Easy.http_post(url,
Curl::PostField.content('merchant_ID', params[:merchant_ID]),
# ...
)

PHP: Check if URL redirects?

I have implemented a function that runs on each page that I want to restrict to logged-in users. The function automatically redirects the visitor to the login page if he or she is not logged in.
I would like to make a PHP function that runs from an external server and iterates through a number of set URLs (an array with the URL of each protected page) to see whether they are redirected or not. That way I could easily make sure the protection is up and running on every page.
How could this be done?
Thanks.
$urls = array(
'http://www.apple.com/imac',
'http://www.google.com/'
);
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
foreach($urls as $url) {
curl_setopt($ch, CURLOPT_URL, $url);
$out = curl_exec($ch);
// line endings are the wonkiest part of this whole thing
$out = str_replace("\r", "", $out);
// only look at the headers
$headers_end = strpos($out, "\n\n");
if( $headers_end !== false ) {
$out = substr($out, 0, $headers_end);
}
$headers = explode("\n", $out);
foreach($headers as $header) {
if( substr($header, 0, 10) == "Location: " ) {
$target = substr($header, 10);
echo "[$url] redirects to [$target]<br>";
continue 2;
}
}
echo "[$url] does not redirect<br>";
}
I use cURL and fetch only the headers; afterwards I compare my URL with the URL cURL reports:
$url="http://google.com";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_TIMEOUT, '60'); // in seconds
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_NOBODY, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$res = curl_exec($ch);
if(curl_getinfo($ch)['url'] == $url){
echo "not redirect";
}else {
echo "redirect";
}
You could always try adding:
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
since 302 means it moved, this lets the cURL call follow the redirect and return whatever the target URL returns.
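With CURLOPT_FOLLOWLOCATION on you can still tell whether a redirect was followed, for example (a sketch, assuming $ch is a cURL handle already set up with the URL as in the snippets above):
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
$out = curl_exec($ch);
if (curl_getinfo($ch, CURLINFO_REDIRECT_COUNT) > 0) {
    // at least one redirect was followed; this is where we ended up
    echo curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
}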
Getting the headers with get_headers() and checking if Location is set is much simpler.
$urls = [
"https://example-1.com",
"https://example-2.com"
];
foreach ($urls as $key => $url) {
$is_redirect = does_url_redirect($url) ? 'yes' : 'no';
echo $url . ' is redirected: ' . $is_redirect . PHP_EOL;
}
function does_url_redirect($url){
$headers = get_headers($url, 1);
if (!empty($headers['Location'])) {
return true;
} else {
return false;
}
}
I'm not sure whether this really makes sense as a security check.
If you are worried about files getting called directly without your "is the user logged in?" checks being run, you could do what many big PHP projects do: In the central include file (where the security check is being done) define a constant BOOTSTRAP_LOADED or whatever, and in every file, check for whether that constant is set.
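A minimal sketch of that constant-based guard (file names are placeholders):
// index.php - the central entry point where the login check runs
define('BOOTSTRAP_LOADED', true);
// ... session / login check here ...
require 'protected_page.php';

// protected_page.php - never meant to be requested directly
if (!defined('BOOTSTRAP_LOADED')) {
    die('No direct access');
}
// ... rest of the page ...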
Testing is great and security testing is even better, but I'm not sure what kind of flaw you are looking to uncover with this? To me, this idea feels like a waste of time that will not bring any real additional security.
Just make sure your script calls die() after the header("Location: ...") redirect. That is essential to stop additional content from being sent after the header call (a missing die() wouldn't be caught by your idea, by the way, as the redirect header would still be issued...).
If you really want to do this, you could also use a tool like wget and feed it a list of URLs. Have it fetch the results into a directory, and check (e.g. by looking at the file sizes that should be identical) whether every page contains the login dialog. Just to add another option...
Do you want to check the HTTP code to see if it's a redirect?
$params = array('http' => array(
'method' => 'HEAD',
'ignore_errors' => true
));
$context = stream_context_create($params);
foreach(array('http://google.com', 'http://stackoverflow.com') as $url) {
$fp = fopen($url, 'rb', false, $context);
$result = stream_get_contents($fp);
if ($result === false) {
throw new Exception("Could not read data from {$url}");
} else if (! strstr($http_response_header[0], '301')) {
// Do something here
}
}
I hope it will help you:
function checkRedirect($url)
{
$headers = get_headers($url);
if ($headers) {
if (isset($headers[0]) && strpos($headers[0], ' 302 ') !== false) {
//this is a redirect; find the Location header instead of relying on a fixed array index
foreach ($headers as $header) {
if (stripos($header, 'Location:') === 0) {
//this is the URL where it's redirecting
return trim(substr($header, strlen('Location:')));
}
}
}
}
return false;
}
$isRedirect = checkRedirect($url);
if(!$isRedirect )
{
echo "URL Not Redirected";
}else{
echo "URL Redirected to: ".$isRedirect;
}
You can use a session: if the session variable is not set, redirect the visitor to the login page.
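For example, at the top of every protected page (a sketch; the session key and login URL are placeholders):
session_start();
if (empty($_SESSION['user_id'])) {   // no logged-in user in the session
    header('Location: /login.php');  // send the visitor to the login page
    exit;                            // stop so nothing else is output
}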
I modified Adam Backstrom's answer and implemented chiborg's suggestion (download only the HEAD). It has one thing more: it will check whether the redirection points to a page on the same server or to an external one. Example: terra.com.br redirects to terra.com.br/portal. PHP will consider that a redirect, which is correct, but I only wanted to list URLs that redirect to a different site. My English is not good, so if someone finds something really difficult to understand and can edit this, you're welcome.
function RedirectURL() {
$urls = array('http://www.terra.com.br/','http://www.areiaebrita.com.br/');
foreach ($urls as $url) {
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// chiborg suggestion
curl_setopt($ch, CURLOPT_NOBODY, true);
// ================================
// READ URL
// ================================
curl_setopt($ch, CURLOPT_URL, $url);
$out = curl_exec($ch);
// line endings are the wonkiest part of this whole thing
$out = str_replace("\r", "", $out);
echo $out;
$headers = explode("\n", $out);
foreach($headers as $header) {
if(substr(strtolower($header), 0, 9) == "location:") {
// read the URL to check whether the redirect goes to a page on the same server or to another one.
// terra.com.br redirects to terra.com.br/portal: that is valid.
// but areiaebrita.com.br redirects to bwnet.com.br, and that is invalid.
// what we want is to check whether the address is still terra.com.br or has changed. if it changed, print it on the page.
// if the location contains http, we will check whether the URL changed or not.
// some servers, when redirecting to a folder on the same host, cite only the folder in the Location header. Example: net11.com.br redirects only to /heiden
// only run this when the Location header contains http
if ( strpos(strtolower($header), "http") !== false) {
$address = explode("/", $header);
print_r($address);
// $address['0'] = http
// $address['1'] =
// $address['2'] = www.terra.com.br
// $address['3'] = portal
echo "url (address from array) = " . $url . "<br>";
echo "address[2] = " . $address['2'] . "<br><br>";
// url: terra.com.br
// address['2'] = www.terra.com.br
// check if the string terra.com.br is still present in www.terra.com.br. That indicates the server did not redirect to a page away from here.
if(strpos(strtolower($address['2']), strtolower($url)) !== false) {
echo "URL NOT REDIRECT";
} else {
// not the same. (areiaebrita)
echo "SORRY, URL REDIRECT WAS FOUND: " . $url;
}
}
}
}
}
}
function unshorten_url($url){
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_URL, $url);
$out = curl_exec($ch);
$real_url = $url;//default.. (if no redirect)
if (preg_match("/location: (.*)/i", $out, $redirect))
$real_url = $redirect[1];
if (strstr($real_url, "bit.ly"))//the redirect is another shortened url
$real_url = unshorten_url($real_url);
return $real_url;
}
I have just made a function that checks if a URL exists or not
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
function url_exists($url, $ch) {
curl_setopt($ch, CURLOPT_URL, $url);
$out = curl_exec($ch);
// line endings are the wonkiest part of this whole thing
$out = str_replace("\r", "", $out);
// only look at the headers
$headers_end = strpos($out, "\n\n");
if( $headers_end !== false ) {
$out = substr($out, 0, $headers_end);
}
//echo $out."====<br>";
$headers = explode("\n", $out);
//echo "<pre>";
//print_r($headers);
foreach($headers as $header) {
//echo $header."---<br>";
if( strpos($header, 'HTTP/1.1 200 OK') !== false ) {
return true;
}
}
return false;
}
Now I have used an array of URLs to check whether each URL exists, as follows:
$my_url_array = array('http://howtocode.pk/result', 'http://google.com/jobssss', 'https://howtocode.pk/javascript-tutorial/', 'https://www.google.com/');
for($j = 0; $j < count($my_url_array); $j++){
if(url_exists($my_url_array[$j], $ch)){
echo 'This URL "'.$my_url_array[$j].'" exists. <br>';
}
}
I can't understand your question.
You have an array of URLs and you want to know whether the user came from one of the listed URLs?
If I'm understanding your question correctly:
$urls = array('http://url1.com','http://url2.ru','http://url3.org');
if(in_array($_SERVER['HTTP_REFERER'],$urls))
{
echo 'FROM ARRAY';
} else {
echo 'NOT FROM ARR';
}
