Python Regular Expressions in PHP - php

What would be the PHP equivalent of this code? I have tried curl but am unable to get it to work.
import urllib2,urllib,re
url=' Delete me'
req = urllib2.Request(url)
req.add_header('User-Agent', 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.9.0.3) Gecko/2008092417 Firefox/3.0.3')
response = urllib2.urlopen(req)
link=response.read()
response.close()
match=re.compile('Delete me 2').findall(link)
print match
OK, my code now looks like this:
<?php
$url = "url";
$homepage = file_get_contents($url);
print $homepage;
?>
In Python I would now find the strings I need using something like this:
match=re.compile('src="(.+?)" border="0" /></td>\n <td class="namewidth"><a title=".+?" href="(.+?)">(.+?)</a>').findall(link)
Here (.+?) stands for the unknown text. What is the equivalent of this in PHP?

Have you just tried file_get_contents()? It doesn't have the power of curl but if you just need to pull a URL it works.
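On the regex part of the question: PHP's counterpart to re.compile(...).findall(...) is preg_match_all, and (.+?) is written the same way because both engines support lazy quantifiers. Below is a minimal sketch, assuming made-up sample markup in $homepage in place of the downloaded page; the paths and titles in it are hypothetical.

```php
<?php
// Sample markup standing in for the downloaded page (hypothetical content).
$homepage = '<td><img src="/img/a.png" border="0" /></td>' . "\n"
    . ' <td class="namewidth"><a title="First" href="/item/1">Item one</a>' . "\n"
    . '<td><img src="/img/b.png" border="0" /></td>' . "\n"
    . ' <td class="namewidth"><a title="Second" href="/item/2">Item two</a>';

// Same pattern as the Python call. PHP patterns need delimiters; "~" is used
// here so the "/" characters inside the pattern need no escaping.
$pattern = '~src="(.+?)" border="0" /></td>\n <td class="namewidth"><a title=".+?" href="(.+?)">(.+?)</a>~';

// PREG_SET_ORDER groups the results per match, similar to findall's tuples.
preg_match_all($pattern, $homepage, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    // $m[0] is the whole match; $m[1], $m[2], $m[3] are the three groups.
    echo $m[1], ' | ', $m[2], ' | ', $m[3], PHP_EOL;
}
```

Unlike findall, each entry also carries the full match at index 0, so the captured groups start at index 1.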

Related

php file_get_contents($url) not returning same results as url in addressbar

When I copy and paste the following URL into the address bar, it opens the page correctly:
https://www.lacourt.org/casesummary/ui/casesummary.aspx?CaseNumber=BC510457
But the following code returns a "case not found" message from the site when I run it on localhost:
<?php
$url = 'https://www.lacourt.org/casesummary/ui/casesummary.aspx?CaseNumber=BC510457';
echo file_get_contents($url);
?>
Why is file_get_contents not returning the same page as when I type the url directly in the address bar? Any suggestions?
Thank you.
First, I can't connect to that URL myself, but you can try setting a User-Agent in the request header, like this:
$url = 'https://www.lacourt.org/casesummary/ui/casesummary.aspx?CaseNumber=BC510457';
// Context options for both http:// and https:// URLs go under the 'http' key
$options = [
    'http' => [
        'method' => 'GET',
        'header' => "User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.71 Safari/537.36\r\n",
    ],
];
$ctx = stream_context_create($options);
echo file_get_contents($url, false, $ctx);

Autoit download whole source of site

I am trying to download the source code of a website. I got it working in AutoIt as well as in PHP, but the downloaded source is incomplete: the HTML of a few items generated by a script is missing.
I am working on a school project about probability in casino games (especially roulette), and I want to download these numbers:
NUMBERS
from page: http://csgocircle.com/ to create some statistics.
What am I doing wrong?
Thanks for your help!
AutoIt:
#include <Inet.au3>
#include <WinHttp.au3>
$url="http://csgocircle.com/"
$http_protocol = ObjCreate("winhttp.winhttprequest.5.1")
; Open() has to be called before any request headers are set
$http_protocol.open("GET", $url)
$http_protocol.setrequestheader("Content-Type", "application/x-www-form-urlencoded")
$http_protocol.setrequestheader("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.110 Safari/537.36")
; A GET request has no body, so Send() takes no argument here
$http_protocol.send()
$http_protocol.waitforresponse()
; ResponseText is the body as text; ResponseBody would be a raw byte array
$http_auth3 = $http_protocol.responsetext
ConsoleWrite($http_auth3)
Exit
Or PHP:
<?php
$url="http://csgocircle.com/";
$homepage = file_get_contents($url);
echo htmlspecialchars( $homepage );
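Those items are added by a script after the page loads, so a plain HTTP request only returns the initial HTML. In AutoIt, you can load the URL in IE and read the full HTML from there: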
#include <IE.au3>
$url = "http://csgocircle.com/"
$oIE = _IECreate($url, 0, 0, 1, 0 )
;~ Sleep(2000) ; if needed, sleep to give JS/AJAX time to finish building the page
$html = _IEDocReadHTML($oIE)
_IEQuit($oIE)
ConsoleWrite($html)

file_get_html returns empty HTML in PHP Simple HTML DOM Parser

I made a script that fetches content from another site using Simple HTML DOM Parser. It looked like this:
include_once('simple_html_dom.php');
$html = file_get_html('http://csgolounge.com/'.$tradeid);
foreach($html->find('div[id=tradediv]') as $trade) {
$when = $trade->find('.tradeheader')[0];
}
I was probably requesting the content too often (every 30 seconds), and now I get empty HTML back.
I tried to change the User-Agent like this:
$context = stream_context_create();
// Context options are set per wrapper with stream_context_set_option()
stream_context_set_option($context, 'http', 'user_agent', 'Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.1.6) Gecko/20091201 Firefox/3.5.6');
$html = file_get_html('http://csgolounge.com/profile?id='.$steamid, 0, $context);
But I am still getting empty HTML back.
The problem was that my HTML file was too big. Simple HTML DOM defines a maximum file size with define('MAX_FILE_SIZE', 600000). I changed it to 900000 and now it's working again.
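To check whether a page would trip a size limit like this before handing it to the parser, you can compare the length of the downloaded string against the limit yourself. This is a minimal sketch, assuming the limit applies to the byte length of the downloaded content; PARSER_SIZE_LIMIT and fitsParserLimit are hypothetical names for illustration, not part of Simple HTML DOM.

```php
<?php
// Hypothetical constant mirroring the value quoted above; it is not read
// from the library.
const PARSER_SIZE_LIMIT = 600000;

// Returns true when the content is non-empty and no longer than the limit.
function fitsParserLimit(string $html, int $limit = PARSER_SIZE_LIMIT): bool
{
    return $html !== '' && strlen($html) <= $limit;
}

// Stand-ins for downloaded pages of different sizes.
$smallPage = str_repeat('a', 1000);
$largePage = str_repeat('a', 700000);

var_dump(fitsParserLimit($smallPage));
var_dump(fitsParserLimit($largePage));
var_dump(fitsParserLimit($largePage, 900000));
```

A check like this makes the failure visible instead of leaving you with an empty result and no hint about the cause.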

simple_html_dom ignores special characters

The code I am using is below. It works perfectly fine until I encounter a URL with Japanese or other special characters. In that case it seems to request only the domain name, so I keep getting random results that I did not intend to retrieve.
include_once 'simple_html_dom.php';
header('Content-Type: text/html; charset=utf-8');
$url_link = 'http://kissanime.com/Anime/Knights-of-Ramune-VS騎士ラムネ&40FRESH';
$html = file_get_html($url_link);
echo $html->find('.bigChar', 0)->innertext;
I should be getting "Knights of Ramune", since that is the element I am trying to retrieve. Instead, $url_link is redirected to the bare domain, 'http://kissanime.com/', without 'Anime/Knights-of-Ramune-VS騎士ラムネ&40FRESH'. From there it looks up the class '.bigChar', which yields a random value.
The real problem is how to retrieve data from a URL containing UTF-8 characters, not simple_html_dom itself.
First of all, we need to encode the characters:
$url_link = 'http://kissanime.com/Anime/Knights-of-Ramune-VS騎士ラムネ&40FRESH';
$strPosLastPart = strrpos($url_link, '/') + 1;
$lastPart = substr($url_link, $strPosLastPart);
$encodedLastPart = rawurlencode($lastPart);
$url_link = str_replace($lastPart, $encodedLastPart, $url_link);
Normally this should work. When I tested it, it did not, so to find out why, I made the request with cURL and got this error:
Object reference not set to an instance of an object. Description: An
unhandled exception occurred during the execution of the current web
request. Please review the stack trace for more information about the
error and where it originated in the code.
Exception Details: System.NullReferenceException: Object reference not
set to an instance of an object.
So now we know the page is written in ASP.NET, but I still wondered why the request failed. I added a User-Agent, and voilà:
$ch = curl_init($url_link);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.3; WOW64; rv:31.0) Gecko/20100101 Firefox/31.0');
// Return the response as a string instead of printing it directly
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$data = curl_exec($ch);
echo $data;
All together (working):
$url_link = 'http://kissanime.com/Anime/Knights-of-Ramune-VS騎士ラムネ&40FRESH';
//Encode Characters
$strPosLastPart = strrpos($url_link, '/') + 1;
$lastPart = substr($url_link, $strPosLastPart);
$encodedLastPart = rawurlencode($lastPart);
$url_link = str_replace($lastPart, $encodedLastPart, $url_link);
//Download Data
$ch = curl_init($url_link);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.3; WOW64; rv:31.0) Gecko/20100101 Firefox/31.0');
// Without this option curl_exec() prints the page and returns true
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$data = curl_exec($ch);
//Load Data into Html (untested, since I am not using this library)
$html = str_get_html($data);
The only difference from your code is that $data is loaded into simple_html_dom with str_get_html instead of fetching the page with file_get_html.
Cheers
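Encoding only the last path segment is enough for this URL. If non-ASCII characters can appear in any segment, a variant that encodes every segment separately may be more robust. This is a minimal sketch under that assumption; encodePathSegments is a hypothetical helper, it treats everything after the host as path (query strings are not handled), and it would double-encode a URL that is already percent-encoded.

```php
<?php
// Percent-encode each path segment of an absolute URL, leaving the scheme,
// the host, and the "/" separators untouched.
function encodePathSegments(string $url): string
{
    // Split "scheme://host" from the rest of the URL.
    if (!preg_match('#^([a-z][a-z0-9+.\-]*://[^/]+)(/.*)?$#i', $url, $m)) {
        return $url; // not an absolute URL; leave it unchanged
    }
    $path = $m[2] ?? '';
    // rawurlencode() leaves ASCII letters, digits, "-", "_", "." and "~" alone.
    $segments = array_map('rawurlencode', explode('/', $path));
    return $m[1] . implode('/', $segments);
}

$url_link = 'http://kissanime.com/Anime/Knights-of-Ramune-VS騎士ラムネ&40FRESH';
echo encodePathSegments($url_link), PHP_EOL;
```

The result contains only ASCII characters, so it can be passed to curl_init() in place of the hand-encoded $url_link.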

PHP SOAP Client error

I'm getting this weird error: "Value cannot be null. Parameter name: input". But there's no parameter with the name "input".
I've tried changing the code and playing around with it, but I think there's something simple that I'm missing here (haven't used SOAP for years).
<?php
$xmlData = '<LoanRequest><VendorId>20</VendorId>
<SubVendorId>0</SubVendorId>
<Tier>Dynamic</Tier>
<FirstName>TestFname</FirstName>
<LastName>TestLname</LastName>
<DateOfBirth>1979-03-09</DateOfBirth>
<Title>Mr</Title>
<Postcode>SO164LN</Postcode>
<HouseNumber>98</HouseNumber>
<Street>Test Street</Street>
<Town>Test Town</Town>
<County>Test County</County>
<HomeOwner>False</HomeOwner>
<HomePhone>02300000000</HomePhone>
<WorkPhone>02000000000</WorkPhone>
<MobilePhone>0799123321</MobilePhone>
<Email>pdbuktest#pbuk.com</Email>
<IncomeSource>5</IncomeSource>
<EmployerName>PDB Test</EmployerName>
<TimeWithEmployer>48</TimeWithEmployer>
<PaidByDirectDeposit>1</PaidByDirectDeposit>
<NetMonthlyIncome>1700</NetMonthlyIncome>
<PayFrequency>3</PayFrequency>
<NextPayDay>2013-05-31</NextPayDay>
<PaydayAfterNext>2013-06-07</PaydayAfterNext>
<DebitCard>VD</DebitCard>
<BankAccountNumber>12345678</BankAccountNumber>
<BankSortCode>9987655</BankSortCode>
<NIN></NIN>
<LoanAmount>500</LoanAmount>
<IPAddress>127.0.0.1</IPAddress>
<Consent>1</Consent>
<TimeAtAddressYears>2</TimeAtAddressYears>
<TimeAtAddressMonths>3</TimeAtAddressMonths>
<UserAgent>Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Win64; x64; Trident/5.0)</UserAgent>
<LoanPurpose>Car</LoanPurpose>
<Pricequote>10</Pricequote>
<HousingExpenditure>100</HousingExpenditure>
<CreditExpenditure>150</CreditExpenditure>
<OtherExpenditure>220</OtherExpenditure></LoanRequest>';
$url = 'http://www.pdbuk.co.uk/API/loan.asmx?wsdl';
$options["location"] = $url;
$options['trace'] = 1;
$client = new SoapClient($url, $options);
$result = $client->SendRequest($xmlData);
var_dump($result);
?>
What am I doing wrong? Thanks!
Based on the WSDL, the XML data should be passed as an array keyed by the parameter name rather than as a bare string. So this will work:
$result = $client->SendRequest(array('inpXml' => $xmlData));
