I tried to extract search results from this page: "http://std.stheadline.com/daily/formerly.php".
Selecting 20-Nov to 22-Nov on the webpage and checking the "財經" (Finance) news category checkbox gives 47 results.
However, my Python code, with parameters taken from Chrome Inspect, yields 162 results. It seems the server did not recognize my parameters and gave me the results for ALL news categories on the latest date.
I used this code:
import pandas as pd
url = "http://std.stheadline.com/daily/ajax/ajaxFormerly.php?startDate=2019-11-20&endDate=2019-11-22&type%5B%5D=15&keyword="
df = pd.read_json(url)
print(df.info(verbose=True))
print(df)
I also tried:
url = "http://std.stheadline.com/daily/ajax/ajaxFormerly.php?startDate=2019-11-20&endDate=2019-11-22&type=15&keyword="
The page uses a POST request, which sends the parameters in the request body, not in the URL, so you can't pass them as a query string. You can use the requests module (or urllib) to send the POST request:
import requests
url = 'http://std.stheadline.com/daily/ajax/ajaxFormerly.php'
params = {
    'startDate': '2019-11-20',
    'endDate': '2019-11-22',
    'type[]': '15',
    'keyword': '',
}
r = requests.post(url, data=params)
data = r.json()
print(data['totalCount']) # 47
To load it into a DataFrame you may have to use io.StringIO to create a file-like object in memory:
import requests
import pandas as pd
import io
url = 'http://std.stheadline.com/daily/ajax/ajaxFormerly.php'
params = {
    'startDate': '2019-11-20',
    'endDate': '2019-11-22',
    'type[]': '15',
    'keyword': '',
}
r = requests.post(url, data=params)
f = io.StringIO(r.text)
df = pd.read_json(f)
print(df)
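If you want a table of the articles themselves rather than the raw response columns, the decoded JSON can also be fed straight to the DataFrame constructor. A minimal sketch, assuming the article list sits under a top-level 'data' key (an assumption; inspect the response keys first):

import requests
import pandas as pd

url = 'http://std.stheadline.com/daily/ajax/ajaxFormerly.php'
params = {
    'startDate': '2019-11-20',
    'endDate': '2019-11-22',
    'type[]': '15',
    'keyword': '',
}

r = requests.post(url, data=params)
payload = r.json()

# 'data' is a guess at the key holding the article list; check payload.keys() first
df = pd.DataFrame(payload['data'])
print(df.head())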
I'm building a scrapy spider that checks whether some products are in stock in an online web shop.
The idea is to call this spider from PHP/Delphi code, passing a list of products (3500 references). The spider then returns another list with the stock information.
This is my spider:
import scrapy
from scrapy.crawler import CrawlerProcess

class Spider(scrapy.Spider):
    name = "Spider"
    start_urls = ['https://www.url.net/Administration/Account/Login']

    def parse(self, response):
        return scrapy.FormRequest.from_response(
            response,
            formdata={'UserName': 'username', 'Password': 'password'},
            callback=self.after_login
        )

    def after_login(self, response):
        yield scrapy.Request(url="https://www.url.net/Home/Home/ShowPriceDetail?articleNo=" + REFERENCE, callback=self.parse_stock)

    def parse_stock(self, response):
        print("STOCK" + response.selector.xpath('//*[@id="priceDetails"]/form/div[8]/div[1]/span/span[2]/text()').extract_first())
        print("Date" + response.selector.xpath('//*[@id="priceDetails"]/form/div[8]/div[1]/span/span[1]/i/@style').extract_first())
So... what is the correct way to do this? I know that you can pass arguments to a spider using something like:
def __init__(self, product=None, *args, **kwargs):
    super(Spider, self).__init__(*args, **kwargs)
And I know that you can execute a spider from another Python script with CrawlerProcess. Also, I know that you can call a Python script from PHP using:
<?php
$command = escapeshellcmd('/home/myscript.py');
$output = shell_exec($command);
echo $output;
?>
But I don't know how to combine all of these methods...
Thanks in advance.
You have to use some data storage to transfer your data.
In your other programming language, save the data to a file or database, e.g. CSV or JSON, and then pass the file name to your scrapy spider via a command-line argument. Finally, in your spider you can iterate through the file contents to generate the requests.
For example, if we have this JSON:
{
    "items": [
        {"url": "http://example1.com"},
        {"url": "http://example2.com"}
    ]
}
We would use something like:
import json

import scrapy
from scrapy import Request

class MySpider(scrapy.Spider):
    name = 'myspider'

    def __init__(self, *args, **kwargs):
        super(MySpider, self).__init__(*args, **kwargs)
        self.filename = kwargs.get('filename', None)

    def start_requests(self):
        if not self.filename:
            raise NotImplementedError('missing argument filename')
        # read the URL list written by the other language
        with open(self.filename, 'r') as f:
            data = json.loads(f.read())
        for item in data['items']:
            yield Request(item['url'])
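To tie the pieces together, here is a rough sketch of the glue script that the PHP shell_exec() call could run. The file names products.json and results.json and the feed settings are illustrative assumptions, not part of the question:

#!/usr/bin/python
# Sketch: PHP writes products.json, runs this script, then reads results.json.
# Assumes MySpider (defined above) is importable from this module.
from scrapy.crawler import CrawlerProcess

process = CrawlerProcess(settings={
    'FEED_FORMAT': 'json',      # write the scraped items as JSON
    'FEED_URI': 'results.json',
})
process.crawl(MySpider, filename='products.json')
process.start()  # blocks until the crawl finishes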
I'm trying to get data from a PHP script using my Python script:
#!/usr/bin/python
import urllib
import urllib2
url = 'https://example.com/example.php'
data = urllib.urlencode({'login' : 'mylogin', 'pwd' : 'mypass', 'data' : 'mydata'})
req = urllib2.Request(url, data)
response = urllib2.urlopen(req)
d = response.read()
print d
It fails with the following error:
ERROR=1704
The PHP script accepts:
url: https://example.com/example.php?login=xxxxxxx&pwd=xxxxxxx&t=3
Isn't it because it is HTTPS, as described in the most voted response here: python ignore certificate validation urllib2
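If the certificate check is indeed the problem, here is a minimal sketch of the workaround on Python 2.7.9+ (an assumption about the interpreter version; note this disables validation, so only use it for endpoints you trust):

import ssl
import urllib
import urllib2

url = 'https://example.com/example.php'
data = urllib.urlencode({'login': 'mylogin', 'pwd': 'mypass', 'data': 'mydata'})

# skip certificate validation (Python 2.7.9+)
ctx = ssl._create_unverified_context()
req = urllib2.Request(url, data)
response = urllib2.urlopen(req, context=ctx)
print response.read()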
I am new to Python. I have created a GUI-based app to insert values into a database.
I have created a REST API to handle the DB operations. How can I append the JSON created in Python to the API URL?
app.py
from Tkinter import *
import tkMessageBox
import json
import requests
from urllib import urlopen
top = Tk()

L1 = Label(top, text="Title")
L1.pack(side=TOP)
E1 = Entry(top, bd=5)
E1.pack(side=TOP)

L2 = Label(top, text="Author")
L2.pack(side=TOP)
E2 = Entry(top, bd=5)
E2.pack(side=TOP)

L3 = Label(top, text="Body")
L3.pack(side=TOP)
E3 = Entry(top, bd=5)
E3.pack(side=TOP)
input = E2.get()

def callfunc():
    data = {"author": E2.get(),
            "body": E3.get(),
            "title": E1.get()}
    data_json = json.dumps(data)
    # r = requests.get('http://localhost/spritle/api.php?action=get_uses')
    # url = "http://localhost/spritle/api.php?action=insert_list&data_json="
    url = urlopen("http://localhost/spritle/api.php?action=insert_list&data_json="%data_json).read()
    tkMessageBox.showinfo("Result", data_json)

SubmitButton = Button(text="Submit", fg="White", bg="#0094FF",
                      font=("Grobold", 10), command=callfunc)
SubmitButton.pack()

top.mainloop()
Error:
TypeError: not all arguments converted during string formatting
I am getting an error while appending the URL with data_json.
There is an error in the string formatting.
Replace this:
"http://localhost/spritle/api.php?action=insert_list&data_json="%data_json
with this:
"http://localhost/spritle/api.php?action=insert_list&data_json=" + data_json
or:
"http://localhost/spritle/api.php?action=insert_list&data_json={}".format(data_json)
The following statements are equivalent:
"Python with " + "PHP"
"Python with %s" % "PHP"
"Python with {}".format("PHP")
"Python with {lang}".format(lang="PHP")
Also, I don't think sending JSON data via the URL like this is a good idea. You should at least URL-encode the data.
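For example, in Python 2 the JSON string can be percent-encoded with urllib.quote_plus before it goes into the query string. A sketch reusing the asker's endpoint:

import json
import urllib

data_json = json.dumps({"author": "a", "body": "b", "title": "t"})

# percent-encode the JSON so characters like '{', '"' and '&' survive the URL
url = ("http://localhost/spritle/api.php?action=insert_list&data_json="
       + urllib.quote_plus(data_json))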
You are trying to use the % operator to format the string, but then you need to put a %s placeholder into the string:
"http://localhost/spritle/api.php?action=insert_list&data_json=%s" % data_json
Or use other methods suggested in another answer.
Regarding the data transfer: you definitely need to use a POST request, not GET.
Check this, using urllib2 and this, using requests.
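For instance, with requests the JSON can travel in the POST body. This sketch assumes the PHP side reads data_json from $_POST rather than from the query string:

import json
import requests

data = {"author": "a", "body": "b", "title": "t"}

# 'action' stays in the query string; the payload moves into the POST body
r = requests.post('http://localhost/spritle/api.php',
                  params={'action': 'insert_list'},
                  data={'data_json': json.dumps(data)})
print(r.text)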
I am calling a PHP API via cURL:
import StringIO

import pycurl

ncServerURL = 'http://myserver/acertify.php'
# binaryptr = open('sampleamex.xml', 'rb').read()
# print binaryptr

c = pycurl.Curl()
c.setopt(pycurl.URL, ncServerURL)
c.setopt(pycurl.POST, 1)
c.setopt(pycurl.SSL_VERIFYPEER, 0)
c.setopt(pycurl.SSL_VERIFYHOST, 0)
header = ["Content-type: text/xml", "SOAPAction:run", 'Content-Type: text/xml; charset=utf-8', 'Content-Length: ' + str(len(xmldata))]
# print header
c.setopt(pycurl.HTTPHEADER, header)
c.setopt(pycurl.POSTFIELDS, "xml=" + str(xmldata))

b = StringIO.StringIO()
c.setopt(pycurl.WRITEFUNCTION, b.write)
c.perform()
ncServerData = b.getvalue()
return ncServerData  # this snippet is excerpted from inside a function
I am posting XML data to acertify.php, but I am not able to read the XML data in the PHP file. I am working on a project, and what I don't know is how to get the cURL-posted data in this file.
<?php
echo "hi";
print_r($_SESSION);
print_r($_POST);
// print_r($_FILES);
?>
If you mean getting the POST data in PHP, then at first glance it looks like you are posting a single field, c.setopt(pycurl.POSTFIELDS, "xml=" + str(xmldata)),
so it should just be $_POST['xml'].
And if you mean reading the response data with cURL, then in PHP curl needs the CURLOPT_RETURNTRANSFER option set when executing (I'm not familiar with the Python syntax).
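One hedged note on the Python side: PHP only fills $_POST for form-encoded bodies, so the text/xml Content-Type header in the question would likely keep $_POST empty. URL-encoding the field and dropping that header should let $_POST['xml'] come through; a sketch amending the question's snippet:

import urllib

# replace the POSTFIELDS line from the question: urlencode the field so PHP's
# $_POST parsing survives '&', '=' and other special characters in the XML
c.setopt(pycurl.HTTPHEADER, ['SOAPAction: run'])  # no text/xml Content-Type
c.setopt(pycurl.POSTFIELDS, urllib.urlencode({'xml': xmldata}))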
I've found a PHP script that lets me do what I asked in this SO question. I can use this just fine, but out of curiosity I'd like to recreate the following code in Python.
I can of course use urllib2 to get the page, but I'm at a loss on how to handle the cookies, since mechanize (tested with Python 2.5 and 2.6 on Windows and Python 2.5 on Ubuntu, all with the latest mechanize version) seems to break on the page. How do I do this in Python?
require_once "HTTP/Request.php";

$req = &new HTTP_Request('https://steamcommunity.com');
$req->setMethod(HTTP_REQUEST_METHOD_POST);
$req->addPostData("action", "doLogin");
$req->addPostData("goto", "");
$req->addPostData("steamAccountName", ACC_NAME);
$req->addPostData("steamPassword", ACC_PASS);

echo "Login: ";
$res = $req->sendRequest();
if (PEAR::isError($res))
    die($res->getMessage());

$cookies = $req->getResponseCookies();
if (!$cookies)
    die("fail\n");
echo "pass\n";

foreach ($cookies as $cookie)
    $req->addCookie($cookie['name'], $cookie['value']);
Similar to monkut's answer, but a little more concise.
import urllib, urllib2

def steam_login(username, password):
    data = urllib.urlencode({
        'action': 'doLogin',
        'goto': '',
        'steamAccountName': username,
        'steamPassword': password,
    })
    request = urllib2.Request('https://steamcommunity.com/', data)
    cookie_handler = urllib2.HTTPCookieProcessor()
    opener = urllib2.build_opener(cookie_handler)
    response = opener.open(request)
    if not 200 <= response.code < 300:
        raise Exception("HTTP error: %d %s" % (response.code, response.msg))
    else:
        return cookie_handler.cookiejar
It returns the cookie jar, which you can use in other requests. Just pass it to the HTTPCookieProcessor constructor.
monkut's answer installs a global HTTPCookieProcessor, which stores the cookies between requests. My solution does not modify the global state.
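For example, the returned jar can seed a fresh opener for follow-up requests. A short sketch continuing from the snippet above (the target URL is just an illustration):

jar = steam_login('username', 'password')

# reuse the login cookies in a new opener
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(jar))
response = opener.open('https://steamcommunity.com/')  # sent with the cookies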
I'm not familiar with PHP, but this may get you started.
I'm installing the opener here, which applies it to the urlopen method. If you don't want to 'install' the opener you can use the opener object directly (opener.open(url, data)).
Refer to:
http://docs.python.org/library/urllib2.html?highlight=urllib2#urllib2.install_opener
import urllib2
import urllib

# 1. create handlers
cookieHandler = urllib2.HTTPCookieProcessor()       # needed for cookie handling
redirectionHandler = urllib2.HTTPRedirectHandler()  # needed for redirection

# 2. apply the handlers to an opener
opener = urllib2.build_opener(cookieHandler, redirectionHandler)

# 3. install the opener
urllib2.install_opener(opener)

# prep post data
datalist_tuples = [
    ('action', 'doLogin'),
    ('goto', ''),
    ('steamAccountName', ACC_NAME),
    ('steamPassword', ACC_PASS),
]
url = 'https://steamcommunity.com'
post_data = urllib.urlencode(datalist_tuples)
resp_f = urllib2.urlopen(url, post_data)
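Since the opener is installed globally, later urlopen calls reuse the stored cookies automatically. A short sketch (the follow-up URL is illustrative):

print resp_f.read()  # the login response

# this second request goes through the installed opener, so the login
# cookies are attached automatically
profile = urllib2.urlopen('https://steamcommunity.com/')
print profile.code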