Call Scrapy from PHP/Delphi with a list of references - php

I'm building a Scrapy spider that checks whether certain products are in stock in an online shop.
The idea is to call this spider from PHP/Delphi code, passing it a list of products (3500 references). The spider then returns another list with the stock information.
This is my spider:
import scrapy
from scrapy.crawler import CrawlerProcess
class Spider(scrapy.Spider):
    name = "Spider"
    start_urls = ['https://www.url.net/Administration/Account/Login']
    def parse(self, response):
        return scrapy.FormRequest.from_response(
            response,
            formdata={'UserName': 'username', 'Password': 'password'},
            callback=self.after_login
        )
    def after_login(self, response):
        yield scrapy.Request(url="https://www.url.net/Home/Home/ShowPriceDetail?articleNo=" + REFERENCE, callback=self.parse_stock)
    def parse_stock(self, response):
        print("STOCK" + response.selector.xpath('//*[@id="priceDetails"]/form/div[8]/div[1]/span/span[2]/text()').extract_first())
        print("Date" + response.selector.xpath('//*[@id="priceDetails"]/form/div[8]/div[1]/span/span[1]/i/@style').extract_first())
So, what is the correct way to do this? I know that you can pass arguments to a spider using something like:
def __init__(self, product=None, *args, **kwargs):
    super(Spider, self).__init__(*args, **kwargs)
And I know that you can run a spider from another Python script with CrawlerProcess. I also know that you can call a Python script from PHP using:
<?php
$command = escapeshellcmd('/home/myscript.py');
$output = shell_exec($command);
echo $output;
?>
But I don't know how to combine all of these methods.
Thanks in advance.

You have to use some data storage to transfer your data.
In your other programming language, save the data to a file or database (e.g. CSV or JSON), then pass the file name to your Scrapy spider as a command-line argument. Finally, in your spider, iterate over the file contents to generate requests.
For example, if we have this JSON:
{ "items": [
{ "url": "http://example1.com" },
{ "url": "http://example2.com" }
]}
We would use something like:
import json
import scrapy
class MySpider(scrapy.Spider):
    name = 'myspider'
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Set on the command line, e.g.: scrapy crawl myspider -a filename=items.json
        self.filename = kwargs.get('filename', None)
    def start_requests(self):
        if not self.filename:
            raise NotImplementedError('missing argument filename')
        with open(self.filename, 'r') as f:
            data = json.loads(f.read())
        # One request per entry in the "items" list of the file
        for item in data['items']:
            yield scrapy.Request(item['url'])
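Applied to the original question, the same idea could look like the sketch below. It assumes the PHP or Delphi side writes the references to a JSON file as a flat list under a hypothetical "references" key. The helper names are illustrative, and only the ShowPriceDetail URL comes from the question.

```python
import json
from urllib.parse import quote

# URL pattern taken from the question; the file layout below is an assumption.
BASE_URL = "https://www.url.net/Home/Home/ShowPriceDetail?articleNo="


def load_references(path):
    """Read the list stored under the (hypothetical) "references" key."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)["references"]


def build_urls(references):
    """Return one stock-detail URL per reference, percent-encoding each value."""
    return [BASE_URL + quote(str(ref), safe="") for ref in references]
```

Inside the spider, after_login could then yield one scrapy.Request per URL returned by build_urls(load_references(self.filename)).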

Related

Python: AJAX PHP parse result is different from the on-screen result?

I tried to extract a search result from this page: "http://std.stheadline.com/daily/formerly.php".
Selecting 20-Nov to 22-Nov on the web page and ticking the "財經" (Finance) news category check box gives 47 results.
However, my Python code, using the parameters obtained from Chrome Inspect, yields 162 results. It seems the server did not recognize my parameters and gave me the results for ALL news categories on the latest date.
I used this code:
import pandas as pd
url= "http://std.stheadline.com/daily/ajax/ajaxFormerly.php?startDate=2019-11-20&endDate=2019-11-22&type%5B%5D=15&keyword="
df = pd.read_json(url)
print(df.info(verbose=True))
print(df)
I also tried:
url= "http://std.stheadline.com/daily/ajax/ajaxFormerly.php?startDate=2019-11-20&endDate=2019-11-22&type=15&keyword="
The page uses a POST request, which sends the parameters in the request body, not in the URL, so parameters placed in the URL are ignored. You can use the requests module (or urllib) to send POST requests:
import requests
url = 'http://std.stheadline.com/daily/ajax/ajaxFormerly.php'
params = {
    'startDate': '2019-11-20',
    'endDate': '2019-11-22',
    'type[]': '15',
    'keyword': '',
}
r = requests.post(url, data=params)
data = r.json()
print(data['totalCount']) # 47
To load it into a DataFrame, you may have to use io.StringIO to create an in-memory file:
import requests
import pandas as pd
import io
url = 'http://std.stheadline.com/daily/ajax/ajaxFormerly.php'
params = {
    'startDate': '2019-11-20',
    'endDate': '2019-11-22',
    'type[]': '15',
    'keyword': '',
}
r = requests.post(url, data=params)
f = io.StringIO(r.text)
df = pd.read_json(f)
print(df)
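To make the difference between URL parameters and body parameters concrete, here is a minimal sketch using only the standard library. It builds the form-encoded body and a POST request object without sending anything. The variable names are illustrative, and the parameters are the ones from the answer above.

```python
from urllib.parse import urlencode
from urllib.request import Request

URL = "http://std.stheadline.com/daily/ajax/ajaxFormerly.php"
params = {
    "startDate": "2019-11-20",
    "endDate": "2019-11-22",
    "type[]": "15",
    "keyword": "",
}

# Form-encode the parameters; this string goes in the request body, not the URL.
body = urlencode(params)

# Supplying data makes urllib issue a POST; nothing is sent until urlopen() is called.
request = Request(URL, data=body.encode("ascii"))
```

The encoded body is the same text the question placed after the "?" in the URL; only where it travels differs.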

How do I run a php code string from Python?

I've found that you can run a php file from Python by using this:
import subprocess
proc = subprocess.Popen('php.exe input.php', shell=True, stdout=subprocess.PIPE)
response = proc.stdout.read().decode("utf-8")
print(response)
But is there a way to run php code from a string, not from a file? For example:
<?php
$a = ['a', 'b', 'c'][0];
echo($a);
?>
[EDIT]
Use php -r "code" with subprocess.Popen:
import subprocess
def php(code):
    p = subprocess.Popen(["php", "-r", code],
                         stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out = p.communicate()  # returns a tuple (stdoutdata, stderrdata)
    if out[1] != b'':
        raise Exception(out[1].decode('UTF-8'))
    return out[0].decode('UTF-8')
code = """ \
$a = ['a', 'b', 'c'][2]; \
echo($a);"""
print(php(code))
[Original Answer]
I found a simple class that allows you to do that.
The code is self-explanatory. The class contains 3 methods:
get_raw(self, code): Given a code block, invoke the code and return the raw result as a string
get(self, code): Given a code block that emits json, invoke the code and interpret the result as a Python value.
get_one(self, code): Given a code block that emits multiple json values (one per line), yield the next value.
The example you wrote would look like this:
php = PHP()
code = """ \
$a = ['a', 'b', 'c'][0]; \
echo($a);"""
print (php.get_raw(code))
You can also add a prefix and postfix to the code with PHP(prefix="", postfix="").
PS: I modified the original class because popen2 is deprecated. I also made the code compatible with Python 3. Here it is:
import json
import subprocess
class PHP:
    """This class provides a stupid simple interface to PHP code."""
    def __init__(self, prefix="", postfix=""):
        """prefix = optional prefix for all code (usually require statements)
        postfix = optional postfix for all code
        Semicolons are not added automatically, so you'll need to make sure to put them in!"""
        self.prefix = prefix
        self.postfix = postfix
    def __submit(self, code):
        code = self.prefix + code + self.postfix
        # Pass the argument list directly (no shell=True), so "-r" and the code reach php intact.
        p = subprocess.Popen(["php", "-r", code],
                             stdin=subprocess.PIPE, stdout=subprocess.PIPE)
        (child_stdin, child_stdout) = (p.stdin, p.stdout)
        return child_stdout
    def get_raw(self, code):
        """Given a code block, invoke the code and return the raw result as a string."""
        out = self.__submit(code)
        # The pipe yields bytes; decode so the caller really gets a string.
        return out.read().decode("utf-8")
    def get(self, code):
        """Given a code block that emits json, invoke the code and interpret the result as a Python value."""
        out = self.__submit(code)
        return json.loads(out.read())
    def get_one(self, code):
        """Given a code block that emits multiple json values (one per line), yield the next value."""
        out = self.__submit(code)
        for line in out:
            line = line.strip()
            if line:
                yield json.loads(line)
Based on Victor Val's answer, here is my own compact version.
import subprocess
def run(code):
    p = subprocess.Popen(['php','-r',code], stdout=subprocess.PIPE)
    return p.stdout.read().decode('utf-8')
code = """ \
$a = ['a', 'b', 'c'][0]; \
echo($a);"""
print(run(code))
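The same pattern can also be written with subprocess.run, which waits for the process and collects both output streams in one call. The sketch below is only an illustration: run_inline is a hypothetical helper, and the demo uses the Python interpreter with -c as a stand-in so that it runs on machines without PHP installed.

```python
import subprocess
import sys


def run_inline(argv_prefix, code):
    """Run code through an interpreter flag such as ["php", "-r"] and return stdout."""
    result = subprocess.run(
        argv_prefix + [code],
        capture_output=True,  # collect stdout and stderr
        text=True,            # decode bytes to str
    )
    if result.returncode != 0:
        raise RuntimeError(result.stderr)
    return result.stdout


# Stand-in demo with the Python interpreter, so the sketch runs without PHP.
demo = run_inline([sys.executable, "-c"], "print(['a', 'b', 'c'][0])")
```

With PHP on the PATH, the equivalent call would be run_inline(["php", "-r"], "echo ['a', 'b', 'c'][0];").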

Shell_exec returns NULL in PHP

I am working on a project that requires me to upload pictures through PHP, process each picture in Python, fetch the output from Python and display it again in PHP.
PHP code:
<?php
$command = shell_exec("python C:/path/to/python/KNNColor.py");
$jadi = json_decode($command);
var_dump($jadi);
?>
Python code:
from PIL import Image
import os
import glob
import cv2
import numpy as np
import matplotlib.pyplot as plt
from skimage import io, color
from scipy.stats import skew
# training data for color
Feat_Mom_M = np.load('FeatM_M.npy')
Feat_Mom_I = np.load('FeatM_I.npy')
Malay_Col_Train = Feat_Mom_M
Indo_Col_Train = Feat_Mom_I
# color data
All_Train_Col = np.concatenate((Malay_Col_Train, Indo_Col_Train))
Y_Indo_Col = [0] * len(Indo_Col_Train)
Y_Malay_Col = [1] * len(Malay_Col_Train)
Y_Col_Train = np.concatenate((Y_Malay_Col, Y_Indo_Col))
Train_Col = list(zip(All_Train_Col, Y_Col_Train))
from collections import Counter
from math import sqrt
import warnings
# KNN function
def k_nearest_neighbors(data, predict, k):
    if len(data) >= k:
        warnings.warn('K is set to a value less than total voting groups!')
    distances = []
    for group in data:
        for features in data[group]:
            euclidean_dist = np.sqrt(np.sum((np.array(features) - np.array(predict))**2 ))
            distances.append([euclidean_dist, group])
    votes = [i[1] for i in sorted(distances)[:k]]
    vote_result = Counter(votes).most_common(1)[0][0]
    return vote_result
image_list = []
image_list_pixel = []
image_list_lab = []
L = []
A = []
B = []
for filename in glob.glob('C:/path/to/pic/uploaded/batik.jpg'):
    im=Image.open(filename)
    image_list.append(im)
    im_pix = np.array(im)
    image_list_pixel.append(im_pix)
    # convert RGB to LAB
    im_lab = color.rgb2lab(im_pix)
    # split the L, A, B channels
    l_channel, a_channel, b_channel = cv2.split(im_lab)
    L.append(l_channel)
    A.append(a_channel)
    B.append(b_channel)
    image_list_lab.append(im_lab)
<The rest is processing these arrays into color moment vector, it's too long, so I'm skipping it to the ending>
Feat_Mom = np.array(Color_Moment)
Train_Set_Col = {0:[], 1:[]}
for i in Train_Col:
    Train_Set_Col[i[-1]].append(i[:-1])
new_feat_col = Feat_Mom
hasilcol = k_nearest_neighbors(Train_Set_Col, new_feat_col, 9)
import json
if hasilcol == 0:
    #print("Indonesia")
    print(json.dumps('Indonesia'))
else:
    #print("Malaysia")
    print(json.dumps('Malaysia'))
As you can see, only one print call runs. shell_exec is supposed to return the string printed by Python, but var_dump shows NULL, and if I echo $jadi there is nothing either, whether I use a plain print or print(json.dumps(...)).
The funny thing is that it works when I try to display a string from a Python file that consists of only one line of code.
Python dummy file:
print("Hello")
The "Hello" string shows up just fine in my PHP page. So, is shell_exec unable to handle longer scripts, or is there something else that I'm doing wrong?
I finally found the reason behind this. In my Python script there are these commands:
Feat_Mom_M = np.load('FeatM_M.npy')
Feat_Mom_I = np.load('FeatM_I.npy')
They load the NumPy arrays that I saved during KNN training and that I need again as references when classifying an image in Python. I stored them separately because I was afraid my PHP page would take too long to load if it had to process all the training data before finally classifying the uploaded image.
But when I execute my Python file from PHP, I guess it fails with an error at those two load commands. I experimented with putting the print command below them, and its output stopped showing in PHP. As things stand, I see no other way than taking the worst option, processing the training data every time, even if it costs me a long loading time.
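One possible explanation, which is an assumption and is not confirmed in the post, is that the relative paths passed to np.load are resolved against the working directory of the PHP process rather than the folder containing the script. A minimal sketch of building the paths from the script's own location instead follows. The helper name beside_script and the /srv/app folder are hypothetical.

```python
from pathlib import Path


def beside_script(script_file, name):
    """Return the path of `name` in the same directory as `script_file`."""
    return Path(script_file).parent / name


# In the real script, __file__ would be passed as script_file, for example:
#   Feat_Mom_M = np.load(beside_script(__file__, 'FeatM_M.npy'))
example = beside_script("/srv/app/KNNColor.py", "FeatM_M.npy")
```

If the load still fails, appending 2>&1 to the command given to shell_exec makes the Python traceback part of the returned string, which shows the actual error.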
I tested this in the console:
php > var_dump(json_decode("Indonesia"))
php > ;
php shell code:1:
NULL
php > var_dump(json_decode('{"Indonesia"}'))
php > ;
php shell code:1:
NULL
php > var_dump(json_decode('{"Indonesia":1}'))
php > ;
php shell code:1:
class stdClass#1 (1) {
public $Indonesia =>
int(1)
}
php > var_dump(json_decode('["Indonesia"]'))
php shell code:1:
array(1) {
[0] =>
string(9) "Indonesia"
}
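A bare word such as Indonesia is not valid JSON, so json_decode returns NULL. Wrapped as a valid object ({"Indonesia":1}) or array (["Indonesia"]), it is read into an object or an array; a double-quoted JSON string such as "Indonesia", which is what Python's json.dumps('Indonesia') prints, is also valid and decodes to a PHP string.
After a failed decode, you can call json_last_error() (http://php.net/manual/en/function.json-last-error.php) to get the error code; for the bare word it should be JSON_ERROR_SYNTAX.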
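To see what the Python side actually emits in each case, the following small sketch prints nothing and only builds the strings that print(json.dumps(...)) would write. The variable names are illustrative.

```python
import json

# A bare string is serialized with surrounding double quotes.
as_string = json.dumps("Indonesia")

# Wrapping the value in a list or dict yields the [] and {} forms tested above.
as_list = json.dumps(["Indonesia"])
as_object = json.dumps({"Indonesia": 1})
```

All three results are valid JSON documents, unlike the unquoted word Indonesia used in the first console test.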

Pass JSON Data from PHP to Python Script

I'd like to pass a PHP array to a Python script, which will use the data to perform some tasks. I wanted to execute the Python script from PHP using shell_exec() and pass the JSON data to it (I'm completely new to this).
$foods = array("pizza", "french fries");
$result = shell_exec('python ./input.py ' . escapeshellarg(json_encode($foods)));
echo $result;
The escapeshellarg(json_encode($foods)) call seems to hand my array to the Python script as the following (this is the value I get if I echo it):
'["pizza","french fries"]'
Then inside the Python script:
import sys, json
data = json.loads(sys.argv[1])
foods = json.dumps(data)
print(foods)
This prints out the following to the browser:
["pizza", "french fries"]
This is a plain old string, not a list. My question is, how can I best treat this data like a list, or some kind of data structure which I can iterate through with the "," as a delimiter? I don't really want to output the text to the browser, I just want to be able to break down the list into pieces and insert them into a text file on the filesystem.
I had the same problem.
Let me show you what I did.
PHP:
base64_encode(json_encode($bodyData))
then
json_decode(shell_exec('python ./input.py ' . base64_encode(json_encode($bodyData))));
and in Python I have
import base64
and
content = json.loads(base64.b64decode(sys.argv[1]))
as Em L already mentioned :)
It works for me
Cheers!
You can Base64-encode the foods string, pass it to Python, and decode it there. For example:
import sys, base64
if len(sys.argv) > 1:
    # b64decode returns bytes; decode to str before splitting
    data = base64.b64decode(sys.argv[1]).decode('utf-8')
    foods = data.split(',')
    print(foods)
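Combining the two suggestions above (JSON plus Base64), the round trip could be sketched as follows. Both helper names are hypothetical, and the encoding side is written in Python here only to show the round trip; in the setup described, PHP's base64_encode(json_encode(...)) would produce the token.

```python
import base64
import json


def encode_arg(value):
    """Serialize value to JSON, then Base64, giving a token without quotes or spaces."""
    return base64.b64encode(json.dumps(value).encode("utf-8")).decode("ascii")


def decode_arg(token):
    """Reverse encode_arg: Base64-decode, then parse the JSON text."""
    return json.loads(base64.b64decode(token).decode("utf-8"))


token = encode_arg(["pizza", "french fries"])
foods = decode_arg(token)
```

Because the result of decode_arg is a real Python list, it can be iterated directly without splitting on commas.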
If you have the JSON string data = '["pizza","french fries"]' and json.loads(data) isn't working (which it should), then you can use MyPythonList = eval(data). eval evaluates the string as a Python expression and returns the resulting object; because it will execute any code in the string, ast.literal_eval(data) is the safer choice for data that comes from outside.
I was having problems passing JSON from PHP to Python; my problem was with escaping the JSON string, which you are already doing. But it looks like you were decoding and then re-encoding with "foods".
From what I understand
Python json.dumps(data) == PHP json_encode(data)
Python json.loads(data) == PHP json_decode(data)
json.loads(s) -> parses a string
json.load(f) -> parses a file object
json.dumps(data) -> returns a string
json.dump(data, f) -> writes to a file object
PHP:
$foods = array("pizza", "french fries");
$result = shell_exec('python ./input.py ' . escapeshellarg(json_encode($foods)));
echo $result;
Python:
import sys, json
data = json.loads(sys.argv[1])
for v in data:
    print(v)
ALSO
if you are passing key:value
PHP:
$foods = array("food1" => "pizza", "food2" => "french fries");
$result = shell_exec('python ./input.py ' . escapeshellarg(json_encode($foods)));
echo $result;
Python:
data = json.loads(sys.argv[1])
for k, v in data.items():
    print(k + " => " + v)
Python:
data = json.loads(sys.argv[1])
print(data['food1'])
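Since the question asks for the items to end up in a text file rather than in the browser, here is a minimal sketch of that last step. The function name save_items and the one-item-per-line layout are assumptions, not something specified in the question.

```python
import json


def save_items(json_text, path):
    """Parse a JSON array and write one item per line to path."""
    items = json.loads(json_text)
    with open(path, "w", encoding="utf-8") as f:
        for item in items:
            f.write(str(item) + "\n")
    return items
```

In input.py this could be called as save_items(sys.argv[1], "foods.txt"), where foods.txt is a hypothetical file name.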

Send json data to php api through python

I am new to Python. I have created a GUI-based app to insert values into a database.
I have created a REST API to handle the DB operations. How can I append the JSON created in Python to the API URL?
app.py
from Tkinter import *
import tkMessageBox
import json
import requests
from urllib import urlopen
top = Tk()
L1 = Label(top, text="Title")
L1.pack( side = TOP)
E1 = Entry(top, bd =5)
E1.pack(side = TOP)
L2 = Label(top, text="Author")
L2.pack( side = TOP)
E2 = Entry(top, bd =5)
E2.pack(side = TOP)
L3 = Label(top, text="Body")
L3.pack( side = TOP)
E3 = Entry(top, bd =5)
E3.pack(side = TOP)
input = E2.get();
def callfunc():
    data = {"author": E2.get(),
            "body" : E3.get(),
            "title" : E1.get()}
    data_json = json.dumps(data)
    # r = requests.get('http://localhost/spritle/api.php?action=get_uses')
    #url = "http://localhost/spritle/api.php?action=insert_list&data_json="
    #
    url = urlopen("http://localhost/spritle/api.php?action=insert_list&data_json="%data_json).read()
    tkMessageBox.showinfo("Result",data_json)
SubmitButton = Button(text="Submit", fg="White", bg="#0094FF",
                      font=("Grobold", 10), command = callfunc)
SubmitButton.pack()
top.mainloop()
Error:
TypeError: not all arguments converted during string formatting
I am getting an error while appending data_json to the URL.
There is an error in the string formatting.
Replace this:
"http://localhost/spritle/api.php?action=insert_list&data_json="%data_json
with this:
"http://localhost/spritle/api.php?action=insert_list&data_json=" + data_json
or:
"http://localhost/spritle/api.php?action=insert_list&data_json={}".format(data_json)
The following statements are equivalent:
"Python with " + "PHP"
"Python with %s" % "PHP"
"Python with {}".format("PHP")
"Python with {lang}".format(lang="PHP")
Also, I don't think sending JSON data via the URL like this is a good idea. You should at least URL-encode the data.
You are trying to use the % operator to format the string, so you need to put a %s placeholder into the string:
"http://localhost/spritle/api.php?action=insert_list&data_json=%s" % data_json
Or use the other methods suggested in the other answer.
Regarding the data transfer: you definitely need to use a POST request and not GET.
Check this, using urllib2 and this, using requests.
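Both pieces of advice, encoding the data and preferring POST, can be sketched with the Python 3 standard library as below. The function names are hypothetical, nothing is sent until urlopen() is called on the request, and it is an assumption that the PHP API can read a JSON request body (for example from php://input).

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

API_URL = "http://localhost/spritle/api.php"  # endpoint from the question


def build_get_url(data):
    """Percent-encode the JSON so it survives inside a query string."""
    query = urlencode({"action": "insert_list", "data_json": json.dumps(data)})
    return API_URL + "?" + query


def build_post_request(data):
    """Put the JSON in the request body; urlopen(request) would send it as a POST."""
    body = json.dumps(data).encode("utf-8")
    return Request(
        API_URL + "?action=insert_list",
        data=body,
        headers={"Content-Type": "application/json"},
    )
```

The GET variant keeps the original URL shape but escapes characters such as quotes, spaces and & that would otherwise corrupt the query string.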
