Setting the language for a JSON request - php

Here is my problem.
I use Firefox. If I set the browser language to English, the following page displays its text in Spanish but the currency in dollars.
Link
With the same URL, if I set the browser language to Spanish, the text is displayed in Spanish and the currency in euros.
I wrote a PHP script that fetches the JSON. How can I set the language for the request?
The following code ALWAYS returns the English-language result:
<?php
$url = "http://steamcommunity.com/market/search/render/?l=spanish&start=0&count=20&currency=3&category_730_Weapon%5B%5D=tag_weapon_awp&appid=730&query=Man-o%27-war";
$json_object= file_get_contents($url);
$json_decoded = json_decode($json_object);
//precios
preg_match_all('/<span style="color:white">(.*)<\/span>/',$json_decoded->results_html, $sor);
foreach($sor[1] as $k => $v)
{
echo $v."<br/>";
}
?>
I want the currency in euros. I tried adding the following modifications, but the currency is still the one shown for English (dollars):
<html lang="es">
<head>
<meta http-equiv="Content-Language" content="es"/>
</head>
<body>
<?php
$locale = Locale::acceptFromHttp($_SERVER['HTTP_ACCEPT_LANGUAGE']);
echo $locale."<br/>";
$options = array(
'http'=>array(
'method'=>"GET",
'header'=>"Accept-language: es\r\n" .
"Cookie: foo=bar\r\n")
);
$context = stream_context_create($options);
$url = "http://steamcommunity.com/market/search/render/?l=spanish&start=0&count=20&currency=3&category_730_Weapon%5B%5D=tag_weapon_awp&appid=730&query=Man-o%27-war";
$json_object= file_get_contents($url,false,$context);
$json_decoded = json_decode($json_object);
//precios
preg_match_all('/<span style="color:white">(.*)<\/span>/',$json_decoded->results_html, $sor);
foreach($sor[1] as $k => $v)
{
echo $v."<br/>";
}
?>
</body>
</html>
Thank you for your help. Greetings.

You have a typo in your URL: you wrote ?l=espanish, but it should be ?l=spanish:
http://steamcommunity.com/market/search/render/?l=spanish&start=0&count=20&currency=3&category_730_Weapon%5B%5D=tag_weapon_awp&appid=730&query=Man-o%27-war
Edit
I don't have any more answers unfortunately, but I came across the following SO answer which might be helpful. It would seem that the currency shown is contextual - I guess you need to be logged in via your script?
https://stackoverflow.com/a/22623700/312962
Anyway, I hope this helps!

For the currency, you have the &currency= parameter:
3: USD
2: € (I believe; try it)
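To try different values of l and currency without hand-editing the long URL, the query string can be built from an array. This is only a sketch: the parameter names and values are copied from the URL in the question, $params and $url are illustrative names, and whether the server honours currency for an anonymous request is not something this code can confirm.

```php
<?php
// Parameters copied from the URL in the question; change 'l' or 'currency' to experiment.
$params = array(
    'l' => 'spanish',
    'start' => 0,
    'count' => 20,
    'currency' => 3,
    'category_730_Weapon[]' => 'tag_weapon_awp',
    'appid' => 730,
    'query' => "Man-o'-war",
);

// http_build_query() URL-encodes keys and values ("[]" becomes "%5B%5D", "'" becomes "%27").
$url = 'http://steamcommunity.com/market/search/render/?' . http_build_query($params, '', '&');

echo $url, "\n";
```

The resulting $url can then be passed to file_get_contents() exactly as in the question.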

Related

Using Goutte to extract a namespaced attribute value

I'm trying to check whether I can read the attributes of a webpage's <html> element to get the owner-declared language.
On 99% of the sites I checked, that info is written as <html lang="XX"> or <html lang="XX-YY">, but on one particular site I found it written as <html xml:lang="XX">, and this last case is giving me a headache.
I tried
$scraper_client = new \Goutte\Client();
$scraper_crawler = $scraper_client->request('GET', $link);
$response = $scraper_client->getResponse();
var_dump( $scraper_crawler->filter('html')->extract('xml:lang') );
var_dump( $scraper_crawler->filter('html')->extract('xml|lang') );
var_dump( $scraper_crawler->filter('html')->extract('xml::lang') );
var_dump( $scraper_crawler->filter('html')->extract('#[xml:lang]') );
But none of them seems to work. Has anyone already done something similar?
Thank you in advance.
S.
EDIT
Just to complete the question, here is a link that contains the xml:lang attribute that is causing me problems:
http://www.ilgiornale.it/news/politica/silvio-berlusconi-centrodestra-oggi-pi-forte-passato-1482545.html
I don't know why, but it seems as if Goutte cuts off this attribute.
I've only been able to get the value with a regular expression:
$scraper_client = new \Goutte\Client();
$scraper_crawler = $scraper_client->request('GET', $link);
$response = $scraper_client->getResponse();
if (preg_match('/xml:lang=["\']{1}(.*?)["\']{1}/', $response, $matches)) {
var_dump($matches[1]);
} else {
echo 'not found';
}

php json request: json_decode unicode string [duplicate]

I am trying to get the contents of this JSON URL:
http://www.der-postillion.de/ticker/newsticker2.php
The problem seems to be that the contents of "text" contain Unicode characters.
Every time I call json_decode, it returns NULL. I have never had that issue before, and I always fetch JSON this way:
$news_url_postillion = 'http://www.der-postillion.de/ticker/newsticker2.php';
$file = file_get_contents($news_url_postillion, false, $context);
$data = json_decode($file, TRUE);
//debug
print_r(array($data));
$news_text = $data['tickers'];
//test
echo $news_text[0]['text']; //echo first text element for test
foreach($news_text as $news){
$news_text_output = $news['text']; // json_decode(..., TRUE) returns arrays, not objects
echo 'Text:' . $news_text_output . '<br>';
}
Does anybody have an idea what is wrong here? I have tried for hours to get the encoding working with things like:
header("Content-Type: text/json; charset=utf-8");
or
$opts = array(
'http'=>array(
'method'=>"GET",
'header'=>"Content: type=application/json\r\n" .
"Content-Type: text/html; charset=utf-8"
)
);
$context = stream_context_create($opts);
but no luck :(
Thanks for your help!
Solution:
The JSON source has some unwanted elements in it, such as the BOM character at the start. I cannot influence the source JSON, so the solution walkingRed provided put me on the right track. I only had to add utf8_decode, because his code on its own only works for English text without special characters.
My working code for parsing and outputting the JSON is:
<?php
// Postillion Newsticker Parser
$news_url_postillion = 'http://www.der-postillion.de/ticker/newsticker2.php';
$json_newsDataPostillion = file_get_contents($news_url_postillion);
// Fix the strange json source BOM stuff
$obj_newsDataPostillion = json_decode(preg_replace('/[\x00-\x1F\x80-\xFF]/', '', $json_newsDataPostillion), true);
//DEBUG
//print_r($result);
foreach($obj_newsDataPostillion['tickers'] as $newsDataPostillion){
$newsDataPostillion_text = utf8_decode($newsDataPostillion['text']);
echo 'Text:' . $newsDataPostillion_text . '<br>';
};
?>
I did some searching and found this:
$result = json_decode(preg_replace('/[\x00-\x1F\x80-\xFF]/', '', $file), true);
Original post
BOM! There is a BOM character at the beginning of the document which you linked and you need to remove it before you try to decode its content.
You can see it if, for example, you download that JSON with wget and display it with less.
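As an alternative to stripping every control and non-ASCII byte, a narrower approach is to remove only a leading UTF-8 BOM (the bytes EF BB BF) before decoding. This is a minimal sketch, not the code from the answer: stripUtf8Bom is a hypothetical helper name, and the sample string stands in for the real feed.

```php
<?php
// Hypothetical helper: removes a leading UTF-8 BOM (bytes EF BB BF) if present.
function stripUtf8Bom($text)
{
    if (strncmp($text, "\xEF\xBB\xBF", 3) === 0) {
        return (string) substr($text, 3);
    }
    return $text;
}

// Sample payload standing in for the remote feed: a BOM followed by valid JSON.
$raw = "\xEF\xBB\xBF" . '{"tickers":[{"text":"Gr\u00fc\u00dfe"}]}';

// With the BOM in place, json_decode() rejects the input and returns NULL.
var_dump(json_decode($raw, true));

// After stripping the BOM, the same payload decodes normally.
$data = json_decode(stripUtf8Bom($raw), true);
echo $data['tickers'][0]['text'], "\n";
```

Because only the three BOM bytes are removed, any UTF-8 characters inside the JSON strings are left untouched.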

PHP - urldecode. Different behavior with Wikipedia

Hi all.
I see inconsistent behavior from the urldecode() function in PHP 5.2.x. Wikipedia URLs are a good example of it.
First, the page where I display the results of that function has this meta tag:
<meta http-equiv="Content-Type" content="text/html;charset=utf-8" />
Then I call the function:
$url = urldecode($url);
echo $url;
Here are two examples of the $url variable:
http://ru.wikipedia.org/wiki/%D0%91%D1%80%D0%B5%D1%81%D1%82
This one is decoded correctly. Result: "Брест"
http://ru.wikipedia.org/wiki/%CC%EE%EB%EE%E4%E5%F7%ED%EE
This one is not decoded correctly. Result: ���������, but it should be "Молодечно".
What's wrong, and why? I tried all the functions from the function.urldecode.php page on the PHP website, but none of them gave me a correct result.
Here is a quick code example to test in PHP:
<?php
$url = array();
$url[] = "http://ru.wikipedia.org/wiki/%D0%91%D1%80%D0%B5%D1%81%D1%82";
$url[] = "http://ru.wikipedia.org/wiki/%CC%EE%EB%EE%E4%E5%F7%ED%EE";
foreach ($url as $value) :
echo urldecode($value) . "<br/>";
endforeach;
?>
Thanks in advance!
Not sure where you got that URL, but the correct UTF-8 one for "Молодечно" is:
$url = 'http://ru.wikipedia.org/wiki/%D0%9C%D0%BE%D0%BB%D0%BE%D0%B4%D0%B5%D1%87%D0%BD%D0%BE';
echo urldecode($url);
Yours is cp1251-encoded.
As zerkms said, the URL in question is cp1251-encoded. To convert it to UTF-8, just use this:
$url = 'http://ru.wikipedia.org/wiki/%CC%EE%EB%EE%E4%E5%F7%ED%EE';
echo iconv("Windows-1251","UTF-8",urldecode($url));
//output: Молодечно
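If both kinds of URL can turn up, one option is to check whether the decoded bytes are valid UTF-8 and convert only when they are not. The sketch below assumes the mbstring and iconv extensions are available and that Windows-1251 is the only other encoding expected; decodeUrlToUtf8 is a hypothetical helper name, not something from the answers above.

```php
<?php
// Hypothetical helper: decode a URL and return it as UTF-8.
// Assumption: anything that is not valid UTF-8 is in $fallbackCharset.
function decodeUrlToUtf8($url, $fallbackCharset = 'Windows-1251')
{
    $decoded = urldecode($url);
    if (mb_check_encoding($decoded, 'UTF-8')) {
        return $decoded; // already valid UTF-8, nothing to convert
    }
    return iconv($fallbackCharset, 'UTF-8', $decoded);
}

$urls = array(
    'http://ru.wikipedia.org/wiki/%D0%91%D1%80%D0%B5%D1%81%D1%82',
    'http://ru.wikipedia.org/wiki/%CC%EE%EB%EE%E4%E5%F7%ED%EE',
);
foreach ($urls as $value) {
    echo decodeUrlToUtf8($value), "\n";
}
```

A validity check like this is a heuristic: it works here because the cp1251 byte sequence is not well-formed UTF-8, but it cannot tell apart two encodings that both happen to produce valid UTF-8.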

Simple PHP Screen Scraping Function

I'm experimenting with autoblogging (i.e., RSS-driven blog posting) using WordPress, and all that's missing is a component to automatically fill in the content of the post with the content that the RSS item's URL links to (RSS is irrelevant to the solution).
Using standard PHP 5, how could I create a function called fetchHTML([URL]) that returns the HTML content of a webpage that's found between the <body>...</body> tags?
Please let me know if there are any prerequisite "includes".
Thanks.
Okay, here's a DOM parser code example as requested.
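<?php
function fetchHTML( $url )
{
$content = file_get_contents($url);
$html = new DOMDocument();
// The fetched markup has to be loaded into the document first;
// @ suppresses the warnings that real-world HTML usually triggers.
@$html->loadHTML($content);
$body = $html->getElementsByTagName('body')->item(0);
if ($body === null) { return ''; }
// Serialize the children of <body> back to HTML (textContent would drop the tags).
// Passing a node to saveHTML() needs PHP 5.3.6 or later.
$inner = '';
foreach ($body->childNodes as $child) { $inner .= $html->saveHTML($child); }
return $inner;
}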
Assuming that it will always be <body> and not <BODY> or <body style="width:100%"> or anything except <body> and </body>, and with the caveat that you shouldn't use regex to parse HTML, even though I'm about to, here ya go:
<?php
function fetchHTML( $url )
{
$content = file_get_contents( $url );
// Return an empty string when no <body>...</body> pair is found.
if ( !preg_match( '/<body>([\s\S]{1,})<\/body>/m', $content, $match ) ) { return ''; }
$content = $match[1];
return $content;
} // fetchHTML
?>
If you echo fetchHTML([some url]);, you'll get the html between the body tags.
Please note original caveats.
I think you're better off using a class like SimpleDom -> http://sourceforge.net/projects/simplehtmldom/ to extract the data, as you then don't need to write such complicated regular expressions

Detect remote charset in php

I would like to determine a remote page's encoding by detecting the Content-Type meta tag
<meta http-equiv="Content-Type" content="text/html; charset=XXXXX" />
if present.
I retrieve the remote page and try to do a regex to find the required setting if present.
I am still learning hence the problem below...
Here is what I have:
$EncStart = 'charset=';
$EncEnd = '" \/\>';
preg_match( "/$EncStart(.*)$EncEnd/s", $RemoteContent, $RemoteEncoding );
echo $RemoteEncoding[ 1 ];
The above does indeed echo the name of the encoding, but it does not know where to stop, so it prints out the rest of the line and then most of the rest of the remote page in my test.
Example: When testing a remote Russian page it printed:
windows-1251" />
rest of page ....
This means that $EncStart was okay, but the $EncEnd part of the regex failed to stop the matching. After the name of the encoding, this meta tag usually ends in one of 3 different ways:
"> | "/> | " />
I do not know whether this is usable to mark the end of the match and, if so, how to escape it. I tried different ways of doing it, but none worked.
Thank you in advance for lending a hand.
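Add a question mark to your pattern to make it non-greedy (there's also no need for the 's' modifier). In the meta tag from the question the charset value is followed by the closing quote of the content attribute, so the match has to stop at that quote:
preg_match( "/charset=(.+?)\"/", $RemoteContent, $RemoteEncoding );
echo $RemoteEncoding[ 1 ];
Note that this won't handle charset = "..." or charset='...' and many other combinations.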
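A somewhat more tolerant pattern can allow optional whitespace around the equals sign and either kind of quote, or none. This is a sketch under assumptions, not a full HTML parser: findMetaCharset is a hypothetical helper name, and the character class is only a guess at what a charset name may contain.

```php
<?php
// Hypothetical helper: returns the charset named in a <meta> tag, or null.
// Covers charset=x inside content="...", charset="x", charset = 'x' and charset=x.
function findMetaCharset($html)
{
    $pattern = '/<meta[^>]*charset\s*=\s*["\']?\s*([A-Za-z0-9_.:-]+)/i';
    if (preg_match($pattern, $html, $m)) {
        return strtolower($m[1]);
    }
    return null;
}

$samples = array(
    '<meta http-equiv="Content-Type" content="text/html; charset=windows-1251" />',
    '<meta charset="utf-8">',
    "<meta charset = 'ISO-8859-1'/>",
    '<p>no meta tag here</p>',
);
foreach ($samples as $sample) {
    var_dump(findMetaCharset($sample));
}
```

The capture group stops at the first character that cannot be part of a charset name, so the closing quote and the rest of the page are never included in the match.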
Take a look at Simple HTML Dom Parser. With it, you can easily find the charset from the head without resorting to cumbersome regexes. But as David already commented, you should also examine the headers for the same information and prioritize it if found.
Tested example:
require_once 'simple_html_dom.php';
$source = file_get_contents('http://www.google.com');
$dom = str_get_html($source);
$meta = $dom->find('meta[http-equiv=content-type]', 0);
$src_charset = substr($meta ->content, stripos($meta ->content, 'charset=') + 8);
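$hdr_charset = null;
foreach ($http_response_header as $header) {
// The status line ("HTTP/1.1 200 OK") has no colon, so skip such lines.
if (strpos($header, ':') === false) { continue; }
list($name, $value) = explode(':', $header, 2);
if (strtolower($name) == 'content-type' && stripos($value, 'charset=') !== false) {
$hdr_charset = substr($value, stripos($value, 'charset=') + 8);
break;
}
}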
var_dump(
$hdr_charset,
$src_charset
);
