I'm having problems extracting attribute text from an image tag using the Facebook Instant Articles SDK Transformer.
I cannot figure out the rules.json required to extract the text from the alt attribute and turn it into a caption.
//MARKUP
<img src="https://upload.wikimedia.org/wikipedia/commons/8/84/Example.svg" alt="Foto By: Bla Bla"/>
//RULES.JSON
{
  "class": "ImageRule",
  "selector": "img",
  "properties": {
    "image.url": {
      "type": "string",
      "selector": "img",
      "attribute": "src"
    },
    "image.caption": {
      "type": "string",
      "selector": "img",
      "attribute": "alt"
    }
  }
}
The expected result is Facebook Instant Articles-compliant markup like:
<figure>
<img src="https://upload.wikimedia.org/wikipedia/commons/8/84/Example.svg"/>
<figcaption>Foto By: Bla Bla</figcaption>
</figure>
What I get is Uncaught Error: Call to a member function hasChildNodes() on string in /Facebook/InstantArticles/Transformer/Transformer.php on line 305.
Somehow the image gets processed, the caption gets processed, and I get the correct value, but then the transformer recursively enters the transform function again, passing in the extracted alt string, and it fails because it expects an HTML node as input, not a string.
Facebook's documentation on the matter is extremely vague, so if anyone has experience dealing with Facebook Instant Articles, please chime in.
The docs can be found here: https://developers.facebook.com/docs/instant-articles/sdk/transformer/
https://developers.facebook.com/docs/instant-articles/sdk/transformer-rules
Main committer of the SDK here.
You can check the setup we have in SimpleTransformerTest.php, which covers exactly your need. You can also use any of the tests to play around with the Transformer.
What you are doing wrong is the image.caption property: it should be of type element, not string.
For your rules.json it should look like this:
{
  "class": "CaptionRule",
  "selector": "//img[@alt]",
  "properties": {
    "caption.default": {
      "type": "string",
      "selector": "img",
      "attribute": "alt"
    }
  }
},
{
  "class": "ImageRule",
  "selector": "figure",
  "properties": {
    "image.url": {
      "type": "string",
      "selector": "img",
      "attribute": "src"
    },
    "image.caption": {
      "type": "element",
      "selector": "img"
    }
  }
}
Note that I'm using a different strategy here: instead of going straight to the <img> element in the ImageRule, I'm selecting the <figure> tag, so we can keep the Transformer intact.
Note that the rules in rules.json are applied bottom-up.
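For completeness, here is a minimal sketch of wiring these rules into the Transformer; it assumes the facebook-instant-articles-sdk-php package installed via Composer and the loadRules()/transformString() methods shown in its README, and the $html fragment is just a placeholder built from the markup in the question:
<?php
// Minimal sketch (untested): load the rules above and transform a source HTML
// fragment into an Instant Article. Paths and the $html fragment are placeholders.
require __DIR__ . '/vendor/autoload.php';

use Facebook\InstantArticles\Elements\InstantArticle;
use Facebook\InstantArticles\Transformer\Transformer;

$transformer = new Transformer();
$transformer->loadRules(file_get_contents(__DIR__ . '/rules.json'));

$html = '<figure>'
      . '<img src="https://upload.wikimedia.org/wikipedia/commons/8/84/Example.svg" alt="Foto By: Bla Bla"/>'
      . '</figure>';

$instantArticle = InstantArticle::create();
$transformer->transformString($instantArticle, $html);

echo $instantArticle->render(); // should contain <figcaption>Foto By: Bla Bla</figcaption>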
Let me know if this covers your need.
I have a JSON-LD script which shows job posts in the Google Jobs section.
<script type="application/ld+json">
{
  "@context": "https://schema.org/",
  "@type": "JobPosting",
  "title": "<?php echo($title); ?>",
  "description": "<?php echo($description); ?>",
  "hiringOrganization": {
    "@type": "Organization",
    "name": "<?php echo($name); ?>",
    "logo": "example.com/images/<?php echo($id);?>.jpg"
  },
  "jobLocation": {
    "@type": "Place",
    "address": {
      "@type": "PostalAddress",
      "streetAddress": "MW",
      "addressLocality": "Moscow",
      "addressRegion": "Russia",
      "addressCountry": "RU",
      "postalCode": ""
    }
  },
  "baseSalary": {
    "@type": "MonetaryAmount",
    "currency": "RUB",
    "value": {
      "@type": "QuantitativeValue",
      "value": "1500",
      "unitText": "HOUR"
    }
  },
  "datePosted": "2021-06-21",
  "validThrough": "2021-08-18T00:00",
  "employmentType": "FULL_TIME"
}
</script>
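As an aside, echoing raw PHP variables straight into a JSON template like this can break the JSON as soon as $title or $description contains a quote or a newline; a hedged sketch of building the same structure and letting json_encode() handle the escaping (reusing the variable names above, with the nested location/salary blocks trimmed for brevity) looks like this:
<?php
// Sketch: build the JobPosting data as an array and let json_encode() handle escaping.
// $title, $description, $name and $id are assumed to be defined as in the template above.
$jobPosting = [
    '@context' => 'https://schema.org/',
    '@type'    => 'JobPosting',
    'title'       => $title,
    'description' => $description,
    'hiringOrganization' => [
        '@type' => 'Organization',
        'name'  => $name,
        'logo'  => "example.com/images/{$id}.jpg",
    ],
    'datePosted'     => '2021-06-21',
    'validThrough'   => '2021-08-18T00:00',
    'employmentType' => 'FULL_TIME',
];

echo '<script type="application/ld+json">'
    . json_encode($jobPosting, JSON_UNESCAPED_SLASHES | JSON_UNESCAPED_UNICODE)
    . '</script>';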
For the first post it worked; I can find the vacancy in the Google Jobs section. However, when I add more job posts they do not show up, even though I tested them here and it says they are OK.
robots.txt content : Sitemap: https://example.com/sitemap.xml
sitemap.xml content :
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>https://example.com</loc>
<lastmod>2021-06-21</lastmod>
</url>
</urlset>
The job post URLs are https://example.com/posts/71.php, 72, and so on.
Can anyone help with this?
Adding structured data to a page that presents a list of jobs is contrary to Google's guidelines:
Put structured data on the most detailed leaf page possible. Don't add
structured data to pages intended to present a list of jobs (for
example, search result pages).
Regarding the test validity of your structured data: automated testing tools can miss whether your content complies with some of Google's requirements. Read more in the General structured data guidelines:
These guidelines are not easily testable using an automated tool.
In my case the sitemap wasn't set up correctly. Make sure your sitemap actually covers the post URLs; once mine did, Search Console reported them under Discovered URLs (170 in my case).
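Since the sitemap.xml shown in the question lists only the homepage, a hedged sketch of generating entries for each post page as well (the post IDs below are placeholders; the /posts/<id>.php pattern comes from the question) could look like this:
<?php
// Sketch: emit a sitemap that includes every job post page, not just the homepage,
// so crawlers can discover https://example.com/posts/71.php, 72.php, and so on.
// The ID list is a placeholder; in practice it would come from the database.
$postIds = [71, 72, 73];

header('Content-Type: application/xml; charset=UTF-8');
echo '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
echo '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";
echo "  <url><loc>https://example.com</loc><lastmod>2021-06-21</lastmod></url>\n";
foreach ($postIds as $id) {
    echo "  <url><loc>https://example.com/posts/{$id}.php</loc></url>\n";
}
echo "</urlset>\n";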
I'm using Elasticsearch's PHP client, and I find it really difficult to return results with scores whenever I want to search for a word that is "hidden" within a string.
This is an example:
I want to get all the documents where the field "file" has the word "anses" and files are named like this:
axx14anses19122015.zip
What I know about it
I know I should tokenize those words, but I can't figure out how to do it.
I've also read about aggregations, but I'm really new to ES and I have to deliver a working piece ASAP.
What I've tried so far
REGEXP: using regular expressions is very expensive and does not return any scores, which is a must-have in order to narrow the results and give the user accurate information.
Wildcards: same thing, slow and no scores.
My own script, where I have a dictionary and search for critical words using regexp; on a match, I create a new field within the matched document containing the word. The idea is to create a TOKEN so that future searches can use a regular match with scores. Downside: the dictionary approach was flatly rejected by my boss, so I'm here asking for ideas.
Thanks in advance.
In your case I suggest an nGram tokenizer; see the example below.
First I create an analyzer and a mapping for the doc type:
PUT /test_index
{
  "settings": {
    "number_of_shards": 1,
    "analysis": {
      "tokenizer": {
        "ngram_tokenizer": {
          "type": "nGram",
          "min_gram": 4,
          "max_gram": 4,
          "token_chars": [ "letter", "digit" ]
        }
      },
      "analyzer": {
        "ngram_tokenizer_analyzer": {
          "type": "custom",
          "tokenizer": "ngram_tokenizer",
          "filter": [ "lowercase" ]
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "text_field": {
          "type": "string",
          "term_vector": "yes",
          "analyzer": "ngram_tokenizer_analyzer"
        }
      }
    }
  }
}
After that, I insert a document using your file name:
PUT /test_index/doc/1
{
  "text_field": "axx14anses19122015"
}
Now I just use a match query:
POST /test_index/_search
{
  "query": {
    "match": {
      "text_field": "anses"
    }
  }
}
and I receive a response like this:
{
  "took": 8,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.10848885,
    "hits": [
      {
        "_index": "test_index",
        "_type": "doc",
        "_id": "1",
        "_score": 0.10848885,
        "_source": {
          "text_field": "axx14anses19122015"
        }
      }
    ]
  }
}
What did I do? I created an nGram tokenizer that splits the string into 4-character terms and indexes those terms separately, so they can be matched when I search for just a part of the string.
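Since the question mentions Elasticsearch's PHP client, here is a hedged sketch of the same steps from PHP; it assumes the official elasticsearch-php package (client setup details vary between versions), and $indexBody is a placeholder for the settings/mappings JSON shown above decoded into an array:
<?php
// Sketch using the official elasticsearch-php client; API details may differ by version.
require __DIR__ . '/vendor/autoload.php';

$client = Elasticsearch\ClientBuilder::create()->build();

// Create the index with the nGram analyzer and mapping shown above.
// $indexBody is assumed to hold that JSON decoded into a PHP array.
$client->indices()->create([
    'index' => 'test_index',
    'body'  => $indexBody,
]);

// Run the match query and read back the scored hits.
$response = $client->search([
    'index' => 'test_index',
    'body'  => [
        'query' => [
            'match' => ['text_field' => 'anses'],
        ],
    ],
]);

foreach ($response['hits']['hits'] as $hit) {
    echo $hit['_score'] . ' ' . $hit['_source']['text_field'] . PHP_EOL;
}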
To learn more, read this article: https://qbox.io/blog/an-introduction-to-ngrams-in-elasticsearch
Hope it helps!
OK, after trying so many times, it finally worked. I'll share the solution just in case someone else needs it. Thank you so much to Waldemar; it was a really good approach, although I still cannot see why it wasn't working for me.
curl -XPUT 'http://ipaddresshere/tokentest' -d '
{
  "settings": {
    "number_of_shards": 1,
    "analysis": {
      "analyzer": {
        "myngram": {
          "tokenizer": "mytokenizer"
        }
      },
      "tokenizer": {
        "mytokenizer": {
          "type": "nGram",
          "min_gram": "3",
          "max_gram": "5",
          "token_chars": [ "letter", "digit" ]
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "field": {
          "type": "string",
          "term_vector": "yes",
          "analyzer": "myngram"
        }
      }
    }
  }
}'
I was in a hurry, but I wanted to post the solution.
So, this will take any string in "field" and split it into nGrams of length 3 to 5. For example, "abcanses14f.zip" will produce:
abc, abca, abcan, bca, bcan, bcans, and so on, until it reaches "anses" or a similar term that is matchable and has a score associated with it.
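To make the splitting concrete, here is a small illustrative sketch that generates roughly the same 3- to 5-character grams the tokenizer emits (ignoring the token_chars splitting for simplicity):
<?php
// Illustrative sketch: generate all n-grams of length 3 to 5 from a string,
// roughly what the nGram tokenizer above produces before indexing the terms.
function ngrams($text, $min = 3, $max = 5)
{
    $grams = [];
    $length = strlen($text);
    for ($size = $min; $size <= $max; $size++) {
        for ($start = 0; $start + $size <= $length; $start++) {
            $grams[] = substr($text, $start, $size);
        }
    }
    return $grams;
}

$grams = ngrams('abcanses14f.zip');
var_dump(in_array('anses', $grams, true)); // bool(true): "anses" is one of the generated terms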
I have a directory into which some JSON files are regularly updated. What I want to do is deserialize them in my Symfony2 application to get at their juicy data.
The examples on the Symfony site are extremely simple, flat JSON examples that do not reflect the nested reality of real-world JSON data. For example, the following is a simplified version of the files I want to deserialize.
{
  "uid": "some unique identifier",
  "title": "this is a title",
  "description": "some description",
  "paragraphs": [
    {
      "position": "left",
      "body": "a lot of text here",
      "video": {
        "ogg": "path1",
        "webm": "path2",
        "mp4": "path3"
      }
    },
    {
      "position": "right",
      "body": "a lot of text here",
      "video": {
        "ogg": "path1",
        "webm": "path2",
        "mp4": "path3"
      }
    }
  ]
}
Of course, I want to deserialize this nested JSON into a simple, easy to access, model.
What I want to know is how to write the Content class for the above JSON so that when I call $filecontent = $serializer->deserialize($data, 'Acme\Content', 'json'); it deserializes successfully.
This should deserialize your JSON with ease:
$fileContent = json_decode($jsonData);
Documentation for json_decode can be found here
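If you still want a Content model rather than the raw decoded value, a minimal hand-rolled sketch could look like this; the class and property names are assumptions modeled on the JSON above, not a Symfony serializer convention, and the file path is a placeholder:
<?php
// Sketch: a simple Content model hydrated from json_decode() output.
// Class and property names are assumptions based on the JSON shown above.
class Content
{
    public $uid;
    public $title;
    public $description;
    /** @var array[] each entry has position, body and a video map (ogg/webm/mp4) */
    public $paragraphs = array();

    public static function fromJson($json)
    {
        $data = json_decode($json, true); // decode to associative arrays

        $content = new self();
        $content->uid         = $data['uid'];
        $content->title       = $data['title'];
        $content->description = $data['description'];
        $content->paragraphs  = $data['paragraphs'];

        return $content;
    }
}

$content = Content::fromJson(file_get_contents('/path/to/file.json'));
echo $content->paragraphs[0]['video']['mp4']; // "path3"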
I have a JSON string and I want to get a value out of it.
$s='{
"subscriptionId" : "51c04a21d714fb3b37d7d5a7",
"originator" : "localhost",
"contextResponses" : [
{
"contextElement" : {
"attributes" : [
{
"name" : "temperature",
"type" : "centigrade",
"value" : "26.5"
}
],
"type" : "Room",
"isPattern" : "false",
"id" : "Room1"
},
"statusCode" : {
"code" : "200",
"reasonPhrase" : "OK"
}
}
]
}';
Here is the code which I used but it didn't work.
$result = json_decode($s,TRUE); //decode json string
$b=$result ['contextResponses']['contextElement']['value']; //get the value????
echo $b;
contextResponses contains a numerically indexed array (of only one item), and the value property is nested more deeply than what you are trying to reference (it is inside the attributes array). This should be what you need:
$b = $result['contextResponses'][0]['contextElement']['attributes'][0]['value'];
When reading a JSON-serialized data structure like that, you need to take note of every opening [ or {, as they have significant meaning for how you reference the items that follow. You may also want to use something like var_dump($result) in your investigation, as it shows the structure of the data after it has been deserialized, which often makes it easier to understand.
Proper indentation also helps when looking at something like this. Use something like http://jsonlint.com to copy/paste your JSON for easy reformatting. If you had your structure like the following, the nesting levels would be more readily apparent:
{
  "subscriptionId": "51c04a21d714fb3b37d7d5a7",
  "originator": "localhost",
  "contextResponses": [
    {
      "contextElement": {
        "attributes": [
          {
            "name": "temperature",
            "type": "centigrade",
            "value": "26.5"
          }
        ],
        "type": "Room",
        "isPattern": "false",
        "id": "Room1"
      },
      "statusCode": {
        "code": "200",
        "reasonPhrase": "OK"
      }
    }
  ]
}
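If the response ever contains more than one context response or attribute, a small sketch of walking the same structure with nested loops (the array keys come straight from the JSON above) would be:
<?php
// Sketch: iterate over every attribute of every context element and print its value.
$result = json_decode($s, true);

foreach ($result['contextResponses'] as $response) {
    $element = $response['contextElement'];
    foreach ($element['attributes'] as $attribute) {
        // e.g. "Room1 temperature = 26.5"
        echo $element['id'] . ' ' . $attribute['name'] . ' = ' . $attribute['value'] . PHP_EOL;
    }
}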
I'm new to the map-reduce concept, and even though I'm making some slow progress, I'm running into some issues that I need help with.
I have a simple collection consisting of an id, a city and a destination, something like this:
{ "_id" : "5230e7e00000000000000000", "city" : "Boston", "to" : "Chicago" },
{ "_id" : "523fe7e00000000000000000", "city" : "New York", "to" : "Miami" },
{ "_id" : "5240e1e00000000000000000", "city" : "Boston", "to" : "Miami" },
{ "_id" : "536fe4e00000000000000000", "city" : "Washington D.C.", "to" : "Boston" },
{ "_id" : "53ffe7e00000000000000000", "city" : "New York", "to" : "Boston" },
{ "_id" : "5740e1e00000000000000000", "city" : "Boston", "to" : "Miami" },
...
(Please do note that this data is just made up for example purposes)
I'd like to group the destinations by city, including a count:
{ "city" : "Boston", values : [{"Chicago",1}, {"Miami",2}] }
{ "city" : "New York", values : [{"Miami",1}, {"Boston",1}] }
{ "city" : "Washington D.C.", values : [{"Boston", 1}] }
For this I started playing with this map function:
function() {
emit(this.city, this.to);
}
which performs the expected grouping. My reduce function is this:
function(key, values) {
var reduced = {"to":[]};
for (var i in values) {
var item = values[i];
reduced.to.push(item);
}
return reduced;
}
which gives somewhat expected output:
{ "_id" : ObjectId("522f8a9181f01e671a853adb"), "value" : { "to" : [ "Boston", "Miami" ] } }
{ "_id" : ObjectId("522f933a81f01e671a853ade"), "value" : { "to" : [ "Chicago", "Miami", "Miami" ] } }
{ "_id" : ObjectId("5231f0ed81f01e671a853ae0"), "value" : "Boston" }
As you can see, I still haven't counted the repeated cities, but for some reason the last result in the output doesn't look right. I'd expect it to be:
{ "_id" : ObjectId("5231f0ed81f01e671a853ae0"), "value" : { "to" : ["Boston"] } }
Does this have anything to do with the fact that there is a single item? Is there any way to obtain the expected output?
Thank you.
I see you are asking about a PHP issue, but you are using JavaScript to ask, so I'm assuming a JavaScript answer will help you move things along. As such, here is the JavaScript needed in the shell to run your aggregation. I strongly suggest getting your aggregation working in the shell (or some other JavaScript editor) first and then translating it into the language of your choice; it is a lot easier to see what is going on, and faster, that way. You can then run:
use admin
db.runCommand( { setParameter: 1, logLevel: 2 } )
to check the BSON output of your selected language against what the shell produces. This will appear in the terminal if mongo is running in the foreground; otherwise you'll have to look in the logs.
Summing the routes in the aggregation framework (AF) with Mongo is fairly straightforward. The AF is faster and easier to use than map-reduce (MR), though in this case they both have a similar issue: simply pushing to an array won't yield a count in and of itself (in MR you either need more logic in your reduce function or you need to use a finalize function).
With the AF, using the example data provided, this pipeline does the job:
db.agg1.aggregate([
{$group:{
_id: { city: "$city", to: "$to" },
count: { $sum: 1 }
}},
{$group: {
_id: "$_id.city",
to:{ $push: {to: "$_id.to", count: "$count"}}
}}
]);
The aggregation framework can only operate on known fields, but there are many pipeline operations, so a problem needs to be broken down with that in mind.
Above, the first stage calculates the numbers needed, for which there are three fixed fields: the source, the destination, and the count.
The second stage has two fixed fields, one of which is an array that is only being pushed to (all the data for the final form is already there).
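Since the question appears to come from a PHP application, here is a hedged sketch of running the same pipeline through the legacy ext-mongo driver (the database and collection names are placeholders; newer drivers expose an equivalent aggregate() method):
<?php
// Sketch: the same two-stage pipeline run via the legacy ext-mongo driver.
// 'mydb' and 'agg1' are placeholder database/collection names.
$client = new MongoClient();
$collection = $client->selectDB('mydb')->selectCollection('agg1');

$pipeline = array(
    array('$group' => array(
        '_id'   => array('city' => '$city', 'to' => '$to'),
        'count' => array('$sum' => 1),
    )),
    array('$group' => array(
        '_id' => '$_id.city',
        'to'  => array('$push' => array('to' => '$_id.to', 'count' => '$count')),
    )),
);

$result = $collection->aggregate($pipeline);
print_r($result['result']); // one document per city, with per-destination counts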
For MR you can do this:
var map = function() {
var key = {source:this.city, dest:this.to};
emit(key, 1);
};
var reduce = function(key, values) {
return Array.sum(values);
};
A separate function will have to prettify the output, however.
If you have any additional questions please don’t hesitate to ask.
Best,
Charlie