I'm using cURL in PHP to search through code on GitHub. I have everything working except one small option that i cannot figure out how to do.
I have passed the header: Accept: application/vnd.github.v3.text-match+json into cURL just fine, which gives me the extra [text_match] section like so:
{
"text_matches": [
{
"object_url": "https://api.github.com/repositories/167174/contents/src/attributes/classes.js?ref=825ac3773694e0cd23ee74895fd5aeb535b27da4",
"object_type": "FileContent",
"property": "content",
"fragment": ";\n\njQuery.fn.extend({\n\taddClass: function( value ) {\n\t\tvar classes, elem, cur, clazz, j, finalValue",
"matches": [
{
"text": "addClass",
"indices": [
23,
31
]
}
]
},
{
"object_url": "https://api.github.com/repositories/167174/contents/src/attributes/classes.js?ref=825ac3773694e0cd23ee74895fd5aeb535b27da4",
"object_type": "FileContent",
"property": "content",
"fragment": ".isFunction( value ) ) {\n\t\t\treturn this.each(function( j ) {\n\t\t\t\tjQuery( this ).addClass( value.call( this",
"matches": [
{
"text": "addClass",
"indices": [
80,
88
]
}
]
}
]
}
I use the $item['text_matches][$i]['fragment'] to refer to the line of code that i'm searching for.
What i was hoping to do was explicitly state what exact line in the file where the code fragment begins.
Originally, i thought $item['text_matches][$i]['matches]['indices'] was what i was looking for, but this is not the case.
So, how would i go about getting the exact line, because i dont see it within the json dataset.
Related
I'm using ElasticSearch's PHP client and I find really difficult to return results with scores whenever I want to search for a word that is "hidden" within a string.
This is an example:
I want to get all the documents where the field "file" has the word "anses" and files are named like this:
axx14anses19122015.zip
What I know about it
I know I should tokenize those words, can't realize how to do it.
Also I've read about aggregations but I'm really new to ES and I have to deliver a working piece ASAP.
What I've tried so far
REGEXP: using regular expressions is very expensive and does not return any scores, which is a must-to-have in order to shrink results and bring the user accurate information.
Wildcards: same thing, slow and no scores
Own script where I have a dictionary and search for critical words using regexp, if match, create a new field within that matched document with the word. The reason is to create a TOKEN so in future searches I can use regular match with scores. Negative side: the dictionary thing was totally denied by my boss so I'm here asking for any ideas.
Thanks in advance.
I suggest in your case nGram tokenizer see the example
I will create a analyzer and a mapping for a doc type
PUT /test_index
{
"settings": {
"number_of_shards": 1,
"analysis": {
"tokenizer": {
"ngram_tokenizer": {
"type": "nGram",
"min_gram": 4,
"max_gram": 4,
"token_chars": [ "letter", "digit" ]
}
},
"analyzer": {
"ngram_tokenizer_analyzer": {
"type": "custom",
"tokenizer": "ngram_tokenizer",
"filter": [
"lowercase"
]
}
}
}
},
"mappings": {
"doc": {
"properties": {
"text_field": {
"type": "string",
"term_vector": "yes",
"analyzer": "ngram_tokenizer_analyzer"
}
}
}
}
}
after that I`ll insert a document using your file name
PUT /test_index/doc/1
{
"text_field": "axx14anses19122015"
}
now I`ll just will use a query match
POST /test_index/_search
{
"query": {
"match": {
"text_field": "anses"
}
}
}
and will receive a reponse like this
{
"took": 8,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.10848885,
"hits": [
{
"_index": "test_index",
"_type": "doc",
"_id": "1",
"_score": 0.10848885,
"_source": {
"text_field": "axx14anses19122015"
}
}
]
}
}
What i did?
i just created a nGram tokenizer that will explode our string in 4 characters terms and will index this terms separated and they will be searched when I search a part of the string.
To see more, read this article https://qbox.io/blog/an-introduction-to-ngrams-in-elasticsearch
Hope it help!
Ok after trying -so- many times it worked. I'll share the solution just in case someone else needs it. Thank you so much to Waldemar, it was a really good approach and I still cannot see why it's not working.
curl -XPUT 'http://ipaddresshere/tokentest' -d
'{ "settings":
{ "number_of_shards": 1, "analysis" :
{ "analyzer" : { "myngram" : { "tokenizer" : "mytokenizer" } },
"tokenizer" : { "mytokenizer" : {
"type" : "nGram",
"min_gram" : "3",
"max_gram" : "5",
"token_chars" : [ "letter", "digit" ] } } } },
"mappings":
{ "doc" :
{ "properties" :
{ "field" : {
"type" : "string",
"term_vector" : "yes",
"analyzer" : "myngram" } } } } }'
Sorry for bad indentation, I'm really hurry but want to post the solution.
So, this will take any string from "field" and split it into nGrams with lenght 3 to 5. For example: "abcanses14f.zip" will result in:
abc, abca, abcan, bca, bcan, bcans, etc... until it reaches anses or a similar term which is matcheable and has a score related to it.
I have a JSON string and i want to get the value.
$s='{
"subscriptionId" : "51c04a21d714fb3b37d7d5a7",
"originator" : "localhost",
"contextResponses" : [
{
"contextElement" : {
"attributes" : [
{
"name" : "temperature",
"type" : "centigrade",
"value" : "26.5"
}
],
"type" : "Room",
"isPattern" : "false",
"id" : "Room1"
},
"statusCode" : {
"code" : "200",
"reasonPhrase" : "OK"
}
}
]
}';
Here is the code which I used but it didn't work.
$result = json_decode($s,TRUE); //decode json string
$b=$result ['contextResponses']['contextElement']['value']; //get the value????
echo $b;
ContextResponses contains a numerically indexed array (of only one item) and value property is more deeply nested than what you are trying to reference (it is within attributes array). This would appear to be what you need:
$b = $result['contextResponses'][0]['contextElement']['attributes'][0]['value'];
When reading a JSON-serialiazed data structure like that, you need to make sure and note every opening [ or { as they have significant meaning in regards to how you need to reference the items that follow it. You also may want to consider using something like var_dump($result) in your investigations, as this will show you the structure of the data after it has been deserialized, oftentimes making it easier to understand.
Also, proper indention when looking at something like this would help. Use something like http://jsonlint.com to copy/paste your JSON for easy reformatting. If you had your structure like the following, nesting levels become more readily apparent.
{
"subscriptionId": "51c04a21d714fb3b37d7d5a7",
"originator": "localhost",
"contextResponses": [
{
"contextElement": {
"attributes": [
{
"name": "temperature",
"type": "centigrade",
"value": "26.5"
}
],
"type": "Room",
"isPattern": "false",
"id": "Room1"
},
"statusCode": {
"code": "200",
"reasonPhrase": "OK"
}
}
]
}
I'm a newbie in php but I'm trying to learn right now. What i want to make :
Send a curl request to github api like this :
curl_setopt($ch, CURLOPT_URL, 'https://api.github.com/legacy/repos/search/language:' . $lang);
And when i receive the result to display like a nice html page. The response that I'm receiving right now is displayed like the one written in github api documentation http://developer.github.com/v3/search/.
This is the first time I'm trying to learn PHP, but not the first time I've been web developing ( I've been contributing to a Hakyll based blog these weeks ).
My question is : How could i parse the results to format them nicely in a html page ?
The results are returned via JSON. You can make use of json_decode() for that.
Pass your cURL $response to this function. Such that print_r(json_decode($response,1));
Example of how to do it
<?php
$json='{
"text_matches": [
{
"object_url": "https://api.github.com/repositories/3081286",
"object_type": "Repository",
"property": "name",
"fragment": "Tetris",
"matches": [
{
"text": "Tetris",
"indices": [
0,
6
]
}
]
},
{
"object_url": "https://api.github.com/repositories/3081286",
"object_type": "Repository",
"property": "description",
"fragment": "A C implementation of Tetris using Pennsim through LC4",
"matches": [
{
"text": "Tetris",
"indices": [
22,
28
]
}
]
}
]
}';
$jarr=json_decode($json,1);
echo $jarr['text_matches'][0]['object_url']; //"prints" https://api.github.com/repositories/3081286
The aim is to save user entered JSON into a database. Now, before someone jumps at me, I know json, I know mysql and I know all the links inbetween.
The issue is: I need to safely store the ENTIRE JSON feed in a single cell in the table.
The background: this function will be a temp fix for a tool, that is needed asap but will require a lot of time. The temp fix will allow the system to go live with minimal code.
Users will create a GOOGLE maps style here ( http://gmaps-samples-v3.googlecode.com/svn/trunk/styledmaps/wizard/index.html)
and have the JSON made for them
[
{
"stylers": [
{ "visibility": "off" }
]
},{
"featureType": "water",
"stylers": [
{ "visibility": "on" }
]
},{
"featureType": "transit.line",
"elementType": "geometry.fill",
"stylers": [
{ "visibility": "on" },
{ "hue": "#ff3300" },
{ "color": "#ff0000" },
{ "weight": 0.7 }
]
},{
"featureType": "transit.station.rail",
"stylers": [
{ "visibility": "on" },
{ "color": "#0000ff" },
{ "weight": 4.6 }
]
},{
}
]
The site will then just call the JSON and apply it using jQuery later on. What would my ''best practice'' method be at doing this.
To answer your specific question, I would agree with the poster 'tom' that you should use a TEXT column.
However, I think for ease of use, you should also use prepared statements. If you create a prepared insert statement, you can then pass in the JSON directly. This will be the best representation in your database of the exact JSON (no annoying slashes) - AND be the safest. Please don't forget to do this step - its very important!
Since MySQL doesn't have a dedicated JSON column type, I would just store the JSON in an unbounded TEXT column. Just make sure you always check for valid JSON on write.
document structure example is:
{
"dob": "12-13-2001",
"name": "Kam",
"visits": {
"0": {
"service_date": "12-5-2011",
"payment": "40",
"chk_number": "1234455",
},
"1": {
"service_date": "12-15-2011",
"payment": "45",
"chk_number": "3461234",
},
"2": {
"service_date": "12-25-2011",
"payment": "25",
"chk_number": "9821234",
}
}
}
{
"dob": "10-01-1998",
"name": "Sam",
"visits": {
"0": {
"service_date": "12-5-2011",
"payment": "30",
"chk_number": "86786464",
},
"1": {
"service_date": "12-15-2011",
"payment": "35",
"chk_number": "45643461234",
},
"2": {
"service_date": "12-25-2011",
"payment": "20",
"chk_number": "4569821234",
}
}
}
In PHP i want to list all those "visits" information (and corresponding "name" ) for which payment is less than "30".
I want to print only the visits with "payment" < "30" not others. Is such query possible, or do i have to get entire document first using search and then use PHP to select such visits??
In the example document, the "payment" values are given as strings which may not work as intended with the $lt command. For this response, I have converted them to integers.
Wildcard queries are not possible with MongoDB, so with the given document structure, the key (0,1,2, etcetera) of the sub-document must be known. For instance, the following query will work:
> db.test.find({"visits.2.payment":{$lt:35}})
However,
> db.test.find({"visits.payment":{$lt:35}})
Will not work in this case, and
> db.test.find({"visits.*.payment":{$lt:35}})
will also not return any results.
In order to be able to query the embedded "visits" documents, you must change your document structure and make "visits" into an array or embedded documents, like so:
> db.test2.find().pretty()
{
"_id" : ObjectId("4f16199d3563af4cb141c547"),
"dob" : "10-01-1998",
"name" : "Sam",
"visits" : [
{
"service_date" : "12-5-2011",
"payment" : 30,
"chk_number" : "86786464"
},
{
"service_date" : "12-15-2011",
"payment" : 35,
"chk_number" : "45643461234"
},
{
"service_date" : "12-25-2011",
"payment" : 20,
"chk_number" : "4569821234"
}
]
}
Now you can query all of the embedded documents in "visits":
> db.test2.find({"visits.payment":{$lt:35}})
For more information, please refer to the Mongo documentation on dot notation:
http://www.mongodb.org/display/DOCS/Dot+Notation+%28Reaching+into+Objects%29
Now on to the second part of your question: it is not possible to return only a conditional sub-set of embedded documents.
With either document format, it is not possible to return a document containing ONLY the sub-documents that match the query. If one of the sub-documents matches the query , then the entire document matches the query, and it will be returned.
As per the Mongo Document "Retrieving a subset of fields"
http://www.mongodb.org/display/DOCS/Retrieving+a+Subset+of+Fields
We can return parts of embedded documents like so:
> db.test2.find({"visits.payment":{$lt:35}},{"visits.service_date":1}).pretty()
{
"_id" : ObjectId("4f16199d3563af4cb141c547"),
"visits" : [
{
"service_date" : "12-5-2011"
},
{
"service_date" : "12-15-2011"
},
{
"service_date" : "12-25-2011"
}
]
}
But we cannot have conditional retrieval of some sub documents. The closest that we can get is the $slice operator, but this is not conditional, and you will have to first know the location of each sub-document in the array:
http://www.mongodb.org/display/DOCS/Retrieving+a+Subset+of+Fields#RetrievingaSubsetofFields-RetrievingaSubrangeofArrayElements
In order for the application to display only the embedded documents that match the query, it will have to be done programmatically.
You may try:
$results = $mongodb->find(array("visits.payment" => array('$lt' => 30)));
But i don't know if it will work since visits is an object. BTW judging from what you posted it could be transfered to array (or should since numerical property names tends to cause confusion)
try - db.test2.find({"visits.payment":"35"})