Implementing a Knowledge Graph — Python
In part one of this two-part series (link to Part I), we saw how we can imitate a thought process by using a Knowledge Graph. In this part, let's get our hands dirty! 😄
We will use an open-source graph database called Cayley as the KG backend. Grab the latest binary from here according to your OS. After downloading, go to the root directory and find the cayley.yml file. If it's not present, create it; there should be a cayley_example.yml file to guide you. This is Cayley's configuration file, and one of its important uses is setting up the backend database where the graph will be stored. Several options are available (see the docs). We will use MySQL as the database. I assume you have MySQL installed; if not, installing it is pretty straightforward (google it :P ). Now make sure your cayley.yml file looks like this (replace <your_root_password> with your actual MySQL root password and <your_database_name> with the name of your database):
cayley.yml:
store:
  # backend to use
  backend: "mysql"
  # address or path for the database
  address: "root:<your_root_password>@tcp(localhost:3306)/<your_database_name>"
  # open database in read-only mode
  read_only: false
  # backend-specific options
  options:
    nosync: false
query:
  timeout: 30s
load:
  ignore_duplicates: false
  ignore_missing: false
  batch: 10000
Now let's start our graph. I am using Windows, so the exact commands might differ on Mac and Linux (see this file). Open a command prompt/terminal, go to the Cayley root directory, and run:
cayley init
Cayley will automatically detect the configuration in the cayley.yml file and set up the database. Now, to load a graph, we need to know about something called a "schema".
A schema is a specific way of representing information; JSON (JavaScript Object Notation), for example, follows one. For more information on schemas, go to Schema.org's website. For our purpose, we will use a format called "N-quads". More information about N-quads is here.
In the example N-quad file above, each line has a shape like <person> <follows> <person> <status> . This means the two <person> entries are nodes of the graph and <follows> is the directional relation between them. <status> is optional and describes something more about the relationship.
Now, the next step is to load this into our MySQL database. To do so, run:
cayley load -i <path_to_nquads_file>
Replace <path_to_nquads_file> with the relative path to your N-quads file. Making an N-quads file is easy: just write lines in the N-quad format and save the file with a ".nq" extension.
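If you'd rather generate such a file programmatically, here is a minimal Python sketch. The node names, the "follows" relation, and the status labels below are made up for illustration, and the output file name example.nq is arbitrary:

```python
# Minimal sketch: writing follow relations as N-quads.
# Node names and status labels are invented for illustration.
triples = [
    ("<alice>", "<follows>", "<bob>", "<friend>"),
    ("<bob>", "<follows>", "<carol>", "<coworker>"),
    ("<carol>", "<follows>", "<alice>", None),  # the fourth field is optional
]

lines = []
for subject, predicate, obj, status in triples:
    fields = [subject, predicate, obj]
    if status is not None:
        fields.append(status)
    # each N-quad line ends with " ."
    lines.append(" ".join(fields) + " .")

# Save with a .nq extension so it can be loaded with `cayley load -i example.nq`.
with open("example.nq", "w") as f:
    f.write("\n".join(lines) + "\n")
```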
After loading the graph into Cayley, you can run a web instance to interact with and visualize your KG using the Gizmo query language (which is not that hard to pick up). Run:
cayley http
Go to localhost:64210 in your browser and you should see something like this:
You can type any query here to interact with your KG. An example query is:
g.V().All()
This means: get all vertices of the graph object "g". More about the query language can be found here.
You can also visualize the graph in the web app. Read the docs to be able to do so.
Now comes the interesting part (which no one told you about):
We have implemented a simple Knowledge Graph using the Cayley graph database and MySQL. Can we interact with this graph remotely, without using the web app? Yes! Cayley exposes the graph at the API endpoint: http://127.0.0.1:64210/api/v1/query/gizmo
We can use the Python "requests" library to make POST requests and query the graph from wherever we want (Cayley should be running in the background to serve the API endpoint).
import requests

query = "g.V().All()"
endpoint = "http://127.0.0.1:64210/api/v1/query/gizmo"
# POST the gizmo query; the response body is a JSON object
response = requests.post(endpoint, data=query)
json_response = response.json()
print(json_response)
Run the above code in a Jupyter Notebook and you should see the JSON response from the API; feel free to play around with it. Phew!
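For repeated queries it helps to wrap the call in a small helper. Below is a minimal sketch using only the standard library (urllib) instead of requests; the extract_ids helper assumes the endpoint returns JSON shaped like {"result": [{"id": ...}, ...]}, which you should verify against your Cayley version:

```python
import json
import urllib.request

# Default Cayley address from `cayley http`; adjust if yours differs.
ENDPOINT = "http://127.0.0.1:64210/api/v1/query/gizmo"

def query_graph(gizmo_query, endpoint=ENDPOINT):
    """POST a gizmo query string and return the parsed JSON response."""
    req = urllib.request.Request(
        endpoint, data=gizmo_query.encode("utf-8"), method="POST"
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))

def extract_ids(json_response):
    """Pull node ids out of a {"result": [...]}-shaped payload (assumed shape)."""
    return [item.get("id") for item in (json_response.get("result") or [])]
```

With Cayley running, something like extract_ids(query_graph("g.V().All()")) should give you a flat list of node names instead of raw JSON.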
That's all, folks! Hope you enjoyed this.