Get started with InfluxDB and Python on Linux

If you are building a script, application or anything else that collects measurements or data over time intervals, a time-series database might be a good match for you. In a TSDB you simply store the data and read it back for analysis. For these use cases, there is no need for updates or changes to the data as it’s time-based. The data can’t change back in the past. InfluxDB is the most popular of these time-series databases and this article describes how to get started with it, on Debian or Ubuntu, and how to use it from Python.

Introduction

You might ask yourself why we need this special type of databases as a regular relational DB (RDBMS) can store timestamped data as well. Although that is definitely true, a TSDB is better at this kind of work. It’s designed to handle this large amount of ever increasing data and to efficiently handle storage and queries to it. On top of that, a time-series DB often makes it easier to process the data as well. For example, InfluxDB has built in features like aggregation, continuous queries, moving averages, filtering data and many more that make it easier and save you time to work with these large datasets.

There are many use cases which are a good match with InfluxDB or a TSDB in general. Typically, for every use case where you collect data on an interval and store it, like log collection, sensor data, stock market data, predictive maintenance, tracking, …, a time-series database will do well.

Covering everything a TSDB has to offer is not really the goal of this article. Here, I’d like to cover the basics in order to get started with InfluxDB in combination with Python. The post will describe how to install InfluxDB, create a database and user and then how we can write data to it and read it back using Python under Debian or Ubuntu Linux.

The starting point of this article is a minimal Debian 10 installation to make this as easy as possible to replicate if you want to try this yourself. To make sure, I’ve tested the same commands and steps on a fresh Ubuntu 20.04 and 20.10 installation as well.

Debian 10 still comes with InfluxDB 1.6. For 2.0 and above releases, the most noticeable is the bundled Web-UI to easily have an overview and insight in the data stored in the DB. Although it works pretty well, it’s not so relevant for this article as we plan to access the data through CLI and Python. If you would want such feature, I’d recommend going for Grafana.

Installation

Since InfluxDB is part of the standard repositories in Debian and Ubuntu, it’s quite straightforward to install it, using the package manager:

jensd@deb10:~$ sudo apt install influxdb influxdb-client python3-influxdb
Reading package lists... Done
Building dependency tree
...
Processing triggers for systemd (241-7~deb10u6) ...

Next to the InfluxDB service itself, we are also installing a client and the required Python libraries for it.

Once installed, we need to start the systemd service and can enable it to start with the system if desired:

jensd@deb10:~$ sudo systemctl enable influxdb
Synchronizing state of influxdb.service with SysV service script with /lib/systemd/systemd-sysv-install.
Executing: /lib/systemd/systemd-sysv-install enable influxdb
jensd@deb10:~$ sudo systemctl start influxdb

That’s all it takes to install and start InfluxDB.
In order to check if this went well, we can use the client to execute a basic query:

jensd@deb10:~$ influx
Connected to http://localhost:8086 version 1.6.4
InfluxDB shell version: 1.6.4
> SHOW DATABASES;
name: databases
name
----
_internal 

There is not much going on in InfluxDB yet but at least we see a connection and a response from the server, so we know for sure it is up and running.

Authentication

By default, and as you could see in the small test we just did, authentication is not enabled. This means that any access attempt, without username and password, will just be granted. This is easy for testing but not really recommended when using this in a real/production setup.

To enable authentication, we first need to create an admin user that has all privileges so we at least can log in once we make authentication mandatory:

jensd@deb10:~$ influx
Connected to http://localhost:8086 version 1.6.4
InfluxDB shell version: 1.6.4
> CREATE USER admin WITH PASSWORD 'secretpassword' WITH ALL PRIVILEGES;
> SHOW USERS;
user  admin
----  -----
admin true 

Once this is done, we can edit /etc/influxdb/influxdb.conf and set auth-enabled to true:

jensd@deb10:~$ sudo vi /etc/influxdb/influxdb.conf
jensd@deb10:~$ cat /etc/influxdb/influxdb.conf | grep auth-enabled
 auth-enabled = true

After the changes, you need to reload/restart influxDB to apply the changes to authentication.

jensd@deb10:~$ sudo systemctl restart influxdb

Now, when trying to run the unauthenticated request again that we ran during the installation process. We can see it is no longer accepted to log in without specifying a username/password combination and execute queries:

jensd@deb10:~$ influx
Connected to http://localhost:8086 version 1.6.4
InfluxDB shell version: 1.6.4
> SHOW DATABASES;
ERR: unable to parse authentication credentials
Warning: It is possible this error is due to not setting a database.
Please set a database with the command "use ". 

When specifying the user and password we created one step earlier, we see that, when using authentication, all works as epxected:

jensd@deb10:~$ influx -username admin -password secretpassword
Connected to http://localhost:8086 version 1.6.4
InfluxDB shell version: 1.6.4
> SHOW DATABASES;
name: databases
name
----
_internal 

If you need additional security, you can also limit the bind address to a specific interface, in the same file as above. InfluxDB is listening on TCP port 8086 by default, so if you have a firewall, you need to keep that in mind.

Databases, measurements, series, tags and fields

In order to continue working and playing around with InfluxDB in a proper way, it’s unavoidable to look into a bit of theory and concepts.

I’ll try to keep this as short and to the point as possible but it’s important to understand this well if you want to make the best out of everything InfluxDB has to offer.

To make things clear, I’ll work with some sample data:

TimePOWER_INPower_OUTSENSOR
2021-03-14T22:25:29Z13323motor1
2021-03-14T22:25:42Z15678745motor2
2021-03-14T22:25:59Z13733motor1
2021-03-14T22:26:12Z16439652motor2
2021-03-14T22:26:29Z13427motor1
2021-03-14T22:26:37Z9832653psu1
power_info

As you can see, this data looks very similar, if not the same, as you would see with a classic relational DB. We can see a table with several columns and rows which are part of a database with multiple of these tables.

Fields and tags

Columns: power_in, power_out and sensor are either fields or tags. Fields or tags are a combination of a key (the header) and a value (the actual data in the column). The distinction between field and tag, although they look the same from looking at the sample data table, is related to indexing and performance. A tag is indexed and a field is not. So if you need to filter or refer to data often based on a certain column, it is better to define it as a tag and the other way around.

For example, if we often need the values for power_in per sensor, it would make sense to define power_in as a field key with field values 133,1567, 137,… and sensor as a tag key with tag values motor1, motor 2 and psu1.

This would optimize performance as query would probably look something like: SELECT “power_in” FROM “power_info” WHERE “sensor” = “motor1”.

Measurements

As you see in the above query, I referred to the table’s name: power_info in the query. With InfluxDB, it’s no longer called a table but a measurement. It’s the combination of the timestamp, fields and tags together.

The first column for a measurement, time, which contains a timestamp for the rest of the fields and tags is present in each table. It’s the base of a time-series database.

Points and series

Rows in the table are called points. A point represents a single row in the table.

Series, which is the last concept I will go into, is a combination of a measurement, tags and field key. If we look a the sample data, we could have the following series:

  • Measurement: power_info – Tag set: motor1 = Field key: power_in
  • Measurement: power_info – Tag set: motor2 = Field key: power_in
  • Measurement: power_info – Tag set: psu1 = Field key: power_in
  • Measurement: power_info – Tag set: motor1 = Field key: power_out
  • Measurement: power_info – Tag set: motor2 = Field key: power_out
  • Measurement: power_info – Tag set: psu1 = Field key: power_out

As you can see, especially if you already have some experience with a classic relational DB, this is not too difficult to grasp. It should be quite straightforward to apply these concepts once you start to use your InfluxDB instance.

Using InfluxDB

Preparation

Now that you are familiar with the concepts, let’s start to use our InfluxDB installation. Before we can get started, we need to make sure we have a database that will contain our measurements and can access it in a secure way.

For the rest of the article, I will use the same example which I used to explain the concepts.

Let’s start by creating a database. I will call it energy.

jensd@deb10:~$ influx -username admin -password secretpassword
Connected to http://localhost:8086 version 1.6.4
InfluxDB shell version: 1.6.4
> CREATE DATABASE energy; 

Once created, we can list the existing databases to see if it worked.

> SHOW DATABASES;
name: databases
name
----
_internal
energy
>

We could access this database with our admin user but as it’s probably not a good idea to use that account with all privileges, it’s a good idea to create a dedicated user, that has access to only that database only.

> CREATE USER "energyuser" WITH PASSWORD 'verysecret';
> SHOW USERS;
user       admin
----       -----
admin      true
energyuser false
> GRANT ALL ON energy TO energyuser; 

The first command: create user, creates a regular, non-admin user. The last command sets the read and write = all permissions for this user on the newly created database energy.

Line protocol

Writing data or points to InfluxDB typically happens using the line protocol. Unlike a “regular” RDBMS, it’s not needed to create a SQL(like) query for that.

The line protocol syntax looks as follows:

measurement,(tag_set) field_set (timestamp)

Or with an example that matches the explanation above:

power_info,sensor=motor1 power_in=145,power_out=568

The tag_set(s) are optional and so is the timestamp as in most cases you want InfluxDB to handle that for you. Between the tag_set and field_set there is a whitespace and same goes between the field_set and timestamp.

If you do not use any tags, the line would look as follows:

power_info power_in=145,power_out=568 

Write and read data with the InfluxDB client

Now that we have our database created and we have a user that is able to access it, let’s test and see if we can write some data to it.

Staying with the sample data I presented while explaining the concepts, let’s write some power data to measurement power_info. We will use power_in and power_out as fields and sensor as a tag.

It’s not common to use the influx client to write data to the database in scripting but it can be used for importing large bulks of data, or to load JSON, CSV or regular text files, containing lines with data.

First, let’s insert a single point in the database using the example line from the explanation above:

> USE energy;
Using database energy
> INSERT power_info,sensor=motor1 power_in=145,power_out=568 

Now to access the data which we wrote, we can use a SQL-like query:

> SELECT * FROM power_info;
name: power_info
time                power_in power_out sensor
----                -------- --------- ------
1615996777690037686 145      568       motor1 

You can see how this simplifies things a lot. We did not need to specify a timestamp, InfluxDB took care of that and, even more special, we did not have to predefine a structure for the measurement. It got created while we executed the query, on the spot.

To show the existing structures in the database, which is really handy if you’re exploring an unknown database or you forgot how things look like, you can use the following:

> SHOW MEASUREMENTS;
name: measurements
name
power_info
> SHOW FIELD KEYS;
name: power_info
fieldKey  fieldType
--------  ---------
power_in  float
power_out float
> SHOW TAG KEYS;
name: power_info
tagKey
sensor 

Writing and read using Python

At the beginning of the article, together with InfluxDB itself and the client, we also installed the required Python 3 libraries to work with InfluxDB databases.

As mentioned, in a real use case, you won’t insert data into InfluxDB using the client. You would probably write to it from a script or using the HTTP API. As I was looking to do this from Python, you can find that method here.

A simple Python-script to write some data to InfluxDB using python looks like the following:

#!/usr/bin/python3
from influxdb import InfluxDBClient
client = InfluxDBClient(host='localhost',
                        port=8086,
                        username='energyuser',
                        password='verysecret')
line = 'power_info,sensor=motor1 power_in=123,power_out=348'
client.write([line], {'db': 'energy'}, 204, 'line')
client.close()

You might wonder what the 204 means. That is the expected return code from InfluxDB for a write.

After executing this piece of code, we can see that another point has been added to our measurement:

> SELECT * FROM power_info;
name: power_info
time                power_in power_out sensor
----                -------- --------- ------
1615996777690037686 145      568       motor1
1615997811600448629 123      348       motor1 

Now, in order to read the data back using Python, we can use the following code:

#!/usr/bin/python3
from influxdb import InfluxDBClient
client = InfluxDBClient(host='localhost',
                        port=8086,
                        username='energyuser',
                        password='verysecret',
                        database='energy')
result = client.query('SELECT * FROM "power_info"')
points = result.get_points(tags={'sensor': 'motor1'})
for point in points:
    print("Time: %s, power_in: %i, power_out: %i" \
          % (point['time'], point['power_in'], point['power_out']))
client.close()

The result of executing the code, looks as follows:

jensd@deb10:~$ ./test.py
Time: 2021-03-17T15:59:37.690037686Z, power_in: 145, power_out: 568
Time: 2021-03-17T16:16:51.600448629Z, power_in: 123, power_out: 348

That should be enough to get you started. There is a lot more possibilities to query and write to InfluxDB but hopefully this basic information will help you to get results quickly :)

Leave a Reply

Your email address will not be published. Required fields are marked *