Head In Cloud BVBA

Consul ports and what they are used for

Fri, 10 Feb 2017 10:35:00 +0000

Regarding firewalls, this depends on your particular implementation. On a Consul server, you probably want to allow communications on all the ports mentioned above.

On a Consul agent, things get trickier. Port 8301 needs to be open, as it is required for communication with other agents and servers. Whether ports 8400, 8500, and 8600 need to be open depends on your use case. If you install a Consul agent on every node, there is no need to open those ports in the host firewall: your applications can use 127.0.0.1 to communicate with the API and DNS interface.
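To see which of these ports are actually reachable from a given host, a small check along the following lines can help. This is only a minimal sketch: the helper names `port_is_open` and `check_local_agent` are made up for illustration, and it tests TCP only, so it says nothing about the UDP traffic that gossip (8301) and DNS (8600) also use.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


def check_local_agent(ports=(8400, 8500, 8600), host="127.0.0.1"):
    """Map each port to whether the local agent accepts TCP connections on it."""
    return {port: port_is_open(host, port) for port in ports}


if __name__ == "__main__":
    for port, reachable in check_local_agent().items():
        print(port, "open" if reachable else "closed")
```

Running it on a node with a local agent should show the loopback ports as open, while the same check from another host shows whether the firewall lets them through.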

When SysOps need workflow.... Introducing Apache NiFi.

Fri, 29 Jan 2016 22:57:00 +0000

What is Apache NiFi?

Apache NiFi is a dataflow tool that is quickly becoming quite popular in the Big Data world. According to the website, NiFi is:

...an easy to use, powerful, and reliable system to process and distribute data.

I think the Apache NiFi guys are being a bit too modest here :-) The way I would describe NiFi is:

Apache NiFi is a web-based tool that allows you to get data from almost any source, and transform/route it to almost any destination using an intuitive WYSIWYG workflow designer.

At the moment, you can receive/send data from/to the following data sources:

local files
HTTP/HTTPS (very handy if you want to integrate with cloud-based services like PagerDuty, HipChat, Slack, Twilio)
Syslog
S3
Twitter
FTP
SQS
Apache Kafka
Probably a lot more... :-)

How can Apache NiFi help in System Operations?

As a system operator, you probably deal with a lot of data already that needs to be processed and evaluated. Over the years, you probably developed your own solutions to deal with this data. Did you ever create scripts for one or more of these tasks:

Post an alert to a website when a system goes down?
Ship log files to another system for further analysis (via FTP, or to S3)?
Send an SMS when something happens that's not supposed to happen?

If the answer is yes to any of the questions, then NiFi might be an asset for your IT environment. True, writing your own scripts to solve those issues can give you a high sense of satisfaction, but the most important issue with this approach is this:

System operators should be focused on the data your environment generates, and not the code that processes that data.

Okay, some readers are probably rolling their eyes right now, but allow me to elaborate. First, let me ask you a few questions about the integration-scripts you developed yourself:

Can your script handle a network loss when it is in the middle of processing data?
Does it scale up to multiple threads?
How well does it perform when it suddenly needs to process more data (like 10x) compared to the usual load?
Do you have a central dashboard that shows the data flow happening in your scripts?

As someone in system operations, you probably don't want to deal with all the "details" mentioned above. You just want to get your data, transform it into what you want it to be, and send it where it needs to go.

Maybe you have a team of coders that can handle the issues mentioned above, but they are probably busy developing your company's product and don't have the resources to assist you every time. You might consider a proprietary solution, but most of the time you will be stuck with what the vendor offers. You want tools that adapt to your workflow, not the other way around. Apache NiFi is free, and allows you to create any workflow you want, with any data you want.

How does Apache NiFi compare to an ELK stack?

If you are already using an Elasticsearch-Logstash-Kibana (ELK) stack, you might wonder how Apache NiFi fits in. In my opinion, they are two different systems that complement each other:

ELK is great for historical analysis of your data.
Apache NiFi is great for real-time processing of your data.

Example 1: Building a Syslog server.

I admit, I'm a big fan of ChatOps. Having a chat-room as the primary hub of communication for your operations team encourages teamwork, and makes it a lot easier to work with remote teams in different time zones as they have access to all the conversations that happened when they were still asleep :-)

One of the things I wanted was a chat-room that acts as a live feed of all the syslog messages generated by my servers. This is the first workflow I built in NiFi, and I was surprised to have everything up and running in less than 3 hours. Mind you, I had zero experience with NiFi when I built this, so I still needed to get the hang of it. If I had to develop this in a programming language I had no prior experience with, I think it would have taken longer than 3 hours.

I use HipChat for team chatrooms, so I need to format the data to something that HipChat expects, before posting it to the API HTTP server.

Here is what I ended up with:

Take a good look at the picture. Even without any NiFi experience, it's quite easy to figure out what's going on:

NiFi starts a syslog listener.
Some attributes are added which are required for HipChat formatting.
If it's an error, we add another attribute that will cause the message to be displayed in red. If not, it is displayed in green.
The last steps just transform the data to JSON, add the correct MIME type, and do an HTTP POST to the HipChat API server.
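For readers who want to see the transformation spelled out, the steps above correspond roughly to the following sketch. It is not what NiFi runs internally. It assumes that each incoming line starts with a syslog `<PRI>` field, that severities 0 to 3 count as errors, and that the chat API accepts a JSON body with `message`, `message_format` and `color` fields; the function names are invented for this illustration.

```python
import json
import re

# A syslog line starts with "<PRI>", where PRI = facility * 8 + severity.
PRI_PATTERN = re.compile(r"^<(\d{1,3})>")


def syslog_severity(line):
    """Return the severity (0-7) encoded in the leading <PRI>, or None."""
    match = PRI_PATTERN.match(line)
    if match is None:
        return None
    return int(match.group(1)) % 8


def to_hipchat_payload(line):
    """Build the JSON body for one syslog line: red for errors, green otherwise."""
    severity = syslog_severity(line)
    # Severities 0-3 are emergency, alert, critical and error.
    is_error = severity is not None and severity <= 3
    return json.dumps({
        "message": PRI_PATTERN.sub("", line, count=1),
        "message_format": "text",
        "color": "red" if is_error else "green",
    })


if __name__ == "__main__":
    print(to_hipchat_payload("<11>Feb 10 10:35:00 web01 app: disk failure"))
```

The resulting string would then be sent with the `application/json` MIME type, which is the part the last processors in the flow take care of.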

The only thing left to do was to reconfigure my servers so syslog messages get forwarded to my NiFi server.

The output as shown in HipChat:

The formatting could be improved, but it ain't bad for a first attempt :-)

Example 2: Building an HTTP to FTP gateway

Here is another example that shows how you can easily build an HTTP-to-FTP gateway with NiFi:

Once again, the flow is quite easy to follow:

NiFi listens for HTTP requests. Files can be uploaded via an HTTP POST request.
NiFi uploads the file to the FTP server and sends out an e-mail about the successful upload.
If the FTP transfer fails, the file is stored locally for further inspection, and an e-mail is sent out to notify the administrators.
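To make the success and failure branches concrete, here is a rough sketch of the same routing logic in code. It leaves out the HTTP listener entirely and takes the upload and notification steps as plain callables. The names `ftp_upload` and `deliver` and the `failed_dir` parameter are hypothetical, chosen only for this example.

```python
import ftplib
import io
import os


def ftp_upload(host, user, password, filename, data):
    """Upload bytes to an FTP server using the standard library."""
    with ftplib.FTP(host, user, password) as ftp:
        ftp.storbinary("STOR " + filename, io.BytesIO(data))


def deliver(filename, data, upload, notify, failed_dir):
    """Try the upload; on failure keep a local copy. Notify in both cases."""
    try:
        upload(filename, data)
    except ftplib.all_errors as exc:
        # Failure branch: store the file locally for later inspection.
        os.makedirs(failed_dir, exist_ok=True)
        path = os.path.join(failed_dir, os.path.basename(filename))
        with open(path, "wb") as handle:
            handle.write(data)
        notify("FTP upload of {} failed ({}); kept at {}".format(filename, exc, path))
        return False
    # Success branch: report the upload.
    notify("Uploaded {} successfully".format(filename))
    return True
```

A real uploader can be passed in with `functools.partial(ftp_upload, host, user, password)`, and `notify` can be anything that sends a message, such as a function that e-mails the administrators.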

Time to implement: 30 minutes more or less. Once again, no coding required.

Batch or real-time? Single-threaded or multi-threaded?

So, is NiFi optimized for real-time processing or batch-processing? The answer is simple: it depends on how you configure it. Every box in the diagram is called a "processor", and its throughput can be configured and tuned to your own wishes:

Conclusion.

I believe that Apache NiFi is a valuable asset to manage the data flow of your IT environment. I have a simple test to determine if a tool is worthwhile to me or not: if I can come up with more than 3 scenarios where this particular tool can help me, I consider it a winner. Apache NiFi beats that test without any doubt.

While the examples shown here are quite simple, NiFi can handle very complex workflows, lets you arrange flows in different process groups, and also supports clustering.

Additional information.

I only scratched the surface of what Apache NiFi can do. There is a great introduction video from OSCON 2015, given by Joe Witt of Hortonworks. I recommend you check it out.

Using DynamoDB as a Django settings store

Fri, 27 Nov 2015 15:00:00 +0000


Django settings overview

When you don't specify a settings-module for your Django project, the settings.py which is located in your project folder will be used. You can override the settings module in two ways:

Via the command line using the --settings= parameter.
Via the DJANGO_SETTINGS_MODULE environment variable.
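As a small illustration of the environment-variable route, an entry point can keep whatever `DJANGO_SETTINGS_MODULE` is already set and only fall back to a default otherwise. The helper name `select_settings_module` and the module path `project.settings` are assumptions made for this sketch.

```python
import os


def select_settings_module(default="project.settings"):
    """Keep an existing DJANGO_SETTINGS_MODULE, else fall back to the default."""
    # setdefault only writes the variable when it is not set yet,
    # and returns the value that ends up in the environment.
    return os.environ.setdefault("DJANGO_SETTINGS_MODULE", default)


if __name__ == "__main__":
    print("Using settings module:", select_settings_module())
```

On the command line, the equivalent override would look like `python manage.py runserver --settings=project.settings_test`.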

In the past, I used a separate settings module for each environment, which resulted in multiple settings modules in my codebase:

/project/settings.py (for local development)
/project/settings_test.py (for test environment)
/project/settings_prod.py (for production environment)

I think most people who started developing with Django did it this way initially. However, there are a few drawbacks to this technique:

Disclosure of sensitive information: Keeping settings directly in your codebase means you also have sensitive information like database usernames and passwords in your codebase.
Subtle changes in test vs production: You add a parameter in your test environment settings module, but you forget to add the parameter in your production settings module.
Changing settings requires deployment: This one speaks for itself. Changing settings should not require a new deployment of your application.

Test vs. production state

While settings on your local development machine can differ from your production environment (after all, during development we experiment with new things), for actual deployment we want our test environment to match our production environment as closely as possible. To make this possible, we need the following three conditions:

We need to dynamically determine our environment during application startup.
Based on the information we got in the previous step, load the configuration associated with the environment we are currently running in.
Our settings loading mechanism should be identical in both test and production.

Let's bring in AWS

Let's see how we can establish those three steps, with a little help from AWS :-)

Determine our environment

Since we run our application on EC2 instances, we can use tags to identify our environment. For example, for every instance we launch in our test environment, we can add the following tags:

Environment: test-myapp01
Environment-role: test-myapp01-website

The Environment-role tag is added to quickly identify EC2 instances when your application consists of multiple components. For example, you might also have a role called test-myapp01-mailgateway if your application sends out email and the mail server is on a different instance.

If you use tools like CloudFormation or Terraform (and I really recommend you do), you can have those tags added automatically every time you make a change to your infrastructure.

During startup of our application, we can determine our environment by querying the meta-data of the instance we are running on.

Loading the configuration associated with our environment

Since our environment is now identified, we can easily load our configuration. I chose DynamoDB as the repository for the application settings, since it's highly available in your AWS region, it's cheap, and you can manage it via the AWS console.

Unifying our settings loader

In this setup I only have two settings modules in my codebase:

/project/settings.py (for local development)
/project/settings_deploy.py (for test and production environment)

settings_deploy.py will retrieve the EC2 tags associated with the instance it is running on, and retrieve the settings from the DynamoDB table.

Implementation

DISCLAIMER: This is just a proof-of-concept, and not production-quality code.

Creating the DynamoDB table

The name of the table should be related to the environment-role we run in. For example, if our environment-role is test-myapp-website, we need to create a DynamoDB table that is called test-myapp-website-config.

We will use the AWS command-line tools to do this:

aws dynamodb create-table --table-name=test-myapp-website-config \
    --attribute-definitions AttributeName=Parameter,AttributeType=S \
    --key-schema AttributeName=Parameter,KeyType=HASH \
    --provisioned-throughput ReadCapacityUnits=1,WriteCapacityUnits=1

Next, we will fill this table with some default settings. Create the file test-myapp-website-config.json with the following content:

{
  "test-myapp-website-config": [
    {"PutRequest": {"Item": {"Parameter": {"S": "debug"}, "Value": {"BOOL": true}}}},
    {"PutRequest": {"Item": {"Parameter": {"S": "db_host"}, "Value": {"S": "my.test.server"}}}},
    {"PutRequest": {"Item": {"Parameter": {"S": "db_name"}, "Value": {"S": "my_db"}}}},
    {"PutRequest": {"Item": {"Parameter": {"S": "db_user"}, "Value": {"S": "my_username"}}}},
    {"PutRequest": {"Item": {"Parameter": {"S": "db_pass"}, "Value": {"S": "my_password"}}}},
    {"PutRequest": {"Item": {"Parameter": {"S": "db_port"}, "Value": {"S": "5432"}}}}
  ]
}

Next step, load this file into your DynamoDB table:

aws dynamodb batch-write-item --request-items file://test-myapp-website-config.json
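Writing the request file by hand gets tedious as the number of settings grows. One option is to generate it from a plain dictionary, as in the sketch below. The function names `to_attribute` and `build_request_items` are made up for this example, and only booleans and strings are handled, matching the two attribute types used in the file above.

```python
import json


def to_attribute(value):
    """Wrap a Python value in DynamoDB's typed JSON (BOOL or S only)."""
    # bool is checked explicitly so that True/False do not end up as strings.
    if isinstance(value, bool):
        return {"BOOL": value}
    return {"S": str(value)}


def build_request_items(table_name, settings):
    """Build the structure expected by `aws dynamodb batch-write-item`."""
    return {
        table_name: [
            {"PutRequest": {"Item": {
                "Parameter": {"S": name},
                "Value": to_attribute(value),
            }}}
            for name, value in settings.items()
        ]
    }


if __name__ == "__main__":
    defaults = {"debug": True, "db_host": "my.test.server", "db_port": "5432"}
    request = build_request_items("test-myapp-website-config", defaults)
    print(json.dumps(request, indent=2))
```

Keep in mind that a single batch-write-item call accepts at most 25 put or delete requests, so a larger settings dictionary would have to be split over several files.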

Loading our settings from Django

Make sure you have Boto and Requests installed:

pip install requests
pip install boto

At the top of our settings_deploy.py file, we can add the following code to retrieve the value of our Environment-role tag:

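```
import requests
from boto import dynamodb, ec2

# get environment
# AWS_REGION_NAME is assumed to be defined earlier in this settings module.
r = requests.get('http://169.254.169.254/latest/meta-data/instance-id')
if r.status_code == requests.codes.ok:
    instance_id = r.text
    conn = ec2.connect_to_region(AWS_REGION_NAME)
    reservations = conn.get_all_instances()
    # Find the instance we are running on and read its Environment-role tag.
    for res in reservations:
        for inst in res.instances:
            if inst.__dict__['id'] == instance_id:
                AWS_ENV = inst.__dict__['tags']['Environment-role']
```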

Now we can construct the name of our table, and connect to it:

dynamo_conn = dynamodb.connect_to_region(AWS_REGION_NAME)
config_table = dynamo_conn.get_table('{}-config'.format(AWS_ENV))

Once we are connected to the table, we can retrieve our settings:

```
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        'NAME': config_table.get_item(hash_key='db_name')['Value'],
        'USER': config_table.get_item(hash_key='db_user')['Value'],
        'PASSWORD': config_table.get_item(hash_key='db_pass')['Value'],
        'HOST': config_table.get_item(hash_key='db_host')['Value'],
        'PORT': config_table.get_item(hash_key='db_port')['Value'],
    }
}
```

IAM access role

Our instances need some additional permissions, to read the EC2 tags, and read the DynamoDB table. Add the following to your instance's IAM role:

```
{
    "Effect": "Allow",
    "Action": [
        "ec2:DescribeInstances",
        "ec2:DescribeTags"
    ],
    "Resource": "*"
},
{
    "Effect": "Allow",
    "Action": [
        "dynamodb:DescribeTable",
        "dynamodb:GetItem",
        "dynamodb:BatchGetItem"
    ],
    "Resource": [
        ""
    ]
}
```

Repeat for your production environment

You can now create a similar table for your production environment, and tag your production instances in the same way.

Final words

This is just a quick example, and you might want to do some extra work before you start implementing this:

Provide error checking to make sure the table and values exist in DynamoDB.
Use BatchGetItem to retrieve all settings in one go.
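The first point could look something like the sketch below. It does not talk to DynamoDB itself: it assumes the rows have already been fetched by whatever means (for instance a batch get) and arrive as dict-like objects with `Parameter` and `Value` keys, as in the table created earlier. The names `MissingSettingError`, `REQUIRED_SETTINGS` and `collect_settings` are hypothetical.

```python
class MissingSettingError(Exception):
    """Raised when a required parameter is absent from the config table."""


# Parameters the DATABASES block depends on.
REQUIRED_SETTINGS = ("db_host", "db_name", "db_user", "db_pass", "db_port")


def collect_settings(items, required=REQUIRED_SETTINGS):
    """Turn config rows into a dict and fail early if anything is missing."""
    settings = {item["Parameter"]: item["Value"] for item in items}
    missing = [name for name in required if name not in settings]
    if missing:
        raise MissingSettingError(
            "Missing settings in config table: {}".format(", ".join(missing))
        )
    return settings


if __name__ == "__main__":
    rows = [
        {"Parameter": "db_host", "Value": "my.test.server"},
        {"Parameter": "db_name", "Value": "my_db"},
    ]
    try:
        collect_settings(rows)
    except MissingSettingError as exc:
        print(exc)
```

Failing at startup with a clear message is usually easier to diagnose than a `KeyError` raised halfway through the settings module.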

Also take a look at Dynamodb-config-store: https://github.com/sebdah/dynamodb-config-store.

References

The Twelve-Factor App: http://12factor.net
How to manage production/staging/dev Django settings: https://discussion.heroku.com/t/how-to-manage-production-staging-dev-django-settings/21
An Introduction to boto’s DynamoDB interface: http://boto.readthedocs.org/en/2.3.0/dynamodb_tut.html

New website launched.

Wed, 11 Nov 2015 22:00:00 +0000

We are working on some articles about Django web development and deployment on AWS, so stay tuned.