June 6, 2024 By Chas Clawson

Securing open source infrastructure – Log all the things

Secure OSS telemetry and logging

The last time we wrote about open source software (OSS) for security, we explored how community-driven innovation addresses security problems stemming from the rapid pace of business-driven technological advancements. We posed the question: Can open source security solutions adequately secure and protect the OSS that modern businesses depend on? As we race into an “everything-as-code” era built on OSS, we must consider if the community can provide the necessary telemetry to maintain stability and security.

The telemetry tsunami: riding the wave with M.E.L.T.

Let’s delve into how OSS collection technology, particularly for metrics, events, logs, and traces (M.E.L.T.), is keeping up with the relentless digital exhaust of today's systems and applications. To dig deeper into the topic, I reached out to Peter Czanik, an expert in open source logging, especially with syslog-ng. Since the late 90s, syslog-ng has been trailblazing in the log aggregation space, and it shows no signs of slowing down.

Ditching the proprietary for OSS: A new era in logging

It's fair to say the industry has shifted away from proprietary logging agents. The Cloud Native Computing Foundation (CNCF) has incubated key technologies like Fluentd, which features a pluggable architecture that unifies data collection and consumption, and OpenTelemetry, formed from the merger of the OpenTracing and OpenCensus projects. Yet, long before these cloud-centric solutions existed, there was syslog-ng. We decided to chat with Peter to learn more about the project's latest releases and ongoing innovations.

Many may not know that syslog-ng natively supports sending logs to modern SaaS analytic solutions like Sumo Logic. These prebuilt “destinations” can use either the network() or the http() destination, simplifying configuration for users. Sending logs directly to a cloud-hosted syslog aggregator allows customers to eliminate expensive, load-balanced log collection servers, leveraging scalable cloud solutions instead. Thanks to TLS encryption and tokens, sending logs securely through the cloud is straightforward. Furthermore, building your collection strategy on syslog-ng versus a proprietary agent avoids vendor lock-in and keeps options open for future integrations.

The latest releases of syslog-ng are particularly exciting, featuring the opentelemetry() source and destination, which can handle logs, traces, and metrics using OTLP/gRPC. The OpenTelemetry Protocol (OTLP) transmits telemetry data efficiently between distributed system components. gRPC, a high-performance open-source RPC framework developed by Google, uses HTTP/2 for transport and Protocol Buffers for interface definition, and offers features such as authentication and load balancing.

Q&A with Peter Czanik, Open Source Evangelist at One Identity

What trends are you seeing in the OSS world regarding data collection, logging and telemetry?

Answers by Peter Czanik:

If I wanted to summarize trends in a single word, I’d say: simplification. For many years I have seen that one tool collects and forwards system logs, another tool from an application, a third tool a bit from everything, and so on. While this method is easy to configure at the beginning, it is very expensive to maintain in the long run.

  • Multiple log management tools to keep up-to-date and push through security and functional checks on each update.

  • Running multiple logging tools in parallel needs more computing resources, creating additional overhead in virtualization and hardware maintenance.

  • Forwarding the same logs multiple times over the network is expensive, especially in a cloud environment.

The most important use case for syslog-ng in the past years was simplification: using a single, dedicated central log management layer. A single tool for collecting all log messages to a central location, where log messages are often saved for long-term storage (compliance), and forwarded to multiple destinations for further analytics after some minimal processing. Logs travel only once, and each destination receives only the logs they really need.
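The central layer described above can be sketched in a few lines of syslog-ng configuration. This is a minimal illustration, not a configuration taken from the interview: the object names, the archive path, the analytics hostname, and the choice of auth-related logs as the forwarded subset are all assumptions.

```conf
# Hypothetical central log server: receive once, archive everything,
# forward only a subset for analytics.

source s_network {
  # Accept RFC 5424 syslog from clients over TCP.
  syslog(transport("tcp") port(601));
};

destination d_archive {
  # One file per host per day; the path is a placeholder.
  file("/var/log/archive/${HOST}/${YEAR}-${MONTH}-${DAY}.log" create-dirs(yes));
};

destination d_analytics {
  # Placeholder hostname for a downstream analytics system.
  syslog("analytics.example.com" transport("tcp") port(601));
};

filter f_auth {
  # Example subset: authentication-related messages only.
  facility(auth, authpriv);
};

log {
  source(s_network);
  destination(d_archive);        # every message is stored for compliance
  log {
    filter(f_auth);
    destination(d_analytics);    # only the filtered subset is forwarded
  };
};
```

Each client sends its logs over the network once; the filtering and fan-out happen on the central server.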

The next step of simplification is OpenTelemetry, where not only logs but also metrics and traces are collected. As far as I can see, the main driver behind OTel seems to be Kubernetes. However, I also received requests for OpenTelemetry support from the FreeBSD world, where Kubernetes does not exist. I’m very happy to share that OpenTelemetry support on FreeBSD already works in development snapshots of syslog-ng Open Source Edition.

Should we expect to see winners slowly emerge and become more dominant? Or should we expect purpose-built OSS solutions to remain viable as they more easily solve unique problems?

We already have some dominant players. Rsyslog is usually the default syslog implementation on Linux systems, because it is relatively easy to configure for simple use cases. Fluentd is a CNCF-graduated application on Kubernetes, supporting many cloud technologies as it is very easy to extend.

But being dominant does not mean that there is no space left for other players. Syslog-ng is just one of the alternatives. Syslog-ng usually needs a bit more investment (time and effort) at implementation time, but it’s a more flexible tool that lets you build complex logging configurations. And while syslog-ng does not have a connector for each and every cloud service, it works really fast with a small footprint, which is very important when you have to process and forward billions of log messages a day. I expect to see multiple players in the long run. I’m proud that even if syslog-ng is not a majority player, it is dominant in a niche: where logging is not a compliance checkbox but a core technology.

What are the most common destinations you see for syslog-ng these days?

There is no easy answer for this, as the syslog-ng project does not collect usage information in any form – so the straight answer is that we don’t know. The only time we see a user’s configuration is when there is a problem report. However, I follow syslog-ng related discussions all over the Internet (thanks to Google Alerts) and talk to many users in person at various events.

The file destination is probably still the most popular. It’s not just saving logs locally to text files, but also on the central server. Some compliance regulations require you to store logs for years. Storing those in a database or a SIEM is very expensive, so for a lot of organizations the file store is still very attractive. Different subsets of log messages are usually sent to different applications for analytics. However, logs are stored there only for the minimal required time. Long term log storage is usually solved by using text files on cold storage.

The syslog protocol is used between syslog-ng instances, so that is probably still the second most popular destination.

Many people use the http() destination under the hood without even knowing it. It is extremely fast, as it allows not just the batching of log messages, but also multiple worker threads. Load-balancing among multiple nodes is also possible. This means that the http() destination can send as many logs as previously only a load-balancer cluster could. Almost every application or service reachable through the http() destination uses a different API. Luckily you do not have to be aware of this, as configurations in SCL (syslog-ng configuration library) hide away the complexity of the implementation details. The elasticsearch-http() and the splunk-hec-event() drivers are probably the most popular; however, the sumologic-http() driver is also widely used. All of these are built on the http() destination and are part of the SCL.
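To give a feel for what those SCL drivers wrap, here is a rough sketch of a hand-written http() destination that uses batching, worker threads, and two target nodes. The URLs, the destination name, the JSON body format, and all numeric values are placeholders chosen for illustration; a real service defines its own endpoint and payload format.

```conf
destination d_http_example {
  http(
    # Listing more than one URL lets syslog-ng balance load between nodes.
    url("https://node1.example.com/ingest" "https://node2.example.com/ingest")
    method("POST")
    headers("Content-Type: application/json")
    # One JSON object per message (assumed payload format).
    body("$(format-json --scope rfc5424)")
    batch-lines(500)       # send up to 500 messages per request
    batch-timeout(5000)    # or flush after 5000 ms, whichever comes first
    workers(4)             # four parallel sender threads
    tls(peer-verify(yes) ca-dir("/etc/syslog-ng/ca.d"))
  );
};
```

The SCL drivers fill in the service-specific parts (URL, headers, body format), so in practice you usually only set the credentials and a few tuning options.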

Can you walk us through a few technical examples of deploying syslog-ng in an architecture that uses OTel and a cloud analytics solution like Sumo Logic?

There are many ways you can send data to Sumo Logic. Log data can be sent in at least three different ways. This Sumo Logic documentation shows you a syslog-ng configuration snippet, which forwards log messages using the RFC 5424 syslog protocol. You can see all the configuration details and must edit the configuration yourself. It is not the easiest way, but you should still bookmark this page: how to configure TLS for Sumo Logic.

Recent syslog-ng versions (3.27.1 or later) hide the complexity of this configuration from the user. There are two drivers defined in SCL. One is sumologic-syslog(), which is practically the same as the example in the Sumo Logic documentation, just a bit easier to configure. You can learn more about it from the syslog-ng documentation.
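For orientation, a sumologic-syslog() destination might look roughly like the sketch below. The option names (token(), deployment(), and the peer-verify() value) are written from memory of the syslog-ng documentation rather than taken from this interview, so treat them as assumptions and check them against the documentation for your version. The token and endpoint values are placeholders.

```conf
destination d_sumo_syslog {
  sumologic-syslog(
    token("USER-TOKEN-AS-PROVIDED-BY-sumologic")   # placeholder token
    deployment("ENDPOINT")                         # placeholder, e.g. a region code
    # Verify the server certificate against the local CA directory.
    tls(peer-verify(required-trusted) ca-dir('/etc/syslog-ng/ca.d'))
  );
};
```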

The more interesting one is the sumologic-http() driver. Instead of using the syslog protocol, it is built on top of the http() destination. The initial configuration is really easy:

sumologic-http(collector("UNIQUE-HTTP-COLLECTOR-CODE-AS-PROVIDED-BY-sumologic") deployment("ENDPOINT")
	tls(peer-verify(yes) ca-dir('/etc/syslog-ng/ca.d')));

The following configuration, when appended to syslog-ng.conf on a Debian system, sends all local log messages to the EU Sumo Logic endpoint. The collector code is shortened here, so it’s both anonymized and fits the line.

destination d_sumo {
  sumologic-http(collector("ZaVnC4dhaV3_8NoU4...RuyE5z4A==") deployment("eu")
    tls(peer-verify(yes) ca-dir('/etc/syslog-ng/ca.d'))
  );
};

log {
  source(s_src);
  destination(d_sumo);
};

This configuration works perfectly well if you have to forward tens of thousands of events per second. If you have hundreds of thousands of events per second, you can still use a single syslog-ng instance to forward log messages; however, it needs some fine-tuning. You can use most options of the http() destination to tune syslog-ng, like enabling multiple worker threads (the workers() option), batching (the batch-lines() option) or compression (the content-compression("gzip") option) to save bandwidth. If you have an unreliable network connection, then you should also use disk-based buffering (the disk-buffer() option).
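A tuned variant of the destination could look something like the following sketch. The numeric values are arbitrary starting points rather than recommendations, and batch-timeout() is an additional http() option not mentioned in the text. The disk-buffer option names have changed between syslog-ng versions, so the ones shown here are an assumption to verify against your release.

```conf
destination d_sumo_tuned {
  sumologic-http(
    collector("UNIQUE-HTTP-COLLECTOR-CODE-AS-PROVIDED-BY-sumologic")
    deployment("eu")
    tls(peer-verify(yes) ca-dir('/etc/syslog-ng/ca.d'))
    workers(4)                    # parallel sender threads
    batch-lines(1000)             # messages per HTTP request
    batch-timeout(2000)           # flush a partial batch after 2000 ms
    content-compression("gzip")   # trade CPU for bandwidth
    disk-buffer(
      reliable(yes)               # keep messages on disk until delivered
      disk-buf-size(1073741824)   # 1 GiB; newer releases may use a different option name
    )
  );
};
```

It is usually worth changing one option at a time and measuring, since the best batch size and worker count depend on message size and network latency.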

If you want to collect not just logs but also metrics and traces, OpenTelemetry combines these three into a single protocol. Version 4.3 of syslog-ng added initial support for OpenTelemetry, and each new version has improved on it since. Syslog-ng can collect data over OTLP (OpenTelemetry Protocol), parse the incoming data, and send data over OTLP. The dependencies necessary to compile syslog-ng OpenTelemetry support are only available in recent Linux distributions. Syslog-ng is mostly run on RHEL and compatible distributions. Of those, only version 9 is supported. OpenTelemetry support is also available on Fedora and most rolling Linux distros, and FreeBSD support arrives with the next syslog-ng version.

To send data to Sumo Logic’s OpenTelemetry collector, install the collector on the host running your central syslog-ng server using an installation token. Next, configure syslog-ng:

source s_otel {
  opentelemetry(port(14317));
};
destination d_sumo_otel {
  opentelemetry(
    url("127.0.0.1:4317")
  );
};
log {
  source(s_sys);
  source(s_otel);
  destination(d_sumo_otel);
};

This configuration collects data using OTLP on port 14317, and forwards the collected data together with local system logs to the Sumo Logic OpenTelemetry Collector running on localhost.

The above configuration is good enough for testing. In a production configuration, you will most likely add parsers and filters, depending on your environment and requirements.
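As one possible direction, the single log statement above could be replaced by two log paths, so that a filter applies only to local system logs while OTLP data passes through untouched. The filter name and the choice to drop debug-level messages are assumptions for illustration; s_sys, s_otel and d_sumo_otel are the objects from the configuration above.

```conf
filter f_no_debug {
  # Keep everything from info up to emerg; drop debug messages.
  level(info..emerg);
};

# Local system logs: filtered before forwarding.
log {
  source(s_sys);
  filter(f_no_debug);
  destination(d_sumo_otel);
};

# Data received over OTLP: forwarded as-is.
log {
  source(s_otel);
  destination(d_sumo_otel);
};
```

Keeping the two sources in separate log paths avoids applying syslog-style filters to OTLP data they were not written for.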

The community can’t do this alone

As we increasingly rely on open source software for enterprise solutions, it's imperative that vendors actively contribute to these projects' security. Log collection technologies like syslog-ng and OpenTelemetry need continuous innovation, and for-profit companies need to support this effort. This investment matters because modern businesses depend on the success and security of OSS.

Start free with Sumo Logic today and see how easy it is to get started thanks to OSS innovations like OTel.

Chas Clawson

Field CTO, Security

As a technologist interested in disruptive cloud technologies, Chas joined Sumo Logic's Cyber Security team with over 15 years in the field, consulting with many federal agencies on how to secure modern workloads. In the federal space, he spent time as an architect designing the Department of Commerce ESOC SIEM solution. He also worked at the NSA as a civilian conducting Red Team assessments and within the office of compliance and policy. Commercially, he has worked with MSSP practices and security consulting services for various Fortune 500 companies. Chas also enjoys teaching Networking & Cyber Security courses as a Professor at the University of Maryland Global College.
