Nothing is more frustrating than a broken system with no visibility into it; at the same time, nothing is more satisfying than catching problems with your vendors before they do. Being proactive rather than reactive also buys a ton of grace from your users when you catch problems before they are reported, and sometimes before they are noticed at all.
This reminds me of a quote from the Futurama Godfellas episode where the Galaxy God Entity is talking to Bender and says “When you do things right, people won’t be sure you’ve done anything at all” and I couldn’t agree more. Proper logging and visibility will lead to exactly that!
Below we will look at how I have implemented it and the choices made along the way.
The Software Stack
Grafana
This is the user interface to query logs and metrics, creating rich dashboards and visualizations, and alerting when specific conditions are met.
Loki
This is a platform specifically optimized for handling message streams. I prefer to instrument my applications using JSON messages that Grafana can then parse out with LogQL. Become familiar with writing log queries as that is how you will explore the data.
Log streams are grouped by labels. These labels have served me quite well for many years.
- Application is the top level assembly or application that produced the log
- Environment is the environment (e.g. dev, stage, production)
- MachineName is the computer or server running the application
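With these labels in place and JSON-formatted log lines, a typical LogQL query might look like the following (label and field names follow the conventions in this article; values are illustrative):

```logql
{Application="MyApp", Environment="Production"}
  | json
  | level = `error`
  | CorrelationId = `abc-123`
```

The label matcher in braces selects the stream, while the `json` parser stage pulls fields out of the message body so they can be filtered like labels.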
WARNING: Loki will group and store messages in chunks (files) on the server using the stream labels. You want these labels to be as low cardinality as possible to avoid having too many individual files on the server.
You might be asking at this point, “How do I add additional information to my log messages”? This is where the Log Property Stack comes in. It is a way to enrich your logs with additional context without trying to jam it into the log message itself.
Log Property Stack
Instead of adding all the details to the log message itself, you should utilize the property stack which will persist the property with every message nested within the scope that property was defined.
Here are some examples of what I use through my applications. These are staples that I include in everything and add more as needed to determine application state at major branching points.
- InstanceId for long-running processes, deferring to CorrelationId for nested units of work
- CorrelationId for discrete units of work
- RequestInfo object for web stuff like url, headers, query params, and other information to help trace requests.
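Put together, a single log event enriched from the property stack might render as JSON like this (field names follow the conventions above; the values are purely illustrative):

```json
{
  "Timestamp": "2024-01-15T10:30:00.000Z",
  "level": "info",
  "Message": "Order submitted",
  "InstanceId": "9f1c2d3e",
  "CorrelationId": "abc-123",
  "RequestInfo": {
    "Url": "/api/orders",
    "Method": "POST"
  }
}
```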
Dependency Recommendations
- Serilog for dotnet applications
- Serilog.Sinks.Debug for web applications when you are running locally
- Serilog.Sinks.Console for console applications writing to stdout/stderr
- Serilog.Sinks.Grafana.Loki for writing to a Loki server
- SerilogTimings for timing operations; I use this whenever my application leaves its domain of control (e.g. when making database queries or web requests)
- Custom stored procedures for SQL Server, see bonus content below
Prometheus
Prometheus is a time series database and handles things that do not fit into structured logging, such as performance counters. Like Loki, it has its own query language, called PromQL.
Performance counters
- CPU and Memory Utilization
- Disk and Network Usage
- IIS and SQL specific counters like request throughput
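As an example, CPU utilization can be queried in PromQL roughly like this (the metric and label names assume windows_exporter's defaults and are worth verifying against your installation):

```promql
100 - (avg by (instance) (rate(windows_cpu_time_total{mode="idle"}[5m])) * 100)
```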
These counters can be gathered from the servers directly using a tool called windows_exporter that provides an HTTP endpoint that can be scraped by a Prometheus server.
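A minimal Prometheus scrape configuration for windows_exporter might look like the following sketch (the target hostname and windows_exporter's default port 9182 are assumptions to adapt):

```yaml
scrape_configs:
  - job_name: "windows"
    static_configs:
      - targets: ["server01.example.com:9182"]
```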
WARNING: Please be mindful of your company's security practices, as out of the box the server is not secured. Configuration details can be found in the windows_exporter documentation.
Logging Fundamentals
Here be dragons…
- Avoid logging PII — Logs are public and can leak information
- Avoid excessive logging — Logging can be great at first to build confidence in a process. If you find yourself logging every little step, `Debug` or `Verbose` should be used
- Avoid logging user correctable errors as `Error` — `Debug` or `Warning` may be a more appropriate place for these types of messages
- Avoid punctuation when possible — Log messages are fragments, not sentences
Log Key Events
- Major branching points in your code
- When errors or unexpected values are encountered
- Any IO or resource intensive operations
- Significant domain events
- Request failures and retries
- Beginning and end of time-consuming batch operations
Choose Appropriate Logging Level
- Be generous with your logging but strict with your logging levels. In almost all cases the level of your logs should be Debug
- Use Information for log events that would be needed in production to determine the running state or correctness of your application
- Use Warning or Error for unexpected events like exceptions
- The Error level should be reserved for events that you intend to act on. User correctable errors should never be logged at this level
Bonus — Logging in SQL Server
Configure
This is called at the start of your SQL session that you want to log and should only be called once per session.
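For instance, a session might be configured and then used like this (procedure names are those defined in this section; the application name and values are illustrative):

```sql
EXECUTE [dbo].[ConfigureFor] @Application = 'MyApp', @Environment = 'Development';
EXECUTE [dbo].[LogProperty_CorrelationId] 'abc-123';
EXECUTE [dbo].[LogTo_Information] 'Batch job started';
```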
```sql
CREATE PROCEDURE [dbo].[ConfigureFor](
    @Application VARCHAR(200),
    @Environment VARCHAR(11))
AS
BEGIN
    BEGIN TRY
        -- normalize and validate environment value
        SELECT @Environment = CASE
            WHEN @Environment IN ('development', '0') THEN 'Development'
            WHEN @Environment IN ('stage', '2') THEN 'Stage'
            WHEN @Environment IN ('production', '1') THEN 'Production'
            ELSE NULL
        END

        IF @Environment IS NULL
        BEGIN
            RAISERROR('Invalid environment value. Should be one of [Development|Stage|Production]', 16, 1)
        END

        DECLARE @ReadOnly INT = 1
        EXECUTE sp_set_session_context 'Application', @Application, @ReadOnly
        EXECUTE sp_set_session_context 'MachineName', @@SERVERNAME, @ReadOnly
        EXECUTE sp_set_session_context 'Environment', @Environment, @ReadOnly

        EXECUTE sp_set_session_context '__index__', NULL
    END TRY
    BEGIN CATCH
        THROW
    END CATCH
END
```
Log Level Functions
The meat and potatoes of the implementation. It gathers all of the log properties off the stack and submits them to the Loki server. There are convenience functions for each log level.
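Before diving into the T-SQL, the payload shape Loki's push endpoint expects can be sketched outside SQL. This is a minimal Python sketch, not the article's implementation; the endpoint URL and label names mirror those used in the procedure below:

```python
import json
import time

def build_loki_payload(message: str, level: str,
                       stream_labels: dict, properties: dict) -> str:
    """Build the JSON body for Loki's /loki/api/v1/push endpoint."""
    # The log line carries the message, level, and any stacked properties as JSON
    line = json.dumps({"Message": message, "level": level, **properties})
    # Timestamps are unix epoch in *nanoseconds*, serialized as strings
    ts = str(time.time_ns())
    payload = {
        "streams": [
            {
                "stream": stream_labels,   # low-cardinality labels only
                "values": [[ts, line]],    # one or more [timestamp, line] pairs
            }
        ]
    }
    return json.dumps(payload)

body = build_loki_payload(
    "Order submitted",
    "info",
    {"Application": "MyApp", "Environment": "Development", "MachineName": "DEV01"},
    {"CorrelationId": "abc-123"},
)
# POST body to http://localhost:3100/loki/api/v1/push
# with a Content-Type: application/json header
```

The stored procedure below performs the same steps in T-SQL: compute the nanosecond timestamp, serialize the stream labels and stacked properties to JSON, and POST the result.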
Implementation
IMPORTANTThis will need a way to make HTTP requests from SQL Server. This could be through OLEAutomation, a dotnet SQLCLR assembly or SQL Server Language Extensions.
NOTEThis is typically not called directly favoring the convenience functions.
```sql
CREATE PROCEDURE [dbo].[LogTo_Impl](
    @Message NVARCHAR(4000),
    @Level NVARCHAR(50))
AS
BEGIN
    BEGIN TRY
        -- cheap insurance to make sure that ConfigureFor has been called
        IF SESSION_CONTEXT(N'Application') IS NULL
        BEGIN
            RETURN
        END

        -- get log time as unix epoch in nanos
        DECLARE @Time BIGINT = DATEDIFF_BIG(NANOSECOND, '1970-01-01 00:00:00.0000000', SYSUTCDATETIME())

        -- add standard set of properties to be logged
        EXECUTE LogProperty_Add 'Message', @Message
        EXECUTE LogProperty_Add 'level', @Level

        -- send to loki
        -- json
        -- {
        --     "streams": [
        --         {
        --             "stream": { "label": "value" },
        --             "values": [
        --                 [ "unix epoch in nanoseconds", "log line" ],
        --                 [ "unix epoch in nanoseconds", "log line" ]
        --             ]
        --         }
        --     ]
        -- }
        --
        -- http://localhost:3100/loki/api/v1/push

        -- these are configured in the ConfigureFor stored procedure
        DECLARE @StreamLabels NVARCHAR(MAX) = (
            SELECT *
            FROM (
                SELECT *
                FROM (VALUES
                    ('Application', SESSION_CONTEXT(N'Application')),
                    ('MachineName', SESSION_CONTEXT(N'MachineName')),
                    ('Environment', SESSION_CONTEXT(N'Environment'))
                ) stream_labels([key], [value])
            ) a
            PIVOT (
                MAX([value]) FOR [key] IN ([Application], [MachineName], [Environment])
            ) pvt
            FOR JSON AUTO, WITHOUT_ARRAY_WRAPPER
        )

        IF OBJECT_ID('tempdb..#TMP') IS NOT NULL
        BEGIN
            DROP TABLE #TMP
        END

        SELECT
            [key] = [value],
            [value] = SESSION_CONTEXT(CAST([value] AS NVARCHAR(100))),
            [is_json] = ISJSON(CAST(SESSION_CONTEXT(CAST([value] AS NVARCHAR(100))) AS NVARCHAR(MAX)))
        INTO #TMP
        FROM OPENJSON(CAST(SESSION_CONTEXT(N'__index__') AS NVARCHAR(MAX)))

        -- generate select list to account for json column data.
        -- we have to use JSON_QUERY on json data so that it is treated as a json fragment
        -- when the overall select is serialized
        DECLARE @SelectList NVARCHAR(MAX) = (
            SELECT STUFF((
                SELECT ',' + IIF([is_json] = 1,
                    QUOTENAME([key]) + ' = JSON_QUERY(CAST(' + QUOTENAME([key]) + ' AS NVARCHAR(MAX)))',
                    QUOTENAME([key]))
                FROM #TMP
                FOR XML PATH ('')
            ), 1, 1, '')
        )

        -- generate pivot list
        DECLARE @PvtList NVARCHAR(MAX) = (
            SELECT STUFF((
                SELECT ',' + QUOTENAME([key])
                FROM #TMP
                FOR XML PATH ('')
            ), 1, 1, '')
        )

        -- generate message span
        DECLARE @Sql NVARCHAR(MAX) = ''
        SELECT @Sql = '
            DECLARE @Json NVARCHAR(MAX) = (
                SELECT {1}
                FROM (
                    SELECT [key], [value] FROM #TMP
                ) a
                PIVOT (
                    MAX([value]) FOR [key] IN ({0})
                ) pvt
                FOR JSON AUTO, WITHOUT_ARRAY_WRAPPER
            )
            SELECT @Json'
        SELECT @Sql = REPLACE(@Sql, '{0}', @PvtList)
        SELECT @Sql = REPLACE(@Sql, '{1}', @SelectList)

        DECLARE @Json TABLE ([Json] NVARCHAR(MAX))
        INSERT INTO @Json EXECUTE(@Sql)

        -- generate full loki request
        DECLARE @LokiRequest NVARCHAR(MAX) = (
            SELECT
                [stream] = JSON_QUERY(@StreamLabels),
                [values] = JSON_QUERY('[[' + QUOTENAME(CAST(@Time AS VARCHAR(20)), '"') + ',"' + STRING_ESCAPE([Json], 'json') + '"]]')
            FROM @Json
            FOR JSON AUTO, ROOT('streams')
        )

        -- reset exception
        EXECUTE LogProperty_Add 'Exception', NULL

        -- fire off the request
        DECLARE @HeadersJson NVARCHAR(MAX) = (
            SELECT *
            FROM (
                SELECT
                    [content-type] = 'application/json',
                    [content-length] = CAST(LEN(@LokiRequest) AS VARCHAR(20))
            ) a
            FOR JSON AUTO
        )

        EXECUTE HttpPost 'http://localhost:3100/loki/api/v1/push', @LokiRequest, NULL, NULL, @HeadersJson
    END TRY
    BEGIN CATCH
        -- swallow
    END CATCH
END
```
Fatal
```sql
CREATE PROCEDURE [dbo].[LogTo_Fatal](
    @Message NVARCHAR(4000),
    @CallingProc SYSNAME = NULL)
AS
BEGIN
    IF @CallingProc IS NOT NULL
    BEGIN
        EXECUTE LogProperty_AddSourceContext @CallingProc
    END

    -- extra error details
    IF ERROR_MESSAGE() IS NOT NULL
    BEGIN
        DECLARE @Exception NVARCHAR(4000)
        SELECT @Exception = (
            SELECT *
            FROM (
                SELECT
                    [ErrorNumber] = ERROR_NUMBER(),
                    [ErrorSeverity] = ERROR_SEVERITY(),
                    [ErrorState] = ERROR_STATE(),
                    [ErrorProcedure] = ERROR_PROCEDURE(),
                    [ErrorLine] = ERROR_LINE(),
                    [ErrorMessage] = ERROR_MESSAGE()
            ) exception
            FOR JSON AUTO, WITHOUT_ARRAY_WRAPPER
        )

        EXECUTE LogProperty_Add 'Exception', @Exception
    END

    EXECUTE LogTo_Impl @Message, 'critical'
END
```
Error
```sql
CREATE PROCEDURE [dbo].[LogTo_Error](
    @Message NVARCHAR(4000),
    @CallingProc SYSNAME = NULL)
AS
BEGIN
    IF @CallingProc IS NOT NULL
    BEGIN
        EXECUTE LogProperty_AddSourceContext @CallingProc
    END

    -- extra error details
    IF ERROR_MESSAGE() IS NOT NULL
    BEGIN
        DECLARE @Exception NVARCHAR(4000)
        SELECT @Exception = (
            SELECT *
            FROM (
                SELECT
                    [ErrorNumber] = ERROR_NUMBER(),
                    [ErrorSeverity] = ERROR_SEVERITY(),
                    [ErrorState] = ERROR_STATE(),
                    [ErrorProcedure] = ERROR_PROCEDURE(),
                    [ErrorLine] = ERROR_LINE(),
                    [ErrorMessage] = ERROR_MESSAGE()
            ) exception
            FOR JSON AUTO, WITHOUT_ARRAY_WRAPPER
        )

        EXECUTE LogProperty_Add 'Exception', @Exception
    END

    EXECUTE LogTo_Impl @Message, 'error'
END
```
Debug
```sql
CREATE PROCEDURE [dbo].[LogTo_Debug](
    @Message NVARCHAR(4000),
    @CallingProc SYSNAME = NULL)
AS
BEGIN
    IF @CallingProc IS NOT NULL
    BEGIN
        EXECUTE LogProperty_AddSourceContext @CallingProc
    END

    EXECUTE LogTo_Impl @Message, 'debug'
END
```
Warning
```sql
CREATE PROCEDURE [dbo].[LogTo_Warning](
    @Message NVARCHAR(4000),
    @CallingProc SYSNAME = NULL)
AS
BEGIN
    IF @CallingProc IS NOT NULL
    BEGIN
        EXECUTE LogProperty_AddSourceContext @CallingProc
    END

    EXECUTE LogTo_Impl @Message, 'warn'
END
```
Information
```sql
CREATE PROCEDURE [dbo].[LogTo_Information](
    @Message NVARCHAR(4000),
    @CallingProc SYSNAME = NULL)
AS
BEGIN
    IF @CallingProc IS NOT NULL
    BEGIN
        EXECUTE LogProperty_AddSourceContext @CallingProc
    END

    EXECUTE LogTo_Impl @Message, 'info'
END
```
Verbose
```sql
CREATE PROCEDURE [dbo].[LogTo_Verbose](
    @Message NVARCHAR(4000),
    @CallingProc SYSNAME = NULL)
AS
BEGIN
    IF @CallingProc IS NOT NULL
    BEGIN
        EXECUTE LogProperty_AddSourceContext @CallingProc
    END

    EXECUTE LogTo_Impl @Message, 'trace'
END
```
Log Property Stack
This is how you add additional context to your logs, with some convenience functions below. They are just key-value pairs that get added to the log stream and can be parsed out using LogQL within Grafana.
Implementation
```sql
ALTER PROCEDURE [dbo].[LogProperty_Add](
    @Name NVARCHAR(100),
    @Value NVARCHAR(4000))
AS
BEGIN
    BEGIN TRY
        -- you cannot query values from the SESSION_CONTEXT without explicitly knowing the key.
        -- for this reason, we house a list of keys in a __index__ session value so we know what
        -- to pull when we generate the actual log

        -- add new key to property bag
        DECLARE @IndexKeys NVARCHAR(4000)
        SELECT @IndexKeys = '[' + ISNULL(STUFF((
            SELECT ',"' + [Value] + '"'
            FROM (
                SELECT [value]
                FROM OPENJSON(CAST(SESSION_CONTEXT(N'__index__') AS NVARCHAR(MAX)))
                UNION
                SELECT @Name
            ) a([value])
            FOR XML PATH ('')
        ), 1, 1, ''), '') + ']'

        -- persist new property bag
        EXECUTE sp_set_session_context '__index__', @IndexKeys

        -- persist log property
        EXECUTE sp_set_session_context @Name, @Value
    END TRY
    BEGIN CATCH
        -- swallow
    END CATCH
END
```
CorrelationId
```sql
ALTER PROCEDURE [dbo].[LogProperty_CorrelationId](
    @Value NVARCHAR(4000))
AS
BEGIN
    EXECUTE LogProperty_Add 'CorrelationId', @Value
END
```
SourceContext
```sql
ALTER PROCEDURE [dbo].[LogProperty_AddSourceContext](
    @Value NVARCHAR(4000))
AS
BEGIN
    EXECUTE LogProperty_Add 'SourceContext', @Value
END
```
InstanceId
```sql
ALTER PROCEDURE [dbo].[LogProperty_InstanceId](
    @Value NVARCHAR(4000))
AS
BEGIN
    EXECUTE LogProperty_Add 'InstanceId', @Value
END
```
~ SK