Description
Labeled Tab-separated Values (LTSV) format is a variant of Tab-separated Values (TSV). Each record in an LTSV file is represented as a single line. Fields are separated by TAB, and each field consists of a label and a value separated by ':'. With the LTSV format, you can easily parse each line by splitting on TAB (as with the original TSV format), and you can add fields with unique labels in any order.
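As a rough illustration of the structure described above, the following Ruby sketch builds one LTSV line from a hash. The helper name to_ltsv is hypothetical and not part of any LTSV library; the sketch assumes that labels contain no ':' or TAB and that values contain no TAB or newline.

```ruby
# Hypothetical helper (not from any LTSV library): serialize a Hash to one LTSV line.
# Assumes labels and values contain no TAB or newline characters.
def to_ltsv(record)
  # Join each label and value with ":", then join the fields with TAB.
  record.map { |label, value| "#{label}:#{value}" }.join("\t")
end

line = to_ltsv({ "host" => "127.0.0.1", "status" => 200, "req" => "GET / HTTP/1.0" })
puts line
```

Because every field carries its own label, the order of the keys in the hash does not matter to a reader of the resulting line.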
FAQ
Follow the link.
Example
The LTSV format was originally designed for web server access logs, so the examples below show an access log in the traditional Combined Log Format and the same log in LTSV format.
The configuration of traditional Combined Log Format on Apache is:
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\"" combined
and the access log will look like: (ref. http://httpd.apache.org/docs/2.2/logs.html)
127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"
The configuration of LTSV format with the same information will be:
LogFormat "host:%h\tident:%l\tuser:%u\ttime:%t\treq:%r\tstatus:%>s\tsize:%b\treferer:%{Referer}i\tua:%{User-Agent}i" combined_ltsv
Then the access log will look like:
host:127.0.0.1<TAB>ident:-<TAB>user:frank<TAB>time:[10/Oct/2000:13:55:36 -0700]<TAB>req:GET /apache_pb.gif HTTP/1.0<TAB>status:200<TAB>size:2326<TAB>referer:http://www.example.com/start.html<TAB>ua:Mozilla/4.08 [en] (Win98; I ;Nav)
Here is a simple LTSV parser:
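#!/usr/bin/env ruby
# Read lines from the files given as arguments, or from standard input.
while gets
  # Split the line on TAB, then split each field at the first ":" only.
  record = Hash[$_.split("\t").map{|f| f.split(":", 2)}]
  p record
end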
With this parser, you will get a hash like:
{"host"=>"127.0.0.1", "ident"=>"-", "user"=>"frank", "time"=>"[10/Oct/2000:13:55:36 -0700]", "req"=>"GET /apache_pb.gif HTTP/1.0", "status"=>"200", "size"=>"2326", "referer"=>"http://www.example.com/start.html", "ua"=>"Mozilla/4.08 [en] (Win98; I ;Nav)\n"}
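The trailing "\n" in the last value appears because the line is split without first removing its line terminator. A minimal variant that strips the terminator might look like the sketch below. The name parse_ltsv_line is hypothetical, and skipping fields that have no ':' is an assumption of this sketch, not something the format prescribes.

```ruby
# Hypothetical helper: parse one LTSV line into a Hash, dropping the line terminator.
def parse_ltsv_line(line)
  # String#chomp removes a trailing "\n" or "\r\n".
  line.chomp.split("\t").each_with_object({}) do |field, record|
    # Split at the first ":" only, so values may contain ":" themselves.
    label, value = field.split(":", 2)
    next if value.nil? # assumption: ignore fields that have no ":" separator
    record[label] = value
  end
end

line = "host:127.0.0.1\ttime:[10/Oct/2000:13:55:36 -0700]\tua:Mozilla/4.08 [en] (Win98; I ;Nav)\n"
p parse_ltsv_line(line)
```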
Definition
An LTSV file must be a byte sequence that matches the ltsv production in the following ABNF:
ltsv = *(record NL) [record]
record = [field *(TAB field)]
field = label ":" field-value
label = 1*lbyte
field-value = *fbyte
TAB = %x09
NL = [%x0D] %x0A
lbyte = %x30-39 / %x41-5A / %x61-7A / "_" / "." / "-" ;; [0-9A-Za-z_.-]
fbyte = %x01-08 / %x0B / %x0C / %x0E-FF
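To make the grammar more concrete, here is a sketch of a per-line check written as Ruby regular expressions. The constant names and valid_ltsv_record? are hypothetical. The value pattern relies on the observation that fbyte covers every byte except %x00, TAB (%x09), LF (%x0A), and CR (%x0D), so it is written as an exclusion rather than as the three ranges.

```ruby
# Hypothetical names; each pattern mirrors one production of the ABNF.
LABEL  = /[0-9A-Za-z_.-]+/  # label = 1*lbyte
VALUE  = /[^\x00\t\n\r]*/   # field-value = *fbyte (all bytes except NUL, TAB, LF, CR)
FIELD  = /#{LABEL}:#{VALUE}/
RECORD = /\A(?:#{FIELD}(?:\t#{FIELD})*)?\z/ # record = [field *(TAB field)]

# Returns true if the line (with an optional trailing NL) matches the record production.
def valid_ltsv_record?(line)
  # Work on raw bytes, and strip one trailing NL ([CR] LF) before matching.
  RECORD.match?(line.b.sub(/\r?\n\z/, ""))
end

puts valid_ltsv_record?("host:127.0.0.1\tstatus:200\n")
puts valid_ltsv_record?("bad label:1")
```

Note that an empty line is accepted, because the record production allows zero fields.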
Recommendations for labeling
The LTSV specification is simple and primitive. Nevertheless, standardizing labels may improve the reusability of implementations for processing or analysis.
Labels for Web Server Logs
Here are the recommended labels, their descriptions, and the corresponding format strings for Apache and nginx.
Recommended Label | Description | Format String of Apache mod_log_config | Format String of nginx log format |
---|---|---|---|
time | Time the request was received | %t | $time_local |
host | Remote host | %h | $remote_addr |
forwardedfor | X-Forwarded-For header | %{X-Forwarded-For}i | $http_x_forwarded_for |
ident | Remote logname | %l | |
user | Remote user | %u | $remote_user |
req | First line of request | %r | $request |
method | Request method | %m | $request_method |
uri | Request URI | %U%q | $request_uri |
protocol | Requested Protocol (usually "HTTP/1.0" or "HTTP/1.1") | %H | $server_protocol |
status | Status code | %>s | $status |
size | Size of response in bytes, excluding HTTP headers. | %B (or '%b' for compatibility with combined format) | $body_bytes_sent |
reqsize | Bytes received, including request and headers. | %I (mod_log_io required) | $request_length |
referer | Referer header | %{Referer}i | $http_referer |
ua | User-Agent header | %{User-agent}i | $http_user_agent |
vhost | Host header | %{Host}i | $host |
reqtime_microsec | The time taken to serve the request, in microseconds | %D | |
reqtime | The time taken to serve the request, in seconds | %T | $request_time |
cache | X-Cache header | %{X-Cache}o | $upstream_http_x_cache |
runtime | Execution time for processing some request, e.g. X-Runtime header for application server or processing time of SQL for DB server. | %{X-Runtime}o | $upstream_http_x_runtime |
apptime | Response time from the upstream server | - | $upstream_response_time |
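One practical benefit of shared labels is that a consumer can convert well-known fields to typed values without per-site configuration. The following Ruby sketch shows the idea for a few of the numeric labels in the table. CONVERTERS and coerce_record are hypothetical names, and treating "-" as a missing value is an assumption based on how Apache and nginx commonly log absent values.

```ruby
# Hypothetical converters keyed by some of the recommended labels.
CONVERTERS = {
  "status"           => ->(v) { Integer(v, 10) },
  "size"             => ->(v) { Integer(v, 10) },
  "reqsize"          => ->(v) { Integer(v, 10) },
  "reqtime_microsec" => ->(v) { Integer(v, 10) },
  "reqtime"          => ->(v) { Float(v) },
}.freeze

# Returns a new Hash with known numeric labels converted; other labels pass through.
def coerce_record(record)
  record.each_with_object({}) do |(label, value), out|
    converter = CONVERTERS[label]
    out[label] = if converter.nil?
                   value # unknown label: keep the string as-is
                 elsif value == "-"
                   nil   # assumption: "-" marks a missing value
                 else
                   converter.call(value)
                 end
  end
end

p coerce_record({ "status" => "200", "size" => "-", "reqtime" => "0.012", "ua" => "curl" })
```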
A LogFormat example for Apache mod_log_config.
LogFormat "time:%t\tforwardedfor:%{X-Forwarded-For}i\thost:%h\treq:%r\tstatus:%>s\tsize:%B\treferer:%{Referer}i\tua:%{User-Agent}i\treqtime_microsec:%D\tcache:%{X-Cache}o\truntime:%{X-Runtime}o\tvhost:%{Host}i" ltsv
A log_format example for nginx.
log_format ltsv "time:$time_local"
                "\thost:$remote_addr"
                "\tforwardedfor:$http_x_forwarded_for"
                "\treq:$request"
                "\tstatus:$status"
                "\tsize:$body_bytes_sent"
                "\treferer:$http_referer"
                "\tua:$http_user_agent"
                "\treqtime:$request_time"
                "\tcache:$upstream_http_x_cache"
                "\truntime:$upstream_http_x_runtime"
                "\tvhost:$host";
Tools supporting LTSV
fluentd
fluentd (http://fluentd.org/) can parse LTSV files with the in_tail plugin. A configuration looks like this:
<source>
  type tail
  format ltsv
  time_format %d/%b/%Y:%H:%M:%S %z
  path /var/log/nginx/access_log
  pos_file /var/log/nginx/access_log.pos
  tag nginx.access
</source>
Plugins for fluentd
- fluent-plugin-parser
- fluent-mixin-plaintextformatter
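The time_format value in the fluentd configuration above appears to use strptime-style directives. To see how that string maps onto a timestamp in the shape of nginx's $time_local, the following Ruby sketch parses a sample value with Time.strptime from the standard library. The sample timestamp is borrowed from the access log example earlier (without the brackets that Apache's %t adds) and is illustrative only.

```ruby
require "time"

# Same directive string as the time_format line in the fluentd configuration.
TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"

# Sample value in the shape of nginx's $time_local (no surrounding brackets).
t = Time.strptime("10/Oct/2000:13:55:36 -0700", TIME_FORMAT)

# Print the parsed instant in UTC so the result does not depend on the local zone.
puts t.getutc.iso8601
```

A value logged with Apache's %t includes surrounding brackets, so it would need them stripped, or a format string that includes them, before this pattern matches.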
ltsview
- A viewer for LTSV. This tool includes the LTSV parser.
Plack::Middleware::AxsLog
- Plack::Middleware::AxsLog (Fixed format but Fast AccessLog Middleware) supports LTSV.
combined2ltsv.pl
- converts (common|combined) log to LTSV (ref. author's blog entry in Japanese.)
MCombined2LTSV.java
- converts (common|combined) log to LTSV. Implemented in Java (ref. author's blog entry in Japanese.)
Parser Implementations
Perl
- Text::LTSV (including ltsview, a viewer for LTSV)
Ruby
Python
PHP
- php-ltsv
- php-ltsv (yet another)
- Text-LTSV
- Text_LTSV
Java
D
Dart
- ltsv.dart
Emacs Lisp
- emacs-ltsv
Scheme
node.js
- ltsv
- ltsv-stream
Erlang
- erlang-ltsv
C#
- DynamicLTSV
- LTSV.NET
Go
- goltsv
- ltsv
- go-ltsv
Clojure
Scala
bash / ksh
Vim
- ltsv.vim
- Text.LTSV module (vital.vim)
C89
- ltsv4c
- c-ltsview
Apache Pig
- pig-ltsv-storage
Apache Hive
- KeyValuePairsDeserializer