#4.log_cutting: 日志切割,有这样一个access.log每天会打出大量的日志。实现一个日志切割的功能,并说明该实现方式会有什么缺陷。
##Answer
###shell 代码:
#!/bin/bash
Usage(){
echo "Usage: $0 Logfile"
}
if [ $# -eq 0 ] ;then
Usage
exit 0
else
Log=$1
fi
date_log=$(mktemp)
cat $Log |awk -F'[ :]' '{print $4}'|awk -F'[' '{print $2}'|uniq > date_log
for i in `cat date_log`
do
grep $i $Log > /tmp/log/${i:7:10}-${i:3:3}-${i:0:2}.access
done
rm -f date_log
运行结果:
###Python2.7 代码:
#!/usr/bin/env python
# coding=utf-8
__author__ = 'tianfeiyu'
import threading
import re
pattern = re.compile(r'.*\[(\d+)\/(\w+)\/(\d+)\:.*')
def parse_log(file_line):
resu = pattern.match(file_line)
day,month,year = resu.groups()
with open('/tmp/log/%s-%s-%s' %(year,month,day),'a') as f:
f.write(file_line)
if __name__ == '__main__':
with open('/var/log/nginx/access.log') as f:
for line in f:
log_cut_thread = threading.Thread(target=parse_log,args=(line,))
log_cut_thread.setDaemon(1)
log_cut_thread.start()
运行结果:
缺陷:
程序运行时,需要将文件读入到内存中,当日志文件非常大时,对内存不友好,或者直接撑爆内存。程序执行速度比较慢。