effective logging management system & policy ? - V2EX

    azhuang 2011-11-10 12:16:43 +08:00 3805 views
    This topic was created 5150 days ago; the information in it may have developed or changed since then.
    I've long dreamed of an effective system and policy for log management. Ideally, it should be *usable under extreme conditions.

    Yes, I'm talking about USABILITY. I guess most system administrators, experienced or new, have failed at disaster recovery at least once, even with thousands of backups on hand. Backups are important, but what matters here is that the procedure for restoring from them is missing.

    When it comes to logging systems and policies, the question becomes: are you ready for the crime scene investigation? Unfortunately, this is no joke to me. I always think of myself as a detective, or sometimes a firefighter. Imagine this scenario: a server is down, and it is not one you maintain. Now it is up to you to find out what happened and to put it right by any means. You'd better be fast.

    So you grab a copy of the log files, expecting to find some obvious clues. I have to admit that I take a deep breath before diving into such a deep sea of information. Wait a moment, you think you've got all the available log files? That is too naive. Unix and Linux systems, whether commercial or free distros, vary widely in where and how they log.

    Take a typical Linux-based web server as an example. You may first check the rotation configs and estimate the time window in which the exceptions occurred. The syslog is the general-purpose log, but it is far from enough. The web server and database daemons have their own logs. Once you start digging into the problem, you may need network-related logs as well, say from iptables. If nothing looks odd, you may take the package management system into consideration. Sometimes account auditing will force you to check the su/secure/auth logs. In a fatal situation such as an intrusion, these logs are probably no longer reliable, and you first have to make sure that no rootkit exists. By the way, if the machine unluckily has a hardened kernel, all your work will take three times as long, or even more, before you get close to your target.
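
To illustrate the first step above (narrowing down which logs were touched around the incident), here is a minimal Python sketch. It only assumes that file modification times are still trustworthy, which may not hold after an intrusion. The function name and the /var/log root mentioned below are assumptions for illustration, not part of any standard tool.

```python
import os
from datetime import datetime


def logs_modified_between(root, start, end):
    """Return (path, mtime) pairs for files under root last modified in [start, end]."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mtime = datetime.fromtimestamp(os.path.getmtime(path))
            except OSError:
                continue  # file vanished or is unreadable: skip it
            if start <= mtime <= end:
                hits.append((path, mtime))
    # Oldest first, so the list reads like a rough timeline.
    return sorted(hits, key=lambda item: item[1])
```

Calling it with a root such as /var/log and a window covering the suspected outage gives a short list of files worth opening first, whichever daemon wrote them.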

    Remember the *alienation I mentioned, the way these systems differ from one another? Some developers try hard to keep management work easy and clear, so applause to the Gentoo community. Commercial vendors could do better; Mac OS X seems to unify log information system-wide. But I still have complaints. To Solaris: why the hell are there 30+ directories under /var/log/? To HP: can you explain your philosophy? If your logging system is defined by roles like admins/users, then who the hell is the network one named nettl? Could a log filename be any uglier than nettl.LOG000? And to AIX: does your proprietary implementation bring you business success?

    Don't blame me for my dirty words. Actually, I am trying hard to stay calm. This kind of extra work f*cks me over so often, and there is no pleasure in it at all.

    Now we are just about ready to read the logs, but usually several hours have already passed. As far as I can tell, cat/grep/tail are among the most powerful tools for log analysis, especially if you are familiar with regular expressions. When troubleshooting, no visual solution, such as a web search engine connected to a log database, can provide more detail.
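
For cases where the shell tools are not at hand, the grep-style search described above can be sketched in Python. This is only an illustration: the helper names are made up here, and it assumes rotated files are either plain text or gzip-compressed with a .gz suffix.

```python
import gzip
import re


def open_log(path):
    """Open a plain or gzip-compressed log file as text."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", errors="replace")
    return open(path, "r", errors="replace")


def grep_logs(paths, pattern, ignore_case=True):
    """Yield (path, line_number, line) for every line matching the regex."""
    regex = re.compile(pattern, re.IGNORECASE if ignore_case else 0)
    for path in paths:
        with open_log(path) as handle:
            for lineno, line in enumerate(handle, start=1):
                if regex.search(line):
                    yield path, lineno, line.rstrip("\n")
```

Because it is a generator, the search can stop early once enough hits have been seen, which matters when the rotated archives are large.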

    If you happen to know something about software development, you will know that end users rarely understand what errors mean. Neither do system administrators. A more common case goes like this: you sort the logs by level, and some FATAL entries look interesting. But 30 minutes of research turns out to be a waste of time, because the cause was either a segmentation fault or an out-of-memory failure. Of course, profiling a web application is another topic.

    Believe me, this is not the worst case. Some logs are inherently unreadable because they were not written for system administrators. Among the readable lines, finding what is really useful is something of a word-guessing game. A log file is generally rotated at 500KB or once a week, so reading it all the way through is mission impossible. What I can do is try different keyword combinations; if I'm lucky, there will be some hints. (Web application coders may understand this well: if someone used automated SQL-injection scripts and broke into the system, you probably had to read every HTTP request to locate your bug.)
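
The keyword-guessing game can at least be made systematic. The sketch below, with a hypothetical function name and made-up sample lines, counts how many log lines contain every word of each keyword combination, so the most promising combinations surface first. It uses plain case-insensitive substring matching, which is an assumption about what counts as a hit.

```python
from itertools import combinations


def rank_keyword_combinations(lines, keywords, size=2):
    """Rank keyword combinations by the number of lines containing all their words."""
    lowered = [line.lower() for line in lines]
    ranked = []
    for combo in combinations(keywords, size):
        needles = [word.lower() for word in combo]
        count = sum(1 for line in lowered if all(n in line for n in needles))
        if count:  # drop combinations that never occur together
            ranked.append((combo, count))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


# Made-up request lines, only for illustration.
sample = [
    "GET /index.php?id=1 UNION SELECT password FROM users",
    "GET /index.php?id=2",
    "POST /login.php union select 1",
]
for combo, count in rank_keyword_combinations(sample, ["union", "select", "login"]):
    print(count, " + ".join(combo))
```

This does not replace reading the requests themselves, but it tells you which combination to grep for first.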

    This is my story of why logging systems may fail. They log well enough, but they are not handy enough for reconstructing the crime scene. I wonder if you have any advice or solutions. Thank you.


    P.S. I originally posted this article to my mailing list. I will post a Chinese summary later when I get to my PC.
    No replies yet.