DevOpsWeekly#13: Instapaper outage analysis, a coder's obligations, remote work at Stack Overflow, Google Testing Blog: Discomfort as a Tool for Change


Instapaper outage analysis: Instapaper Outage Cause & Recovery

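Instapaper runs MySQL on Amazon's Relational Database Service (RDS). The outage was caused by a 2TB file size limit ("The table 'bookmarks' is full"), and the RDS console gave no warning beforehand. Once the service was down, the team tried to back up and migrate the data, but that took too long, so they spent 6 hours building a new "limited access" instance; by then they had been down for 31 hours. Because they had no plan for this kind of emergency, they did not know how long backing up and restoring the database would take: the first estimate was 6 to 8 hours, but it turned out to need several days. In the end, an Amazon engineer mounted an ext4 file system on the affected database server and resynced the existing ext3 files onto it; 8 hours later, an ext4-based database was successfully restored. Action items:
- Design a better process for handling incidents like this, and escalate issues to Pinterest's Reliability Engineering team
- Test your database backups!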
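A big part of the delay was that nobody knew how long a backup and restore would actually take. A minimal restore drill answers that ahead of time: copy the database, time it, and check that the copy is usable. The sketch below is an assumption-heavy stand-in, not Instapaper's setup: it swaps MySQL on RDS for SQLite from Python's standard library so it runs anywhere, and the `bookmarks` schema and the `make_source_db`/`restore_drill` helpers are hypothetical.

```python
import sqlite3
import time


def make_source_db() -> sqlite3.Connection:
    """Hypothetical stand-in for a production database (SQLite, not MySQL/RDS)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE bookmarks (id INTEGER PRIMARY KEY, url TEXT NOT NULL)")
    conn.executemany(
        "INSERT INTO bookmarks (url) VALUES (?)",
        [(f"https://example.com/{i}",) for i in range(10_000)],
    )
    conn.commit()
    return conn


def restore_drill(source: sqlite3.Connection) -> dict:
    """Copy `source` into a fresh database (standing in for a restore),
    time the copy, and verify the result is intact."""
    restored = sqlite3.connect(":memory:")
    start = time.perf_counter()
    source.backup(restored)  # SQLite online backup API (Python 3.7+)
    elapsed = time.perf_counter() - start

    # Structural check: SQLite reports "ok" when no corruption is found.
    integrity = restored.execute("PRAGMA integrity_check").fetchone()[0]

    # Data check: every table must have the same row count in both copies.
    tables = [
        row[0]
        for row in source.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    ]
    mismatched = []
    for table in tables:
        query = f'SELECT COUNT(*) FROM "{table}"'
        if source.execute(query).fetchone()[0] != restored.execute(query).fetchone()[0]:
            mismatched.append(table)

    restored.close()
    return {"seconds": elapsed, "integrity": integrity, "mismatched_tables": mismatched}


if __name__ == "__main__":
    print(restore_drill(make_source_db()))
```

Run on a schedule against real backups, the recorded `seconds` gives a measured restore time instead of a guess, which is exactly the number that was missing during the outage.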

A coder's obligations

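In 2016 we discussed what makes someone a senior programmer, and I recently came across this piece on the same theme. Rather than translate it, here is a direct excerpt: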
- Find the best solution and not a solution
- Do not write code you do not trust
- Learn how to say No. But also learn to negotiate.
- Respect your team members
- Enhance an implementation. Do not criticise it.

Remote work at Stack Overflow - What it Means to be a Remote-First Company

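Stack Overflow has a little over 300 employees, 85 of whom are remote workers. Meetings are held face-to-face over Google Hangouts with everyone wearing headphones, and the company relies heavily on Slack, Hangouts, and its own internal chat tool. Chat is treated as asynchronous. The company equips remote workers with items such as Herman Miller chairs, Steelcase height-adjustable desks, MacBooks, and external monitors, and it also provides an allowance for "home office" expenses such as internet access. When the company holds events, remote workers receive the corresponding gifts too.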

Google Testing Blog:Discomfort as a Tool for Change

An article by a SETI (Software Engineer, Tools and Infrastructure) on Google Drive about tackling product-wide problems:
- Hard-to-use APIs
- Big, slow releases

Solutions:
- Incentivizing easy-to-use APIs: build test environments and, together with engineering team leads, create Fakes for testing, so engineers get feedback as early as possible; also help management set performance goals for engineering teams.
- Fast, small releases: less coupling between systems, better/more automated testing, faster feedback.
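A fake is a lightweight but working implementation of an API (for example, in-memory instead of backed by a real service), so code that depends on the API can be tested quickly without the real backend. The sketch below illustrates the idea under stated assumptions: the `FileStore` interface, `FakeFileStore`, and `copy_file` are hypothetical names invented for this example, not Google Drive APIs.

```python
from typing import Dict, Protocol


class FileStore(Protocol):
    """Hypothetical API that product code depends on."""

    def put(self, path: str, data: bytes) -> None: ...

    def get(self, path: str) -> bytes: ...  # raises KeyError if path is missing


class FakeFileStore:
    """In-memory fake: honors the same contract as the real store
    (including KeyError for missing paths) but needs no network or disk."""

    def __init__(self) -> None:
        self._files: Dict[str, bytes] = {}

    def put(self, path: str, data: bytes) -> None:
        self._files[path] = data

    def get(self, path: str) -> bytes:
        return self._files[path]  # KeyError propagates, as the contract says


def copy_file(store: FileStore, src: str, dst: str) -> bool:
    """Code under test: copy src to dst; return False if src does not exist."""
    try:
        data = store.get(src)
    except KeyError:
        return False
    store.put(dst, data)
    return True


if __name__ == "__main__":
    fake = FakeFileStore()
    fake.put("a.txt", b"hello")
    print(copy_file(fake, "a.txt", "b.txt"), fake.get("b.txt"))
    print(copy_file(fake, "missing.txt", "c.txt"))
```

The design point the article makes is about ownership: when the API team ships and maintains the fake alongside the real implementation, every client team gets fast feedback for free, and an API that is hard to fake becomes visibly hard to use.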

Blog: an interview with Agile Manifesto co-author Martin Fowler

Search for "The Agile Uprising Podcast" in iTunes Podcasts.

1 reply 2017-02-20 12:57:42 +08:00

menc

Feb 20, 2017

Instapaper runs MySQL on Amazon's Relational Database Service (RDS); the outage was caused by a 2TB file size limit ("The table 'bookmarks' is full"), and the RDS console gave no warning

That's a really nasty trap.