Monday, February 23, 2015

Transcend MTS400 SSD power saving issues cause Chrome OS crashes and reboots on Acer C720

If you just want the technical details and workaround, skip down to The Problem.
My computer is an Acer C720-2800 Chromebook.



I've owned it since October 2013, and as of October 2014 it's been my primary and only computer. It's got 4GB of RAM, and the Intel Haswell CPU is plenty fast. It ships with a 16GB SSD, which had been more than enough until recently, as I store all the things in the cloud (thanks Google!). I really can't say enough good things about it.
I decided to take a programming algorithms course through Coursera and needed a development environment. While I'm aware there are cloud-based options available for coding, I prefer local and simple: python+vim+git+github. Chrome OS isn't really suited for that. Fortunately someone figured out how to install Debian/Ubuntu in a chroot that co-exists alongside Chrome OS: enter Crouton. This is where the 16GB SSD becomes a constraint.

Chrome OS comes in at a lean ~2GB. To be able to safely roll back OS upgrades, it maintains a second copy of itself on the disk for another 2GB. In a just world a 16GB SSD would provide 16GB of usable disk space, but out of the box only ~9.6GB of free disk space is available after accounting for Chrome OS. My current chroot is taking up ~6GB. That doesn't leave much free disk space for other things, like Hearthstone via PlayOnLinux.

Thankfully, upgrading the SSD is simple. There are several articles on how to go about it. All of them use a brand of SSD I'm unfamiliar with: MyDigitalSSD. I was hesitant to purchase the MyDigitalSSD drive because of that unfamiliarity, so I went shopping on Amazon for alternatives. The SSD in the Acer C720 uses a form factor that isn't widely available: 42mm M.2. I was only able to find one other compatible drive: a Transcend MTS400 256GB SSD. I briefly researched Transcend (wiki and website) and opted to buy their drive over the MyDigitalSSD one since, as it turns out, Transcend is a huge Taiwanese memory product manufacturer that I had simply never heard of. I figured if I needed product support, it'd be easier to get through Transcend. So I bought it and installed it when it arrived.
Installation was as trivial as the articles show. After installation everything seemed fine, but before long I started experiencing full system crashes and reboots, as frequently as ~20 minutes apart.

It's a Linux system that's crashing. Oh hey, I'm a Linux Systems Administrator. And with Crouton I have access to all the troubleshooting tools of a full-blown Linux system. Several hours later I had narrowed down the problem.

The Problem

The Transcend MTS400 SSD fails when the SATA link power management policy is set to min_power.

The workaround

Every time you switch from AC to battery power, open a shell (requires enabling developer mode) and run:
$ echo max_performance | sudo tee /sys/class/scsi_host/*/link_power_management_policy
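
Re-running that command by hand after every unplug is easy to forget. Here is a minimal sketch of a helper that rewrites any host currently set to min_power, assuming a POSIX shell run as root. The function name force_max_performance and its optional directory argument are hypothetical (the argument exists only so the function can be tried against a scratch directory); neither is part of Chrome OS:

```shell
#!/bin/sh
# Hypothetical helper: switch every SATA host that is currently set to
# min_power over to max_performance. Needs root on a real system.
# $1 (optional) is the directory holding the hostN entries; it defaults to
# the real sysfs location and exists only so the function can be dry-run.
force_max_performance() {
    root="${1:-/sys/class/scsi_host}"
    for policy_file in "$root"/*/link_power_management_policy; do
        # Skip the unexpanded glob when no host exposes the attribute.
        [ -f "$policy_file" ] || continue
        if [ "$(cat "$policy_file")" = "min_power" ]; then
            echo max_performance > "$policy_file"
            echo "reset $policy_file to max_performance"
        fi
    done
}
```

Calling it from a root shell in a loop such as `while true; do force_max_performance; sleep 30; done` would re-apply the setting roughly every 30 seconds. The interval is an arbitrary choice, and there is still a window after unplugging during which min_power is active, so this narrows the exposure rather than removing it.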

Further details

Chrome OS uses laptop-mode-tools to keep track of whether you are running from AC or from battery, and runs a series of scripts to change many settings to reduce power consumption to prolong battery life. One such script is the Intel SATA power management script at /usr/share/laptop-mode-tools/modules/intel-sata-powermgmt:
/usr/share/laptop-mode-tools/modules/intel-sata-powermgmt@24:
        if [ "$SATA_POWER" -eq 1 ]; then 
                SATA_POWER="min_power"
        else    
                SATA_POWER="max_performance"
        fi      

        for POLICYFILE in /sys/class/scsi_host/*/link_power_management_policy ; do
                if [ -f $POLICYFILE ] ; then
                        log "VERBOSE" "Intel SATA link power saving set to $SATA_POWER for $POLICYFILE."
                        echo $SATA_POWER > $POLICYFILE
On AC (wall) power the SATA link power management policy is set to "max_performance", and on battery it is set to "min_power". It's min_power where the problem lies. Once the policy is set to min_power, the issues start. If you're lucky you only see performance hits while the following error messages begin appearing in dmesg:
[ 4115.488346] ata1.00: exception Emask 0x50 SAct 0x3 SErr 0x40c0800 action 0xe frozen
[ 4115.488362] ata1.00: irq_stat 0x00000040, connection status changed
[ 4115.488374] ata1: SError: { HostInt CommWake 10B8B DevExch }
[ 4115.488385] ata1.00: failed command: WRITE FPDMA QUEUED
[ 4115.488399] ata1.00: cmd 61/48:00:00:44:55/02:00:0a:00:00/40 tag 0 ncq 299008 out
         res 40/00:04:38:5f:28/00:00:01:00:00/40 Emask 0x50 (ATA bus error)
[ 4115.488419] ata1.00: status: { DRDY }
[ 4115.488428] ata1.00: failed command: WRITE FPDMA QUEUED
[ 4115.488442] ata1.00: cmd 61/00:08:00:48:55/03:00:0a:00:00/40 tag 1 ncq 393216 out
         res 40/00:04:38:5f:28/00:00:01:00:00/40 Emask 0x50 (ATA bus error)
[ 4115.488462] ata1.00: status: { DRDY }
[ 4115.488475] ata1: hard resetting link
[ 4116.191241] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 4116.191614] ata1.00: configured for UDMA/133
[ 4116.191652] ata1: EH complete
[ 4405.275634] ata1.00: exception Emask 0x50 SAct 0x1 SErr 0x40c0800 action 0xe frozen
[ 4405.275650] ata1.00: irq_stat 0x00000040, connection status changed
[ 4405.275662] ata1: SError: { HostInt CommWake 10B8B DevExch }
[ 4405.275674] ata1.00: failed command: READ FPDMA QUEUED
[ 4405.275688] ata1.00: cmd 60/20:00:90:b7:48/00:00:0a:00:00/40 tag 0 ncq 16384 in
         res 40/00:04:80:81:2b/00:00:01:00:00/40 Emask 0x50 (ATA bus error)
[ 4405.275716] ata1.00: status: { DRDY }
[ 4405.275731] ata1: hard resetting link
[ 4405.978040] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 4405.978737] ata1.00: configured for UDMA/133
[ 4405.978776] ata1: EH complete
If you're unlucky your system just hangs shortly before rebooting.
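
To check whether a machine is hitting this, the kernel log can be searched for the two signatures in the excerpt above: failed commands and hard link resets. A small sketch, assuming the messages use the same wording as in that excerpt; the function name count_sata_link_errors is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: read kernel log text on stdin and report how many
# hard link resets and failed ATA commands it contains.
count_sata_link_errors() {
    awk '
        /ata[0-9]+: hard resetting link/  { resets++ }
        /ata[0-9.]+: failed command:/     { failed++ }
        END { printf "link resets: %d, failed commands: %d\n", resets, failed }
    '
}
```

It would be used as `dmesg | count_sata_link_errors`. Counts that keep climbing while on battery and stay flat on AC would point at the link power policy rather than at a failing drive.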

Contacting Transcend Support

Having narrowed down the problem and found a workaround, I wanted to reach out to Transcend support in the hope of getting this issue addressed in a firmware update. The interaction so far has been very disappointing, the one exception being that they responded within the one business day they promised. My initial submission via their support contact form:
Description of Problems /
When on battery power, SATA link power management /sys/class/scsi_host/*/link_power_management_policy is set to 'min_power' on Acer C720 chromebook. This causes drive instability and will cause the OS to crash. Firmware revision is N0815B. Conveyance, short, and long smartctl tests complete without error.
Your Request /
I need a solution that allows the power management features of Chrome OS to work correctly with this drive.
The initial response:
Preston,
There appear to be numerous reports of the Acer C720 experiencing issues with crashing and system stability. Please refer to this article for an official response from Acer with regard to the C720: https://productforums.google.com/forum/#!topic/chromebook-central/gjSnZJeMEls%5B1-25%5D
Regards,
Reply:
Hey, thanks for the reply.
I have tested this issue extensively and I'm pretty sure this issue is with the drive and not with the Acer C720.
The OEM 16GB Kingston SSD has no issues either on AC or battery power.
While on AC power the Transcend drive has no issues.
While on battery power, the Transcend drive has issues.
I'm pretty confident that if you escalate this issue to the product team for this drive they can address the problem in a firmware update. Is escalation of this issue something you can do? If not, do you know what the best way to get in contact with someone that is capable of producing an updated firmware for this drive is? Appreciate the help.
Response:
Preston, 
I will certainly document your experience with the device with the appropriate parties. I will update you with information as it is made available.
Regards,
Reply:
Thanks. What kind of expectation should I have in regards to when I can reasonably expect an update: a day, a week?
Response:
Preston,
I am unable to provide you with a definitive answer. I anticipate dialogue to take place within a day or so, however I do not expect a resolution in as much time. Should you feel the need to inquire, feel free to do so at your convenience.
Regards,
Reply:
Appreciate the honest answer. I'll keep an eye out for an update from you on this. Let me know if you need any further info from me on this and otherwise I will wait to hear from you. Cheers.
I remain cautiously optimistic about the outcome of this support inquiry. I can't say I've had great experiences with other vendors in similar situations. For instance, to my knowledge Microsoft still hasn't fixed the bug where Lync for Mac doesn't support the case-sensitive HFS file system. At the very least I have a workaround.

23 comments:

  1. Glad that you analysed this so well. I have the same problem. You think we can just edit the script you refer to? Something similar is suggested here (posted wrong link before):
    https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Power_Management_Guide/ALPM.html

    Replies
    1. Hey Leo,

      The root file system is read only and unable to be modified as a security precaution of Chrome OS. There is a command that allows the file system to be modified but my understanding is that then breaks Chrome OS auto updates: http://www.chromium.org/chromium-os/poking-around-your-chrome-os-device#TOC-Making-changes-to-the-filesystem

      If you're ok with that, then after you've made the file system writable you can modify /etc/laptop-mode/conf.d/intel-sata-powermgmt.conf and change
      CONTROL_INTEL_SATA_POWER=1
      to
      CONTROL_INTEL_SATA_POWER=0

      and that should stop laptop-mode from changing the setting when moving from AC to battery.

    2. It works! It was harder than I thought (vi is the only text editor in the Chrome OS shell and I wasn't familiar with Linux), but without your post I would never have been able to fix it. Thanks!

      I think many people run into this problem and finding this post will help them a lot.

      Made a step by step guide at stack exchange here:
      http://superuser.com/questions/887916/transcend-mts400-ssd-crashes-my-acer-c720-chromebook-how-to-disable-sata-power

    3. I'm really happy this post was able to help you Leo! The step by step guide you posted on stack exchange is excellent as well. I too often assume other people are very familiar with Linux and sometimes perhaps should go into more detail!

  2. I have a C720 and am searching for an SSD. I was looking at MyDigitalSSD despite lots of failure reports, because the latest firmware is claimed to fix this. However, the comments on amazon.com look miserable even for recent purchases.

    I turned to this Transcend SSD, but the crash problem is widely reported. Thank you for your solution, but can you tell how battery life is affected with the max_performance profile selected? And the 'medium_power' profile mentioned by Leo, did you try that and how does it work?

    Thank you !

    Replies
    1. Glad to have helped! I have not tested the battery performance differences between max_performance, medium_power, and min_power. Subjectively it seems like battery drain is slightly increased on max_performance.

      I did briefly test the medium_power setting and it seemed to also provide system stability. You may find you have better battery life as well as system stability on the medium_power setting.

      I am usually on wall power so I always set max_performance. If you find you have issues with medium_power or do any battery life testing please report back as it may be helpful to others that find this post.

      Cheers.

  3. Well it seems that not only Transcend has this issue. I can report that I have the same issue with an ADATA Premier SP600NS34 256GB on an Acer C720P. I searched for "c720 high capacity ssd crash" and found this site. To be able to catch the issue, since the computer crashes rather fast, I ran this command to send the output of dmesg to another computer: sudo cat /proc/kmsg | ssh username@remotelinuxserver "cat > chromebookoutput.txt"

    Thanks for the workaround.

    Replies
    1. Hey Brian. I'm glad this helped you! I also really appreciate that you took the time to comment and share your experience here. I think that sharing the experience you had may help others like you.

      I really like the ssh pipe command! I wouldn't have thought of that.

    2. I've got the same ADATA drive (128GB variant) and same problem on my Acer Chromebook 15. Since I use tlp, I edited the min_power setting there to medium_power and everything works now, except I do notice the fan goes on a lot more. Too bad ADATA doesn't offer a linux tool to update the firmware.

  4. This comment has been removed by the author.

  5. Hello,
    just to let you know I updated the SSD on my Toshiba Chromebook 2 (2015) CB35-C3300 with the Transcend TS128GMTS400.
    It is working perfectly so far and I am not experiencing the "min_power" problem.
    I bought the SSD from Amazon a few days ago, so Transcend has probably made some update to solve the issue (or the Toshiba Chromebook behaves differently from the Acer C720, but I doubt it)

    Replies
    1. Thanks for the comment. It's useful to know what combinations of Chromebooks work with what after market drives. Thanks!

  6. Thank you SO MUCH for tracking this issue. For a year I've been looking for the reason why my Transcend MTS400 sometimes freezes on every Linux distro, even Chromium, the last one I tried. I was so desperate that I had finally ended up with the distro back on the HDD. I first thought it was NCQ related, although the drive was announced to support this function, then I thought it was my laptop (Lenovo T440s), and then I found your blog. Since I use TLP I just had to set a new value for SATA_LINKPWR_ON_BAT in the conf file; medium_power seems to work so far, no more freezes. Thank you !

  7. So I have been running Linux for a while on my Acer, however due to various issues with the touchpad and microphone I thought I would give Chrome OS another shot. Have you heard anything back from Transcend, or do you know if the Chrome team has patched this? I can't figure out if ADATA has any way to download updated firmware though.

    Replies
    1. Ok that did not work, even without disconnecting it from power, it only took a short while before it crashed and corrupted the content of the drive.

  8. Hi, I replaced the SSD in my Toshiba Chromebook 2 (2015) CB35-C3350 with the Transcend TS128GMTS400 and have the same problem: it works for 15-20 minutes and then crashes. Sad!

    Replies
    1. Same for me. I'm changing back to the 16GB drive, which works perfectly without crashing, and using an external SD card...

    2. I used the solution presented here http://superuser.com/questions/887916/transcend-mts400-ssd-crashes-my-acer-c720-chromebook-how-to-disable-sata-power and it does work.

  9. Are you on flattr ?

    Found this from here: http://ow.ly/kVwk302tQJM

    I want to flattr you :-)

    Replies
    1. Hey Kiera thanks for commenting. I wasn't even aware flattr was a thing! While I appreciate the sentiment I'm not interested in being flattr'd. Feel free to pay it forward on my behalf if you'd like!

      It's super cool to see all the different places this info is ending up in. While it's unfortunate Transcend hasn't fixed the issue I'm glad to see the information disseminating.

  10. Regarding the discussion with the Transcend rep, being on AC or battery is beside the point. The crashes depend on whether ALPM is enabled or not. The default config of e.g. tlp is to have max_performance on AC and min_power on battery, but one could run min_power on AC as well, which I used to do to save power on AC since I don't see any performance hit having ALPM enabled.

    My ADATA drive gives me the same error in linux, and since the error does not occur in Windows with ALPM enabled, it's not necessarily the fault of the drive's firmware, but rather the fault of Intel and their southbridge driver on linux. Someone over at Intel needs to sort this out, especially because on chipsets from Broadwell on, the southbridge is located on the CPU die, and having the SATA in a higher power state means the whole package can't go into a lower power state. Powertop shows the lowest I can get in on my Broadwell mobile chip is C2 with medium_power, but with min_power it can go into C6---and then crash!

  11. It looks like something changed in the latest build (9901.54.0 (Official Build) stable-channel gandof), as I found that the "medium_power" string that I usually put in the laptop-mode-tools script is not used anymore. The reason is that the script is disabled by default in /etc/laptop-mode/conf.d/intel-sata-powermgmt.conf. It looks like something was changed in the kernel, as now I have min_power in /sys/class/scsi_host/host0/link_power_management_policy, set by the kernel I guess, and don't have any problems with my Toshiba Chromebook upgraded with an ADATA 256GB SSD. Can anyone trace this change in the Chrome OS kernel or board config?

    Replies
    1. Nope, just got a hang on the second cold boot. So with this Chrome OS version, I have to enable the laptop-mode-tools script in addition to modifying the script itself to write a safe mode to the sysfs variable.
