What to look for, if you have CFast 2.0 issues.

Thu Oct 26, 2017 3:36 pm

If you work with CFast 2.0 cards, chances are you have already had a look at the so-called S.M.A.R.T. report. The SMART report can give you valuable information on a card, from model, firmware version, and serial number to health indicators.

There are a number of tools, with or without a GUI, that can read this SMART report from the card; I'll use DriveDX and smartmontools as examples in this post. Depending on the tool, the SMART report will have a slightly different structure.
Look for a section that says “DRIVE HEALTH INDICATORS” or “Vendor Specific SMART Attributes with Thresholds”. This section contains a few values that you can check regularly. If you keep track of the SMART report changes for every card, you should be able to see problems early on, but it’s additional work.
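If you want to track these values per card, the attribute table can be read by a small script. The following is a minimal sketch, assuming the usual ten-column layout that smartmontools prints for "smartctl -A"; the sample text and its values are invented for illustration, and the function name is hypothetical.

```python
# Sample in the ten-column layout printed by "smartctl -A".
# All values here are invented for illustration.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0032   100   100   000    Old_age   Always       -       2
195 Hardware_ECC_Recovered  0x0032   100   100   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0
241 Total_LBAs_Written      0x0032   100   100   000    Old_age   Always       -       123456789
"""


def parse_smart_attributes(report):
    """Return {attribute_id: (name, raw_value)} for every attribute row.

    Rows whose raw value is not a plain integer are skipped.
    """
    attrs = {}
    for line in report.splitlines():
        parts = line.split()
        # An attribute row starts with a numeric ID and has at least 10 columns.
        if len(parts) >= 10 and parts[0].isdigit() and parts[9].isdigit():
            attrs[int(parts[0])] = (parts[1], int(parts[9]))
    return attrs


attrs = parse_smart_attributes(SAMPLE)
for attr_id in (5, 195, 199, 241):
    name, raw = attrs[attr_id]
    print(attr_id, name, raw)
```

Saving the parsed values together with the card's serial number and the date gives you a history to compare against after each job.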

Life Time
MLC memory should be good for 3000 or more write cycles, where one cycle is a full-capacity write (a.k.a. program) and erase. If a card is reported as EOL, we simply forward a flag that is issued by the controller on the card.

SanDisk uses attribute ID#230 (shown as “Percentage Total P/E Count” in DriveDX) to show the computed “age” of a card: 0% = new, 100% = EOL.
Lexar does not output a single value, so there is a formula, based on Total LBAs Written and some other values I cannot present. I have yet to see a CFast card go EOL because the Total LBAs Written (attribute ID#241) got anywhere near the 3000 write cycles. More often, the cause is a number of uncorrectable errors, which lead the controller to flag the card as EOL preemptively.
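For a rough feeling of how far a card is from 3000 cycles, Total LBAs Written can be converted into full-capacity writes. This is only a back-of-the-envelope sketch and not the Lexar formula mentioned above. It assumes that the raw value of ID#241 counts 512-byte LBAs (the unit is vendor specific, so check it for your card) and it ignores write amplification inside the card.

```python
def estimated_full_writes(total_lbas_written, capacity_bytes, lba_size=512):
    """Approximate number of full-capacity writes a card has seen.

    Assumption: the raw value counts LBAs of lba_size bytes each.
    """
    return total_lbas_written * lba_size / capacity_bytes


# Invented example: a 128 GB card reporting 50,000,000,000 LBAs written.
cycles = estimated_full_writes(50_000_000_000, 128_000_000_000)
print(f"approx. {cycles:.0f} of 3000 write cycles used")
```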

I/O Errors
If the card produces a write error, SanDisk cards log it under attribute ID#171 (shown as “Program Fail Count” in DriveDX). Lexar uses ID#195 “Hardware_ECC_Recovered”. If this error is shown, you should have a close look at the footage, as there may be some corrupted frames (e.g. with streaks). If you shot ARRIRAW on a Mini, you can perform an automated check for corrupt frames with our free ARRI Meta Extract.
Lexar also offers attribute ID#199 “UDMA_CRC_Error_Count” to indicate that there was a communication problem between camera and card, which may show up with similar symptoms.

Run Time Bad Blocks
Both SanDisk and Lexar use ID#5 to show the Reallocated_Sector_Ct (shown as “Retired Block Count” in DriveDX).
This value will be 0 when the card is new, but may increase over time and manifest as some kind of I/O error: either recording stops unexpectedly, or copy/playback of a certain clip fails. You don’t have to ditch a card just because it has 1 or 2 retired blocks. If you see this number increase constantly, it’s time to get that card replaced.
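If you log the raw value of ID#5 after every job, a simple check can tell a stable card from one that keeps growing bad blocks. The sketch below is only one possible reading of “increases constantly”: the threshold of three consecutive increases and the function name are my assumptions, not a rule from any vendor.

```python
def assess_retired_blocks(history, rising_steps=3):
    """Classify a card from its logged ID#5 raw values (oldest first).

    Returns "ok" (no retired blocks), "watch" (some retired blocks,
    but stable) or "replace" (count rose in each of the last
    rising_steps readings). rising_steps is an assumed threshold.
    """
    if not history or history[-1] == 0:
        return "ok"
    recent = history[-(rising_steps + 1):]
    rising = len(recent) == rising_steps + 1 and all(
        later > earlier for earlier, later in zip(recent, recent[1:])
    )
    return "replace" if rising else "watch"


print(assess_retired_blocks([0, 0, 0]))     # ok
print(assess_retired_blocks([0, 1, 1, 2]))  # watch
print(assess_retired_blocks([0, 1, 3, 4]))  # replace
```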

Run time bad blocks (RTBBs) can be mapped out by the card’s controller to allow continued use of the medium. Our camera has an internal mechanism to detect and fix run time bad blocks when the card is sanitized. Read the instructions below to get rid of a grown bad block.

What to do if you have an RTBB:
Obviously, you have to get all footage off the card. It’s important, however, that you do it now!
Bad block data corruption can spread like a Zombie apocalypse, as the card controller tries (and fails) to error-correct the data on the bad block and spreads data corruption over initially healthy blocks.

• If you get an I/O error as you try to record, play, or copy a clip, try to copy the individual clips off the card instead of the entire folder. If the defective clip does not contain an important take, forget the take and give the card a recovery treatment (see below).

• If you need that defective clip, use a block copy tool (dd or ddrescue on the command line, or e.g. Unstoppable Copier on Windows) to create an image file of the card (you’ll need enough free disk space). If you pad the defective block(s) with zeros, you’ll be able to see and access the file, but it will either not open or have some kind of defect somewhere over its duration. You can use software like Video Repair Tool by grauonline.de to compare the defective clip to a good file (a short clip with the same camera settings). You may lose a few frames, metadata, and audio, but if you are lucky, you will be able to salvage the important part of the take.
If the card contains ARRIRAW files, you can use our free ARRI Meta Extract tool to scan the takes for defects (image checksum check). This will show you if the image content in each clip is what the camera wrote onto that card.
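To illustrate what “padding the defective block(s) with zeros” means, here is a minimal sketch of a block copy that substitutes zeros for every block it cannot read. It is not a replacement for ddrescue, which retries and keeps a map of bad areas. The function name, the block size, and the way the source size is determined are my assumptions; on some systems the size of a raw device cannot be found by seeking to the end, in which case you have to pass it in.

```python
import os


def copy_with_zero_padding(src_path, dst_path, block_size=512 * 1024, size=None):
    """Copy src to dst block by block; unreadable blocks become zeros.

    Returns the list of byte offsets that could not be read.
    """
    bad_offsets = []
    with open(src_path, "rb", buffering=0) as src, open(dst_path, "wb") as dst:
        if size is None:
            size = src.seek(0, os.SEEK_END)  # may not work for every raw device
        offset = 0
        while offset < size:
            want = min(block_size, size - offset)
            try:
                src.seek(offset)
                data = src.read(want)
            except OSError:
                data = bytes(want)  # unreadable block: write zeros instead
                bad_offsets.append(offset)
            if not data:
                break  # source ended earlier than expected
            dst.write(data)
            offset += len(data)
    return bad_offsets
```

The returned offsets tell you where in the image the padded areas are, which helps to work out which clip is affected. Always read from the card and write the image to a different disk.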

• Once you have the file, you can give the card a recovery treatment.
1) Take note of the retired block count from the initial SMART report.
2) SANITIZE the card in the camera (SanDisk and Lexar also offer sanitizing tools that you can run on your workstation).
3) Get the SMART status again and check if the bad block count has increased.

3.1) YES – The block is mapped out and the card can be used again. Probability of another bad block is not higher than on a card with 0 RTBBs.
3.2) NO – Put the card into the camera and record a 200fps ProRes444 clip until the card is full. Then get the SMART status a third time and check if the bad block count has increased.

3.2.1) YES – The block is mapped out and the card can be used again. Probability of another bad block is not higher than on a card with 0 RTBBs.
3.2.2) NO – Retire the card.
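The decision steps above can be sketched as a small function. The function name and the returned strings are hypothetical; the inputs are the retired block counts (ID#5) from the first, second, and, if needed, third SMART report.

```python
def recovery_verdict(initial, after_sanitize, after_full_write=None):
    """Apply steps 3) to 3.2.2) to the retired block counts.

    initial          : count noted before the treatment (step 1)
    after_sanitize   : count after sanitizing (step 3)
    after_full_write : count after filling the card (step 3.2), if done
    """
    if after_sanitize > initial:
        return "reuse"  # 3.1: block was mapped out
    if after_full_write is None:
        return "fill card and check again"  # 3.2: third report still missing
    if after_full_write > after_sanitize:
        return "reuse"  # 3.2.1: block was mapped out
    return "retire"  # 3.2.2


print(recovery_verdict(1, 2))     # reuse
print(recovery_verdict(1, 1))     # fill card and check again
print(recovery_verdict(1, 1, 2))  # reuse
print(recovery_verdict(1, 1, 1))  # retire
```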

Open for suggestions.
Oliver Temmler
Product Specialist Camera Systems/Product Manager Storage Media
ARRI Munich

Re: What to look for, if you have CFast 2.0 issues.

Tue Nov 07, 2017 3:09 pm

Hi Oliver,

Thanks for this in-depth rundown. It was very timely, as I had a multi-camera job with lots of CFast cards just a few days after reading it.

Unfortunately, there wasn't enough time scheduled in prep to run all the CFast cards through DriveDX before the shoot. DriveDX seems to struggle with detecting new cards as they are connected and ejected: it often took a long time to recognise them or had to be closed and re-opened, and sometimes it wouldn't detect them until after a full reboot. Formatting also caused system hangs for some cards, as they failed to write to the closing blocks, which required a further reboot, so the whole process took a long time. I did manage to identify 5 cards that DriveDX flagged with warnings as possibly failing, and I kept them out of circulation.

Unfortunately, one of the cards we shot seems to have an I/O error. Whilst it formatted and recorded with no complaints in camera, it failed to copy and the clips could not be played back from the card: the first few frames would play, but then playback would freeze and cause some apps to hang.

Having read what you said about bad block data corruption being able to spread like a Zombie apocalypse, I immediately ceased all attempts to read the card and play it back. In the past I have had some success loading a faulty clip on a card into Scratch and transcoding out the sections either side of a bad block, but after reading your post I decided not to risk it this time. Instead, I tried to follow your approach to recovery.

Initially I looked into how to use the dd command, since that is native on OS X, but I couldn't find any info or options on padding bad blocks with zeros.

Next I decided to try installing ddrescue for the command line, which was a massively lengthy process: the guides I found online required installing Homebrew via the command line, which in turn needed Xcode components, which meant I had to install Xcode, and that took hours over the wifi at the location. Then I found that instructions for using ddrescue to make an image with bad blocks padded out with zeros are thin on the ground. By the time I got to that stage it was 3 am and I was too tired to take in, or even search effectively for, the command line tutorials. In the end I had to leave the card with production so that a specialist could do it the next day whilst I caught up on sleep, since they had a fast turnaround.

I later found out that ddrescue gui includes the ddrescue package in its installer, so possibly I didn't need to go through all that, though unfortunately ddrescue gui doesn't include the pad-with-zeros option in its interface. It also seems that there may be old branches of ddrescue and a new project that took over with a slight variation on the name, so it is a bit of a minefield trying to find and install the correct one.

If you have the knowledge and experience to share on putting your suggested recovery steps into practice, it would be great if you could add some more detailed instructions to this post on using the recovery tools you mentioned, with links to where to find and install them. I would like to have them all installed and set up on my system, ready for the next time I come across a bad clip on a card that needs recovering.

Many thanks,

Quentin.
Quentin Brown
