Author Topic: Moving large amounts of data between HDDs... CRC "fail"?  (Read 2856 times)

Offline kitamesume

  • Member
  • Posts: 7219
  • Death is pleasure, Living is torment.
Re: Moving large amounts of data between HDDs... CRC "fail"?
« Reply #20 on: July 02, 2012, 08:36:06 AM »
i don't know much about network transfers since i've given up on it, using USB externals as a medium of transfer at the moment, it never seems to fail even once.

so if a USB external HDD transfer isn't failing but a LAN network is failing, would that mean the fault is within the network? like the NIC or the router, or maybe LAN is heavily susceptible to interference?


Offline rkruger

  • Member
  • Posts: 124
  • #include <bakabt.h>
Re: Moving large amounts of data between HDDs... CRC "fail"?
« Reply #21 on: July 03, 2012, 05:52:10 AM »
so if a USB external HDD transfer isn't failing but a LAN network is failing, would that mean the fault is within the network?
Yes, sounds like a logical conclusion to me.
It should be possible to view statistics for the NIC, such as the number of transmission errors. That will give you a clear indication of whether there is a problem with the network.

like the NIC or the router, or maybe LAN is heavily susceptible to interference?
The first thing to check is usually the cabling itself.
Make sure that the network cables are away from any power cables, as they may cause interference.
If you are able, try replacing the network cables with different ones.
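
For the NIC statistics, here is a minimal sketch assuming a Linux box, where the kernel exposes per-interface counters under /sys/class/net/<iface>/statistics/. The interface name `eth0` and the helper names are just placeholders of mine. On Windows, `netstat -e` shows similar totals.

```python
from pathlib import Path

# Counter names as exposed by Linux under /sys/class/net/<iface>/statistics/.
COUNTERS = ("rx_errors", "rx_crc_errors", "rx_dropped", "tx_errors")


def read_nic_counters(iface: str, sysfs_root: str = "/sys/class/net") -> dict:
    """Return the error counters for one interface (missing files are skipped)."""
    stats_dir = Path(sysfs_root) / iface / "statistics"
    counters = {}
    for name in COUNTERS:
        path = stats_dir / name
        if path.is_file():
            counters[name] = int(path.read_text().strip())
    return counters


def counter_delta(before: dict, after: dict) -> dict:
    """How much each counter grew between two snapshots."""
    return {name: after[name] - before.get(name, 0) for name in after}
```

Take one snapshot with `read_nic_counters("eth0")`, run a big transfer, take another, and look at `counter_delta(before, after)`. Counters that keep climbing during the copy point at the cable, NIC or switch port.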

Offline lapa321

  • Member
  • Posts: 567
Re: Moving large amounts of data between HDDs... CRC "fail"?
« Reply #22 on: July 03, 2012, 09:11:48 AM »
so if a USB external HDD transfer isn't failing but a LAN network is failing, would that mean the fault is within the network?
Yes, sounds like a logical conclusion to me.
It should be possible to view statistics for the NIC, such as the number of transmission errors. That will give you a clear indication of whether there is a problem with the network.

If the cable was causing data errors, shouldn't the NIC and the network protocols be able to catch them?

Offline Freedom Kira

  • Member
  • Posts: 4324
  • Rawr™.
Re: Moving large amounts of data between HDDs... CRC "fail"?
« Reply #23 on: July 04, 2012, 02:29:17 AM »
Not all transmission errors are catchable. You can make a CRC that catches any burst of errors up to n bits long (this requires a CRC checksum that is n bits long), but it may not catch a burst of n+1 or more bits (i.e. if the flipped bits are spread over n+1 or more positions).

At least, this is how I remember it from my networks class...
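
To make that concrete, here is a toy sketch with an 8-bit CRC (Ethernet really uses a 32-bit one; the polynomial, message and helper names here are my own picks for illustration). Any error pattern that fits inside 8 bits changes the checksum, but a 9-bit pattern equal to the generator polynomial itself leaves it untouched.

```python
GENERATOR = 0x107  # x^8 + x^2 + x + 1, a common 8-bit CRC polynomial


def crc8(data: bytes) -> int:
    """Bitwise CRC-8 (init 0, no reflection): remainder of data * x^8 mod GENERATOR."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc <<= 1
            if crc & 0x100:
                crc ^= GENERATOR
    return crc


def flip(data: bytes, pattern: int, shift: int) -> bytes:
    """XOR an error pattern into the message, shifted left by `shift` bits."""
    as_int = int.from_bytes(data, "big") ^ (pattern << shift)
    return as_int.to_bytes(len(data), "big")


msg = b"bakabt!!"
# A single flipped bit: the checksum changes, so the error is caught.
print(crc8(flip(msg, 0b1, 13)) != crc8(msg))
# Four flipped bits spread over 9 positions (the generator): same checksum.
print(crc8(flip(msg, GENERATOR, 13)) == crc8(msg))
```

Both lines print True. The second case is exactly the kind of error that slips through: the corrupted message differs from the original by a multiple of the generator, so the remainder is the same.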

Offline lapa321

  • Member
  • Posts: 567
Re: Moving large amounts of data between HDDs... CRC "fail"?
« Reply #24 on: July 04, 2012, 03:33:47 AM »
Not all transmission errors are catchable. You can make a CRC that catches any burst of errors up to n bits long (this requires a CRC checksum that is n bits long), but it may not catch a burst of n+1 or more bits (i.e. if the flipped bits are spread over n+1 or more positions).

At least, this is how I remember it from my networks class...

Same here. Error detection is not perfect, and certain errors will still register as correct if the checksum of the corrupted data is identical to the correct one (CRC is not a byte comparison). The odds of that happening are incredibly small, but we deal with terabytes of data so it's unavoidable that an error will occasionally slip through.

BTW, i've only found some NAS utilities, but i can't use them for burn tests since they only seem to measure throughput performance, not data integrity; they don't seem to do byte comparisons. Still need to search some more.
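
In the meantime, a rough sketch of what such a burn test could look like: write a reproducible pseudo-random pattern to the share, read it back, and report the first byte that differs. The function names and the seed are made up for illustration, and a read-back right after the write may be served from the OS cache rather than the NAS, so a remount or a file bigger than RAM is probably needed for a meaningful run.

```python
import hashlib
import os


def pattern_chunks(seed: bytes, total_bytes: int, chunk_size: int = 1 << 20):
    """Yield reproducible pseudo-random chunks derived from `seed` via SHAKE-256."""
    produced, counter = 0, 0
    while produced < total_bytes:
        size = min(chunk_size, total_bytes - produced)
        yield hashlib.shake_256(seed + counter.to_bytes(8, "big")).digest(size)
        produced += size
        counter += 1


def write_pattern(path: str, seed: bytes, total_bytes: int) -> None:
    """Write the pattern to `path` and ask the OS to flush it to the device."""
    with open(path, "wb") as f:
        for chunk in pattern_chunks(seed, total_bytes):
            f.write(chunk)
        f.flush()
        os.fsync(f.fileno())


def verify_pattern(path: str, seed: bytes, total_bytes: int):
    """Return the offset of the first mismatching byte, or None if all bytes match."""
    offset = 0
    with open(path, "rb") as f:
        for expected in pattern_chunks(seed, total_bytes):
            actual = f.read(len(expected))
            if actual != expected:
                for i, (a, b) in enumerate(zip(actual, expected)):
                    if a != b:
                        return offset + i
                return offset + len(actual)  # file is shorter than expected
            offset += len(expected)
    return None
```

Since the pattern is regenerated from the seed on the verify pass, the check is a true byte comparison and does not depend on a second copy of the data sitting on another (possibly flaky) disk.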
« Last Edit: July 04, 2012, 03:39:16 AM by lapa321 »

Offline rkruger

  • Member
  • Posts: 124
  • #include <bakabt.h>
Re: Moving large amounts of data between HDDs... CRC "fail"?
« Reply #25 on: July 04, 2012, 06:21:05 AM »
If the cable was causing data errors, shouldn't the NIC and the network protocols be able to catch them?
Maybe, maybe not. But the errors are logged regardless, and can be used to help isolate the problem.

Offline per

  • Member
  • Posts: 114
Re: Moving large amounts of data between HDDs... CRC "fail"?
« Reply #26 on: July 15, 2012, 05:27:02 PM »
Same here. Error detection is not perfect, and certain errors will still register as correct if the checksum of the corrupted data is identical to the correct one (CRC is not a byte comparison). The odds of that happening are incredibly small, but we deal with terabytes of data so it's unavoidable that an error will occasionally slip through.

Not really. I think you misrepresent the error rate by a few orders of magnitude. :)

Let's list the checksums used in order:

Ethernet: CRC32 - requires at least 3 bit errors in a single packet to give the same checksum
IP:       Header checksum only. Not all that relevant
TCP:      Simple 16-bit ones'-complement sum. Protects against 1-bit errors, but in a different way
(the IP header checksum is not present in v6, since the link-layer and transport checksums are considered sufficient for any reasonable data integrity requirements)
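
For reference, the checksum TCP carries is the 16-bit ones'-complement sum described in RFC 1071. A minimal sketch of it (the function name is mine, and real stacks also sum a pseudo-header, which is left out here) shows both what it catches and what it misses:

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones'-complement of the ones'-complement sum of all 16-bit words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


original = bytes.fromhex("0001f203f4f5f6f7")
one_bit_flipped = bytes.fromhex("0101f203f4f5f6f7")
words_swapped = bytes.fromhex("f2030001f4f5f6f7")

print(hex(internet_checksum(original)))
print(internet_checksum(one_bit_flipped) != internet_checksum(original))
print(internet_checksum(words_swapped) == internet_checksum(original))
```

A single flipped bit always changes the sum, but because addition does not care about order, two 16-bit words swapping places go unnoticed. That is why it is much weaker than the Ethernet CRC.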

For the raw link that the ethernet/TCP checksums protect, the average 1-bit error rate according to google is about one per Tb (terabit) when running 10Gbps over a maximum length copper cable (let's take that as the worst case, it sort of is).

For the error to go undetected you need at least three (very carefully chosen, but let's ignore that) errors in the same 1.4KB packet (1-bit and 2-bit errors are always detected, and the bad frame is dropped and resent; 3+ bit errors can be detected, but might not be).

The chance for that is then about (16e3/1e12)^3 (per packet) unless I misremember math.

This means you have about one chance in 1e24 for each packet (approximately), or roughly one chance in 1e14 of getting a single undetected error when transferring your whole 10Tb collection.
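
If anyone wants to redo the estimate, here is a small sketch of the arithmetic. It takes the figures above at face value and reads the 10Tb collection as 10 terabytes (1e13 bytes), which is my assumption. The result is very sensitive to the assumed packet size and error rate, and since it ignores that the three errors have to be "carefully chosen", it is on the pessimistic side.

```python
BIT_ERROR_RATE = 1e-12   # "one per Tb", read as one flipped bit per 1e12 bits
BITS_PER_PACKET = 16e3   # the packet size used in the estimate above
MIN_BAD_BITS = 3         # bit errors needed in one packet before one can slip through


def undetected_chance_per_packet() -> float:
    """Crude estimate: (expected bit errors per packet) ** MIN_BAD_BITS."""
    return (BITS_PER_PACKET * BIT_ERROR_RATE) ** MIN_BAD_BITS


def undetected_chance_for_transfer(total_bytes: float) -> float:
    """Per-packet chance times the number of packets in the whole transfer."""
    packets = total_bytes * 8 / BITS_PER_PACKET
    return packets * undetected_chance_per_packet()


print(undetected_chance_per_packet())
print(undetected_chance_for_transfer(10e12))  # 10 terabytes, an assumed reading of "10Tb"
```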

Then again, the highest protection is given by the ethernet checksum. This is only used for traffic between nodes, not inside the nodes (at least not with checksum offloading, ie, data sent from ram -> network card via DMA might be corrupted, before it is sent).

The TCP checksum is, generally speaking, calculated by the OS before the data is sent to the network card.

In practice, the error is in either the motherboard or the RAM.

Get high quality ECC RAM
Get high quality motherboards
Get a filesystem that checksums the data

The last two times I updated my storage, and copied 8 and then 12Tb, I have not had a single error.

So saying that it is 'unavoidable' is rather misleading.

I have, however, had about 12 disks fail on me over the last five years. One of those failed disks produced 'soft' errors: instead of reporting unrecoverable read errors, it just returned the wrong data. So I guess that can be added to the list:

Avoid disks with read/write errors. Caught by the filesystem if it has checksums, though.


Using a different program to copy the data will not help all that much unless the program also checksums the destination (and source). And that will not help if it is the computer the program is running on that is shaky (bad RAM/MB/controller).

It will protect against bad network cards and switches, and the destination computer being bad.
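
A copy program that checksums both ends could look roughly like this. It is only a sketch: the function names are mine, SHA-256 is just one reasonable choice of hash, and a re-read straight after the copy may come from the OS cache instead of the disk.

```python
import hashlib
import shutil


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files do not need to fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(src: str, dst: str) -> str:
    """Copy src to dst, then re-read both and compare SHA-256 digests."""
    before = sha256_of(src)
    shutil.copyfile(src, dst)
    if sha256_of(dst) != before or sha256_of(src) != before:
        raise IOError(f"checksum mismatch copying {src} -> {dst}")
    return before
```

Hashing the source a second time after the copy is there to catch a source disk or controller that returns different data on different reads. If the machine doing the hashing has bad RAM, though, none of this can be trusted.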

Offline megido-rev.M

  • Member
  • Posts: 16121
Re: Moving large amounts of data between HDDs... CRC "fail"?
« Reply #27 on: July 17, 2012, 02:22:18 AM »
^ I've called that out as not-right already. ;D