"PHP: mb_detect_encoding - Manual". www.php.net. Retrieved 2024-11-12. https://www.php.net/manual/en/function.mb-detect-encoding.php#:~:text=Automatic%20detection%20of%20the%20intended,encrypted%20string%20without%20the%20key.
Kim, Seung-Ho; Park, Jongsoo (2007). "Automatic Detection of Character Encoding and Language". https://www.semanticscholar.org/paper/Automatic-Detection-of-Character-Encoding-and-Kim-Park/ec4965fa465b92aed3e8843ab24ce809cb50a9e7
"A composite approach to language/encoding detection". www-archive.mozilla.org. Retrieved 2024-11-12. https://www-archive.mozilla.org/projects/intl/universalcharsetdetection.html
In a random byte string, a byte with the high bit set has only about a 1/15 chance of starting a valid UTF-8 byte sequence. The odds are even lower in actual text, which is not random but tends to contain isolated bytes with the high bit set, and an isolated high-bit byte is always invalid in UTF-8.
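The 1/15 figure can be checked with a short calculation. The sketch below is illustrative only: it assumes the lead byte is uniformly distributed over 0x80..0xFF and every following byte is independent and uniform over 0x00..0xFF, and it takes the allowed byte ranges from the well-formed UTF-8 byte sequence table in the Unicode Standard (Table 3-7). The names `WELL_FORMED` and `chance_high_byte_starts_valid_sequence` are made up for this example.

```python
from fractions import Fraction

# Well-formed multi-byte UTF-8 sequences (Unicode Standard, Table 3-7).
# Each entry: (lead-byte range, [allowed range for each following byte]).
WELL_FORMED = [
    ((0xC2, 0xDF), [(0x80, 0xBF)]),
    ((0xE0, 0xE0), [(0xA0, 0xBF), (0x80, 0xBF)]),
    ((0xE1, 0xEC), [(0x80, 0xBF), (0x80, 0xBF)]),
    ((0xED, 0xED), [(0x80, 0x9F), (0x80, 0xBF)]),
    ((0xEE, 0xEF), [(0x80, 0xBF), (0x80, 0xBF)]),
    ((0xF0, 0xF0), [(0x90, 0xBF), (0x80, 0xBF), (0x80, 0xBF)]),
    ((0xF1, 0xF3), [(0x80, 0xBF), (0x80, 0xBF), (0x80, 0xBF)]),
    ((0xF4, 0xF4), [(0x80, 0x8F), (0x80, 0xBF), (0x80, 0xBF)]),
]


def chance_high_byte_starts_valid_sequence() -> Fraction:
    """Probability that a random high-bit byte, followed by random bytes,
    begins a well-formed UTF-8 sequence (assumes independent uniform bytes)."""
    total = Fraction(0)
    for (lead_lo, lead_hi), trailing in WELL_FORMED:
        # Lead byte is uniform over the 128 values 0x80..0xFF.
        p = Fraction(lead_hi - lead_lo + 1, 128)
        for lo, hi in trailing:
            # Each following byte is uniform over all 256 values.
            p *= Fraction(hi - lo + 1, 256)
        total += p
    return total


probability = chance_high_byte_starts_valid_sequence()
print(probability, float(probability))  # 17/256 0.06640625
```

Under these assumptions the result is exactly 17/256, slightly under 1/15. Two-byte sequences contribute 15/256 of that, which is why text in a legacy single-byte encoding rarely passes as UTF-8 once it contains more than a few non-ASCII bytes.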