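Given a Java char array, we can use the following Java function to convert it to a raw UTF-8 byte array. A Java char is a UTF-16 code unit, and each char in the Basic Multilingual Plane encodes to 1, 2 or 3 UTF-8 bytes depending on its value: up to 0x7F takes 1 byte, up to 0x7FF takes 2 bytes, and anything larger takes 3 bytes. Characters outside the Basic Multilingual Plane are stored as surrogate pairs and need 4 bytes in UTF-8; this function does not handle them.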
public static byte[] char2Byte(char[] a) {
    int len = 0;
    // obtain the length of the byte array
    for (char c : a) {
        if (c > 0x7FF) {
            len += 3;
        } else if (c > 0x7F) {
            len += 2;
        } else {
            len++;
        }
    }
    // fill the byte array with the UTF-8 bytes
    var result = new byte[len];
    int i = 0;
    for (char c : a) {
        if (c > 0x7FF) {
            // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            result[i++] = (byte) (((c >> 12) & 0x0F) | 0xE0);
            result[i++] = (byte) (((c >> 6) & 0x3F) | 0x80);
            result[i++] = (byte) ((c & 0x3F) | 0x80);
        } else if (c > 0x7F) {
            // 2 bytes: 110xxxxx 10xxxxxx
            result[i++] = (byte) (((c >> 6) & 0x1F) | 0xC0);
            result[i++] = (byte) ((c & 0x3F) | 0x80);
        } else {
            // 1 byte: 0xxxxxxx (plain ASCII)
            result[i++] = (byte) (c & 0x7F);
        }
    }
    return result;
}
First, we iterate over the char array to compute the total length of the resulting byte array. Then, in a second pass, we fill the byte array with the corresponding UTF-8 bytes.
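The same two-pass idea can be extended to characters outside the Basic Multilingual Plane, which Java stores as a surrogate pair and UTF-8 encodes in 4 bytes. The following is only a sketch: the class name Utf8Sketch and the method encode are made up for illustration, and it assumes the input is well-formed UTF-16 (an unpaired surrogate is simply written as 3 bytes, which is not valid UTF-8). The main method compares the result with the JDK's own String.getBytes(StandardCharsets.UTF_8) as a sanity check.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper class, not part of the original function above.
final class Utf8Sketch {

    // True when a[k] and a[k + 1] form a valid surrogate pair.
    private static boolean isPairAt(char[] a, int k) {
        return k + 1 < a.length
                && Character.isHighSurrogate(a[k])
                && Character.isLowSurrogate(a[k + 1]);
    }

    static byte[] encode(char[] a) {
        // First pass: compute the length of the byte array.
        int len = 0;
        for (int k = 0; k < a.length; k++) {
            char c = a[k];
            if (c <= 0x7F) {
                len += 1;
            } else if (c <= 0x7FF) {
                len += 2;
            } else if (isPairAt(a, k)) {
                len += 4;
                k++; // skip the low surrogate
            } else {
                len += 3;
            }
        }
        // Second pass: write the UTF-8 bytes.
        byte[] out = new byte[len];
        int i = 0;
        for (int k = 0; k < a.length; k++) {
            char c = a[k];
            if (c <= 0x7F) {
                out[i++] = (byte) c;
            } else if (c <= 0x7FF) {
                out[i++] = (byte) (0xC0 | (c >> 6));
                out[i++] = (byte) (0x80 | (c & 0x3F));
            } else if (isPairAt(a, k)) {
                // Combine the pair into one code point, then emit 4 bytes.
                int cp = Character.toCodePoint(c, a[++k]);
                out[i++] = (byte) (0xF0 | (cp >> 18));
                out[i++] = (byte) (0x80 | ((cp >> 12) & 0x3F));
                out[i++] = (byte) (0x80 | ((cp >> 6) & 0x3F));
                out[i++] = (byte) (0x80 | (cp & 0x3F));
            } else {
                out[i++] = (byte) (0xE0 | (c >> 12));
                out[i++] = (byte) (0x80 | ((c >> 6) & 0x3F));
                out[i++] = (byte) (0x80 | (c & 0x3F));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Letter A, e-acute, euro sign, and an emoji stored as a surrogate pair.
        String s = "A\u00E9\u20AC\uD83D\uDE00";
        byte[] mine = encode(s.toCharArray());
        byte[] jdk = s.getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(mine, jdk));
    }
}
```

In everyday code, calling getBytes(StandardCharsets.UTF_8) on a String is usually the simpler choice; a hand-written encoder like this is mainly useful for learning how the bit layout works.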