tigase.jaxmpp.j2se.connectors.socket.TextStreamReader breaks UTF-8 streams

Igor Kuralenok
Added almost 2 years ago

When the server side uses byte (not char) buffers, the current version of the reader breaks UTF-8 characters (in fact, almost any multibyte character) that fall on a buffer boundary. This happens because of the final clear() call, which throws away the leading bytes of a character whose remaining bytes have not arrived yet. This version seems better:

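package tigase.jaxmpp.j2se.connectors.socket;

import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.StandardCharsets;

// Reader is jaxmpp's own reader interface (not java.io.Reader), and
// DEFAULT_SOCKET_BUFFER_SIZE is the existing buffer-size constant; neither is shown here.
public class TextStreamReader implements Reader {
  // Undecoded bytes are kept between calls, so a character split across
  // two socket reads is completed on the next call instead of being lost.
  private final ByteBuffer buf = ByteBuffer.allocate(DEFAULT_SOCKET_BUFFER_SIZE);
  private final CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
  private final ReadableByteChannel inputStream;

  public TextStreamReader(InputStream inputStream) {
    this.inputStream = Channels.newChannel(inputStream);
  }

  @Override
  public int read(char[] cbuf) throws IOException {
    final int n = inputStream.read(buf);  // appends after any leftover bytes
    final boolean eof = n == -1;
    final CharBuffer cb = CharBuffer.wrap(cbuf);
    buf.flip();                           // switch buf from filling to draining
    decoder.decode(buf, cb, eof);         // an incomplete trailing sequence stays in buf
    buf.compact();                        // keep leftovers instead of clear()
    final int count = cb.position();      // number of chars written into cbuf
    return (eof && count == 0) ? -1 : count;
  }
}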
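To make the failure concrete, here is a self-contained sketch (the class CompactDecodeDemo and its methods are hypothetical, not part of jaxmpp). It feeds the UTF-8 bytes of a sample string to a decoder in fixed-size chunks, standing in for short socket reads. decodeWithClear is a simplified model of the old behaviour described above (decode, then clear()), assumed rather than copied from the jaxmpp sources; decodeWithCompact follows the flip()/decode()/compact() pattern of the proposed version.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class CompactDecodeDemo {
  // "zażółć 日本": 2-byte and 3-byte UTF-8 characters, written as escapes.
  public static final String SAMPLE = "za\u017c\u00f3\u0142\u0107 \u65e5\u672c";

  // Simplified model of the old reader: decode each chunk, then clear().
  public static String decodeWithClear(String text, int chunkSize) {
    byte[] data = text.getBytes(StandardCharsets.UTF_8);
    // REPLACE makes the damage visible as U+FFFD instead of stalling the decoder.
    CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
        .onMalformedInput(CodingErrorAction.REPLACE);
    ByteBuffer buf = ByteBuffer.allocate(chunkSize);
    CharBuffer out = CharBuffer.allocate(data.length); // UTF-8 yields at most one char per byte
    for (int off = 0; off < data.length; off += chunkSize) {
      buf.put(data, off, Math.min(chunkSize, data.length - off));
      buf.flip();
      decoder.decode(buf, out, false);
      buf.clear(); // BUG: drops the leading bytes of a character split across chunks
    }
    out.flip();
    return out.toString();
  }

  // Proposed pattern: leftover bytes are compacted and completed by the next chunk.
  public static String decodeWithCompact(String text, int chunkSize) {
    byte[] data = text.getBytes(StandardCharsets.UTF_8);
    CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
    // Room for one chunk plus at most 3 bytes of an incomplete sequence.
    ByteBuffer buf = ByteBuffer.allocate(chunkSize + 4);
    CharBuffer out = CharBuffer.allocate(data.length);
    for (int off = 0; off < data.length; off += chunkSize) {
      buf.put(data, off, Math.min(chunkSize, data.length - off)); // like read(buf)
      buf.flip();
      decoder.decode(buf, out, false); // stops before an incomplete sequence
      buf.compact();                   // moves those bytes to the front of buf
    }
    buf.flip();
    decoder.decode(buf, out, true);    // end of input
    decoder.flush(out);
    out.flip();
    return out.toString();
  }

  public static void main(String[] args) {
    System.out.println("clear():   " + decodeWithClear(SAMPLE, 4));
    System.out.println("compact(): " + decodeWithCompact(SAMPLE, 4));
  }
}
```

With clear(), the first bytes of a character split at a chunk boundary are discarded, and the orphaned continuation bytes at the start of the next chunk decode as U+FFFD. With compact(), the few bytes of an incomplete sequence are carried into the next read, so the output matches the input for any positive chunk size.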

Replies (1)

Added by Wojciech Kapcia almost 2 years ago

Hello Igor, thank you for the information. Could you submit a new issue (https://projects.tigase.org/projects/jaxmpp2/issues/new) and select the Source Code Disclaimer so that we can include the patch in our sources?
