Unicode
- Unicode maps integers called code points (in the range 0 to 0x10FFFF) to characters
- The first 128 code points (hex values 00 to 7F) are the same as ASCII
- The next 128 code points (0x80–0xFF) are the same as in ISO-8859-1 (Latin-1)
- An encoding is a mapping between Unicode code points and sequences of bytes
- https://en.wikibooks.org/wiki/Unicode/Character_reference/0000-0FFF
- http://www.unicode.org/charts/
- ASCII: http://www.unicode.org/charts/PDF/U0000.pdf
- Latin-1: http://www.unicode.org/charts/PDF/U0080.pdf
- Combining Diacritical Marks: http://www.unicode.org/charts/PDF/U0300.pdf
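The relationship between characters and code points can be sketched in Python, whose `ord`/`chr` builtins expose code points directly (a minimal illustration, not tied to any particular chart above):

```python
import unicodedata

# ord() gives a character's code point; chr() goes the other way
assert ord('a') == 0x61   # 'a' is U+0061, the same value as its ASCII code
assert ord('å') == 0xE5   # 'å' is U+00E5, the same value as in ISO-8859-1
assert chr(0x61) == 'a'

# U+0301 is a combining diacritical mark: 'e' followed by U+0301
# normalizes (NFC) to the single precomposed character 'é' (U+00E9)
assert unicodedata.normalize('NFC', 'e\u0301') == '\u00e9'
```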
Planes
- A plane is a contiguous group of 65,536 (= 2^16) code points
- There are 17 planes, identified by the numbers 0 to 16
- The Basic Multilingual Plane (BMP) is plane 0 (0000–FFFF)
- Planes 1–16 are called “supplementary planes”
- The code points in each plane have the hexadecimal values xx0000 to xxFFFF, where xx is a hex value from 00 to 10, signifying the plane to which the values belong
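Because each plane spans 0x10000 code points, the plane number is just the code point shifted right by 16 bits. A small Python sketch (the helper name `plane` is mine, not standard):

```python
def plane(cp: int) -> int:
    """Return the Unicode plane (0-16) a code point belongs to."""
    return cp >> 16  # each plane covers 0x10000 code points

assert plane(0x0041) == 0     # 'A' is in the BMP (plane 0)
assert plane(0x1F600) == 1    # emoji such as U+1F600 are in plane 1
assert plane(0x10FFFF) == 16  # the last valid code point is in plane 16
```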
UTF-8
- UTF-8 is a variable-width encoding that stores each code point in one to four bytes
- The first 128 code points (U+0000–U+007F) are encoded as single bytes identical to their ASCII codes
- Code points above U+007F are encoded with two to four bytes each
- UTF-8 is not byte-compatible with ISO-8859-1: code points above U+007F become multi-byte sequences, not single bytes
- Encoding design: https://en.wikipedia.org/wiki/UTF-8#Description
- Example: https://en.wikipedia.org/wiki/UTF-8#Examples
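The variable width is easy to observe with Python's codecs (a sketch; the byte values match the examples on the linked page):

```python
# Byte length grows with the code point's magnitude
assert 'a'.encode('utf-8') == b'a'                  # U+0061: 1 byte, same as ASCII
assert '©'.encode('utf-8') == b'\xc2\xa9'           # U+00A9: 2 bytes
assert '≠'.encode('utf-8') == b'\xe2\x89\xa0'       # U+2260: 3 bytes
assert '😀'.encode('utf-8') == b'\xf0\x9f\x98\x80'  # U+1F600: 4 bytes

# Not byte-compatible with ISO-8859-1, which encodes '©' as one byte
assert '©'.encode('iso-8859-1') == b'\xa9'
```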
UTF-16
- Encodes code points as one or two 16-bit code units
- Code points in the BMP are encoded as single 16-bit code units that are numerically equal to the corresponding code points
- Code points from the Supplementary Planes are encoded by pairs of 16-bit code units called surrogate pairs: https://en.wikipedia.org/wiki/UTF-16#Code_points_U.2B010000_to_U.2B10FFFF
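The surrogate-pair arithmetic can be checked against Python's own UTF-16 codec (a sketch of the algorithm described on the linked page):

```python
import struct

# A BMP code point is one 16-bit unit equal to the code point itself
assert 'a'.encode('utf-16-be') == b'\x00\x61'

# For a supplementary-plane code point: subtract 0x10000, then split the
# remaining 20 bits into a high and a low 10-bit half
cp = 0x1F600                # 😀, plane 1
v = cp - 0x10000
high = 0xD800 + (v >> 10)   # high (lead) surrogate
low = 0xDC00 + (v & 0x3FF)  # low (trail) surrogate
assert (high, low) == (0xD83D, 0xDE00)
assert struct.pack('>HH', high, low) == '😀'.encode('utf-16-be')
```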
UTF-32
- Uses exactly 32 bits (4 bytes) per Unicode code point
- The UTF-32 form of a character is a direct representation of its code point
- Example: 00 00 00 61 is big-endian UTF-32 for code point U+0061, which is 'a'
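The same characters used above, sketched with Python's big-endian UTF-32 codec:

```python
# Each code point becomes exactly four bytes holding its value
assert 'a'.encode('utf-32-be') == b'\x00\x00\x00\x61'   # U+0061
assert '😀'.encode('utf-32-be') == b'\x00\x01\xf6\x00'  # U+1F600
```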
Byte order mark (BOM)
- The BOM is the code point U+FEFF
- If the endianness of the decoder matches that of the encoder, the decoder sees the value 0xFEFF; an opposite-endian decoder instead reads the noncharacter value U+FFFE, which is reserved for this purpose, and this incorrect result signals that the remaining values must be byte-swapped
- In UTF-16, a BOM (U+FEFF) may be placed as the first character of a file or character stream
- The UTF-8 representation of the BOM is the byte sequence 0xEF,0xBB,0xBF
- The Unicode Standard neither requires nor recommends the use of the BOM for UTF-8
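Python's codecs make the BOM behavior visible (a sketch: the `utf-16` codec writes a BOM matching the platform's endianness, while `utf-8-sig` writes and strips the UTF-8 BOM bytes):

```python
# The 'utf-16' codec prepends a BOM; its value reveals the byte order
encoded = '€'.encode('utf-16')
assert encoded[:2] in (b'\xff\xfe', b'\xfe\xff')  # LE or BE BOM

# 'utf-8-sig' writes the UTF-8 form of the BOM (EF BB BF) ...
assert 'a'.encode('utf-8-sig') == b'\xef\xbb\xbfa'
# ... and strips it again on decoding
assert b'\xef\xbb\xbfa'.decode('utf-8-sig') == 'a'
```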
Escape sequences
- HTML entity: &#229; (decimal) or &#xE5; (hex) (= å)
- UTF-16 code units (non-standard %u escape, as used by JavaScript's escape()): %uXXXX, e.g. %u00e9 -> é
- UTF-8 bytes (URL percent-encoding): %XX[%XX][%XX][%XX], e.g. %c2%a9 -> ©, %e2%89%a0 -> ≠
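The entity and percent-encoded forms above can be reproduced with Python's standard library (a sketch; note that `quote` emits uppercase hex while `unquote` accepts either case):

```python
import html
from urllib.parse import quote, unquote

# HTML entities, decimal and hex
assert html.unescape('&#229;') == 'å'
assert html.unescape('&#xE5;') == 'å'

# URL percent-encoding of the UTF-8 bytes
assert quote('©') == '%C2%A9'
assert quote('≠') == '%E2%89%A0'
assert unquote('%c2%a9') == '©'
```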
Links
- http://www.darkcoding.net/software/finally-understanding-unicode-and-utf-8/
- http://de.selfhtml.org/inter/unicode.htm
- https://en.wikipedia.org/wiki/Plane_%28Unicode%29
- https://en.wikipedia.org/wiki/UTF-8
- https://en.wikipedia.org/wiki/UTF-16
- https://en.wikipedia.org/wiki/UTF-32
- https://en.wikipedia.org/wiki/Byte_order_mark