Class ArabicLigaturizer

  • All Implemented Interfaces:
    LanguageProcessor

    public class ArabicLigaturizer
    extends java.lang.Object
    implements LanguageProcessor
    Shapes Arabic characters. This code was inspired by an LGPL'ed C library, Pango (see http://www.pango.com/). Note that the code of this class is the original work of Paulo Soares.
    • Field Summary

      Fields 
      Modifier and Type Field Description
      private static char ALEF  
      private static char ALEFHAMZA  
      private static char ALEFHAMZABELOW  
      private static char ALEFMADDA  
      private static char ALEFMAKSURA  
      static int ar_composedtashkeel  
      static int ar_lig  
      static int ar_nothing  
      static int ar_novowel  
      private static char[][] chartable  
      private static char DAMMA  
      static int DIGIT_TYPE_AN
      Digit type option: Use Arabic-Indic digits (U+0660...U+0669).
      static int DIGIT_TYPE_AN_EXTENDED
      Digit type option: Use Eastern (Extended) Arabic-Indic digits (U+06f0...U+06f9).
      static int DIGIT_TYPE_MASK
      Bit mask for digit type options.
      static int DIGITS_AN2EN
      Digit shaping option: Replace Arabic-Indic digits by European digits (U+0030...U+0039).
      static int DIGITS_EN2AN
      Digit shaping option: Replace European digits (U+0030...U+0039) by Arabic-Indic digits.
      static int DIGITS_EN2AN_INIT_AL
      Digit shaping option: Replace European digits (U+0030...U+0039) by Arabic-Indic digits if the most recent strongly directional character is an Arabic letter (its Bidi direction value is RIGHT_TO_LEFT_ARABIC). The text is assumed to start in an Arabic-letter context, so European digits at the start of the text are replaced.
      static int DIGITS_EN2AN_INIT_LR
      Digit shaping option: Replace European digits (U+0030...U+0039) by Arabic-Indic digits if the most recent strongly directional character is an Arabic letter (its Bidi direction value is RIGHT_TO_LEFT_ARABIC). The text is assumed to start in a non-Arabic context, so European digits at the start of the text are left unchanged.
      static int DIGITS_MASK
      Bit mask for digit shaping options.
      private static int DIGITS_RESERVED
      Not a valid option value.
      private static char FARSIYEH  
      private static char FATHA  
      private static char HAMZA  
      private static char HAMZAABOVE  
      private static char HAMZABELOW  
      private static char KASRA  
      private static char LAM  
      private static char LAM_ALEF  
      private static char LAM_ALEFHAMZA  
      private static char LAM_ALEFHAMZABELOW  
      private static char LAM_ALEFMADDA  
      private static char MADDA  
      private static java.util.HashMap<java.lang.Character,​char[]> maptable  
      protected int options  
      private static java.util.HashMap<java.lang.Character,​java.lang.Character> reverseLigatureMapTable
      Some fonts do not implement ligaturized variations of Arabic characters, e.g. Simplified Arabic has code point 0xFEED but not 0xFEEE.
      protected int runDirection  
      private static char SHADDA  
      private static char TATWEEL  
      private static char WAW  
      private static char WAWHAMZA  
      private static char YEH  
      private static char YEHHAMZA  
      private static char ZWJ  
    • Method Summary

      Modifier and Type Method Description
      static int arabic_shape​(char[] src, int srcoffset, int srclength, char[] dest, int destoffset, int destlength, int level)  
      (package private) static char charshape​(char s, int which)  
      (package private) static boolean connects_to_left​(ArabicLigaturizer.charstruct a)  
      (package private) static void copycstostring​(java.lang.StringBuffer string, ArabicLigaturizer.charstruct s, int level)  
      (package private) static void doublelig​(java.lang.StringBuffer string, int level)  
      static java.lang.Character getReverseMapping​(char c)  
      boolean isRTL()
      Arabic is written from right to left.
      (package private) static boolean isVowel​(char s)  
      (package private) static int ligature​(char newchar, ArabicLigaturizer.charstruct oldchar)  
      java.lang.String process​(java.lang.String s)
      Processes a String
      static void processNumbers​(char[] text, int offset, int length, int options)  
      (package private) static void shape​(char[] text, java.lang.StringBuffer string, int level)  
      (package private) static int shapecount​(char s)  
      (package private) static void shapeToArabicDigitsWithContext​(char[] dest, int start, int length, char digitBase, boolean lastStrongWasAL)  
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • maptable

        private static final java.util.HashMap<java.lang.Character,​char[]> maptable
      • reverseLigatureMapTable

        private static final java.util.HashMap<java.lang.Character,​java.lang.Character> reverseLigatureMapTable
        Some fonts do not implement ligaturized variations of Arabic characters, e.g. Simplified Arabic has code point 0xFEED but not 0xFEEE.
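        The idea of a reverse ligature table can be sketched as follows. This is a minimal, self-contained illustration and not the class's actual table: the class name, the helper names, and the single entry (U+FEEE falling back to U+FEED) are assumptions derived from the example in the description above.

```java
import java.util.HashMap;
import java.util.Map;

final class ReverseLigatureSketch {
    // Hypothetical table: shaped form a font may lack -> form it is assumed to have.
    private static final Map<Character, Character> REVERSE = new HashMap<>();
    static {
        // WAW final form (U+FEEE) falls back to WAW isolated form (U+FEED).
        REVERSE.put('\uFEEE', '\uFEED');
    }

    /** Returns the fallback for c, or null when no fallback is registered. */
    static Character reverseMapping(char c) {
        return REVERSE.get(c);
    }

    /** Replaces every character that has a registered fallback. */
    static String applyFallbacks(String shaped) {
        StringBuilder out = new StringBuilder(shaped.length());
        for (int i = 0; i < shaped.length(); i++) {
            char c = shaped.charAt(i);
            Character fallback = reverseMapping(c);
            if (fallback != null) {
                out.append(fallback.charValue());
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String shaped = "\uFEEE";
        System.out.println(Integer.toHexString(applyFallbacks(shaped).charAt(0)));
    }
}
```

        A caller would typically apply such a fallback only when the selected font lacks a glyph for the shaped code point.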
      • chartable

        private static final char[][] chartable
      • DIGITS_EN2AN

        public static final int DIGITS_EN2AN
        Digit shaping option: Replace European digits (U+0030...U+0039) by Arabic-Indic digits.
        See Also:
        Constant Field Values
      • DIGITS_AN2EN

        public static final int DIGITS_AN2EN
        Digit shaping option: Replace Arabic-Indic digits by European digits (U+0030...U+0039).
        See Also:
        Constant Field Values
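        The two unconditional digit shaping directions amount to shifting a character between two ten-character ranges. The sketch below illustrates that arithmetic on a slice of a char array; the class and method names are hypothetical and it does not call into ArabicLigaturizer.

```java
final class DigitSwapSketch {
    private static final char ARABIC_INDIC_ZERO = '\u0660';

    /** Replaces European digits (U+0030...U+0039) by Arabic-Indic digits, in place. */
    static void europeanToArabicIndic(char[] text, int offset, int length) {
        for (int i = offset; i < offset + length; i++) {
            char c = text[i];
            if (c >= '0' && c <= '9') {
                text[i] = (char) (c - '0' + ARABIC_INDIC_ZERO);
            }
        }
    }

    /** Replaces Arabic-Indic digits (U+0660...U+0669) by European digits, in place. */
    static void arabicIndicToEuropean(char[] text, int offset, int length) {
        for (int i = offset; i < offset + length; i++) {
            char c = text[i];
            if (c >= ARABIC_INDIC_ZERO && c <= ARABIC_INDIC_ZERO + 9) {
                text[i] = (char) (c - ARABIC_INDIC_ZERO + '0');
            }
        }
    }

    public static void main(String[] args) {
        char[] text = "Room 42".toCharArray();
        europeanToArabicIndic(text, 0, text.length);
        System.out.println(new String(text));
        arabicIndicToEuropean(text, 0, text.length);
        System.out.println(new String(text));
    }
}
```

        Only characters inside the given offset and length are touched, which mirrors the offset and length parameters of processNumbers.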
      • DIGITS_EN2AN_INIT_LR

        public static final int DIGITS_EN2AN_INIT_LR
        Digit shaping option: Replace European digits (U+0030...U+0039) by Arabic-Indic digits if the most recent strongly directional character is an Arabic letter (its Bidi direction value is RIGHT_TO_LEFT_ARABIC). The initial state at the start of the text is assumed to be not an Arabic letter, so European digits at the start of the text will not change. Compare to DIGITS_EN2AN_INIT_AL.
        See Also:
        Constant Field Values
      • DIGITS_EN2AN_INIT_AL

        public static final int DIGITS_EN2AN_INIT_AL
        Digit shaping option: Replace European digits (U+0030...U+0039) by Arabic-Indic digits if the most recent strongly directional character is an Arabic letter (its Bidi direction value is RIGHT_TO_LEFT_ARABIC). The initial state at the start of the text is assumed to be an Arabic letter, so European digits at the start of the text will change. Compare to DIGITS_EN2AN_INIT_LR.
        See Also:
        Constant Field Values
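        The difference between the two context-sensitive options is only the initial state. The sketch below shows one way the rule could be expressed with java.lang.Character.getDirectionality; it is an illustration under assumptions, not the library's implementation. The class name is hypothetical, and digitZero is assumed here to be the zero digit of the target range (for example U+0660).

```java
final class ContextDigitSketch {
    /**
     * Replaces European digits only while the most recent strongly
     * directional character was an Arabic letter.
     *
     * @param lastStrongWasAL initial state: true behaves like the INIT_AL
     *                        option, false like the INIT_LR option
     */
    static void shapeWithContext(char[] text, int start, int length,
                                 char digitZero, boolean lastStrongWasAL) {
        for (int i = start; i < start + length; i++) {
            char c = text[i];
            if (c >= '0' && c <= '9') {
                if (lastStrongWasAL) {
                    text[i] = (char) (c - '0' + digitZero);
                }
                continue;
            }
            switch (Character.getDirectionality(c)) {
                case Character.DIRECTIONALITY_LEFT_TO_RIGHT:
                case Character.DIRECTIONALITY_RIGHT_TO_LEFT:
                    lastStrongWasAL = false; // strong, but not an Arabic letter
                    break;
                case Character.DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC:
                    lastStrongWasAL = true;  // Arabic letter
                    break;
                default:
                    break;                   // weak or neutral: state unchanged
            }
        }
    }

    public static void main(String[] args) {
        char[] text = "1 \u0627 2 a 3".toCharArray();
        shapeWithContext(text, 0, text.length, '\u0660', false);
        System.out.println(new String(text));
    }
}
```

        With the initial state set to false, a digit before the first Arabic letter is kept; with true, it is replaced. Digits that follow a Latin letter are kept in both cases.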
      • DIGITS_RESERVED

        private static final int DIGITS_RESERVED
        Not a valid option value.
        See Also:
        Constant Field Values
      • DIGITS_MASK

        public static final int DIGITS_MASK
        Bit mask for digit shaping options.
        See Also:
        Constant Field Values
      • DIGIT_TYPE_AN

        public static final int DIGIT_TYPE_AN
        Digit type option: Use Arabic-Indic digits (U+0660...U+0669).
        See Also:
        Constant Field Values
      • DIGIT_TYPE_AN_EXTENDED

        public static final int DIGIT_TYPE_AN_EXTENDED
        Digit type option: Use Eastern (Extended) Arabic-Indic digits (U+06f0...U+06f9).
        See Also:
        Constant Field Values
      • DIGIT_TYPE_MASK

        public static final int DIGIT_TYPE_MASK
        Bit mask for digit type options.
        See Also:
        Constant Field Values
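        A digit shaping option and a digit type option are meant to be combined into one int and separated again with the two masks. The sketch below shows that pattern. The numeric values are placeholders chosen for the illustration and should be checked against the Constant Field Values page; the class and method names are hypothetical.

```java
final class DigitOptionsSketch {
    // Placeholder values for illustration only (assumed, not taken from this page).
    static final int DIGITS_EN2AN = 0x20;
    static final int DIGITS_AN2EN = 0x40;
    static final int DIGITS_MASK = 0xe0;
    static final int DIGIT_TYPE_AN = 0;
    static final int DIGIT_TYPE_AN_EXTENDED = 0x100;
    static final int DIGIT_TYPE_MASK = 0x100;

    /** Picks the zero digit of the target range from the digit type bits. */
    static char digitZero(int options) {
        return (options & DIGIT_TYPE_MASK) == DIGIT_TYPE_AN_EXTENDED ? '\u06F0' : '\u0660';
    }

    /** Describes the operation selected by the digit shaping bits. */
    static String describe(int options) {
        String base = String.format("U+%04X", (int) digitZero(options));
        switch (options & DIGITS_MASK) {
            case DIGITS_EN2AN:
                return "European -> digits starting at " + base;
            case DIGITS_AN2EN:
                return "digits starting at " + base + " -> European";
            default:
                return "not handled in this sketch";
        }
    }

    public static void main(String[] args) {
        int options = DIGITS_EN2AN | DIGIT_TYPE_AN_EXTENDED;
        System.out.println(describe(options));
    }
}
```

        Because the two groups of bits do not overlap, either part can be changed without disturbing the other.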
      • options

        protected int options
      • runDirection

        protected int runDirection
    • Constructor Detail

      • ArabicLigaturizer

        public ArabicLigaturizer()
      • ArabicLigaturizer

        public ArabicLigaturizer​(int runDirection,
                                 int options)
    • Method Detail

      • isVowel

        static boolean isVowel​(char s)
      • charshape

        static char charshape​(char s,
                              int which)
      • shapecount

        static int shapecount​(char s)
      • doublelig

        static void doublelig​(java.lang.StringBuffer string,
                              int level)
      • shape

        static void shape​(char[] text,
                          java.lang.StringBuffer string,
                          int level)
      • arabic_shape

        public static int arabic_shape​(char[] src,
                                       int srcoffset,
                                       int srclength,
                                       char[] dest,
                                       int destoffset,
                                       int destlength,
                                       int level)
      • processNumbers

        public static void processNumbers​(char[] text,
                                          int offset,
                                          int length,
                                          int options)
      • shapeToArabicDigitsWithContext

        static void shapeToArabicDigitsWithContext​(char[] dest,
                                                   int start,
                                                   int length,
                                                   char digitBase,
                                                   boolean lastStrongWasAL)
      • getReverseMapping

        public static java.lang.Character getReverseMapping​(char c)
      • process

        public java.lang.String process​(java.lang.String s)
        Description copied from interface: LanguageProcessor
        Processes a String
        Specified by:
        process in interface LanguageProcessor
        Parameters:
        s - the original String
        Returns:
        the processed String