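Checking whether a character is a Chinese character or a punctuation mark in Java
HanLP has no dedicated function for checking whether a given character is a Chinese character or a punctuation mark. You can implement the check yourself with Java's built-in Character class and the Unicode blocks and general categories it exposes.
The following is one way to do it in Java: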
public class Main {
    public static void main(String[] args) {
        System.out.println(isChinese('汉'));      // prints: true
        System.out.println(isPunctuation(','));  // prints: true
    }

    // Returns true if c is a Chinese character (CJK ideograph).
    // Punctuation blocks are deliberately left out; isPunctuation handles those.
    public static boolean isChinese(char c) {
        Character.UnicodeBlock ub = Character.UnicodeBlock.of(c);
        // Extension B lies outside the BMP, so a single char can never fall in it;
        // covering it requires the int overload Character.UnicodeBlock.of(int).
        return ub == Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS
                || ub == Character.UnicodeBlock.CJK_COMPATIBILITY_IDEOGRAPHS
                || ub == Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS_EXTENSION_A;
    }

    // Returns true if c belongs to any Unicode punctuation category.
    public static boolean isPunctuation(char c) {
        int type = Character.getType(c); // general category, not a code point
        return type == Character.CONNECTOR_PUNCTUATION
                || type == Character.DASH_PUNCTUATION
                || type == Character.START_PUNCTUATION
                || type == Character.END_PUNCTUATION
                || type == Character.INITIAL_QUOTE_PUNCTUATION
                || type == Character.FINAL_QUOTE_PUNCTUATION
                || type == Character.OTHER_PUNCTUATION;
    }
}
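In the code above, isChinese(char) checks whether a character is a Chinese character, and isPunctuation(char) checks whether a character is a punctuation mark. Both take a single char, so they only see characters in the Basic Multilingual Plane.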
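Because a char is a single UTF-16 code unit, rarer ideographs encoded as surrogate pairs (CJK Extension B and later) are not recognized by the char-based methods. The sketch below shows one possible variant that works on int code points. The class name CodePointClassifier and the helpers isHan, countHan and countPunctuation are hypothetical names chosen for illustration. It assumes that matching on Character.UnicodeScript.HAN is acceptable; that script is somewhat broader than the ideograph blocks (it also covers CJK radicals and marks such as 々).

```java
class CodePointClassifier {
    // Hypothetical helper: true if the code point is assigned to the Han script.
    public static boolean isHan(int codePoint) {
        return Character.UnicodeScript.of(codePoint) == Character.UnicodeScript.HAN;
    }

    // True if the code point belongs to any Unicode punctuation category.
    public static boolean isPunctuation(int codePoint) {
        switch (Character.getType(codePoint)) {
            case Character.CONNECTOR_PUNCTUATION:
            case Character.DASH_PUNCTUATION:
            case Character.START_PUNCTUATION:
            case Character.END_PUNCTUATION:
            case Character.INITIAL_QUOTE_PUNCTUATION:
            case Character.FINAL_QUOTE_PUNCTUATION:
            case Character.OTHER_PUNCTUATION:
                return true;
            default:
                return false;
        }
    }

    // Counts Han code points; String.codePoints() merges surrogate pairs.
    public static long countHan(String text) {
        return text.codePoints().filter(CodePointClassifier::isHan).count();
    }

    // Counts punctuation code points in the same way.
    public static long countPunctuation(String text) {
        return text.codePoints().filter(CodePointClassifier::isPunctuation).count();
    }

    public static void main(String[] args) {
        // "你好,世界!" followed by U+20000 (CJK Extension B), written as escapes
        // so the example does not depend on the source file encoding.
        String text = "\u4F60\u597D\uFF0C\u4E16\u754C\uFF01\uD840\uDC00";
        System.out.println(countHan(text));         // prints: 5
        System.out.println(countPunctuation(text)); // prints: 2
    }
}
```

Iterating with codePoints() rather than charAt() is what lets the supplementary character count as one ideograph instead of two unrecognized surrogates.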