I am using the 64-bit version of PostgreSQL 9.5 on Windows Server. The character encoding of the database is set to UTF8.

I'd like to create a function that manipulates multibyte strings (e.g. cleansing, replacement).

I copied C logic for manipulating characters from another system. That logic assumes the character encoding is SJIS.

I do not want to change the C logic, so I want to convert from UTF8 to SJIS inside a PostgreSQL C-language function, the way the convert_to function does. (However, convert_to returns bytea, and I want to get the result as TEXT.)

Please tell me how to convert from UTF8 to SJIS in C.

Create Function Script:

CREATE FUNCTION CLEANSING_STRING(character varying)
RETURNS character varying AS
'$libdir/MyFunc/CLEANSING_STRING.dll', 'CLEANSING_STRING'
LANGUAGE c VOLATILE STRICT;

C Source:

#include <postgres.h>   // postgres.h must be the first include in PostgreSQL C code
#include <port.h>
#include <fmgr.h>
#include <builtins.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#ifdef PG_MODULE_MAGIC
PG_MODULE_MAGIC;
#endif

extern PGDLLEXPORT Datum CLEANSING_STRING(PG_FUNCTION_ARGS);

PG_FUNCTION_INFO_V1(CLEANSING_STRING);
Datum CLEANSING_STRING(PG_FUNCTION_ARGS)
{

    // Get the argument
    text *arg1 = (text *)PG_GETARG_TEXT_P(0);

    // text to NUL-terminated char *
    char *arg;
    arg = text_to_cstring(arg1);

    // UTF8 to SJIS
    // char *sjisChar = foo(arg);  // something like this...

    // Copied from the other system (assumes the character code is SJIS).
    cleansingString(sjisChar);
    replaceStrimg(sjisChar);

    // SJIS to UTF8
    // arg = bar(sjisChar);  // something like this...

    // char * to text, then return
    PG_RETURN_TEXT_P(cstring_to_text(arg));
}
Comments

  • See the functions any_to_server and server_to_any in src/backend/utils/mb/mbutils.c, and the comments at the top of mbutils.c. Commented Nov 16, 2017 at 4:58
  • Thank you for your reply. I do not understand how to specify the third argument, the encoding. Is there a list of encodings somewhere? Is this usage correct? Char sjisChar[] = server_to_any(arg, strlen(arg), /* sjis encoding number */ ); Commented Nov 16, 2017 at 5:50
  • @CraigRinger I forgot to mention you. Commented Nov 16, 2017 at 6:15
  • Sorry, I meant pg_server_to_any and pg_any_to_server. For the encoding name, see pg_enc2name_tbl in src/backend/utils/mb/encnames.c and the pg_char_to_encoding function. Commented Nov 16, 2017 at 6:23
  • @CraigRinger The program now works as expected! Thank you very much. I will post the completed source later as an answer. Commented Nov 16, 2017 at 13:02

1 Answer

It worked using the approach suggested in the comments on the question.

#include <mb/pg_wchar.h> // Add this include (declares pg_server_to_any, pg_any_to_server and PG_SJIS).

...

Datum CLEANSING_STRING(PG_FUNCTION_ARGS)
{

    // Get the argument
    text *arg1 = (text *)PG_GETARG_TEXT_P(0);

    // text to NUL-terminated char *
    char *arg;
    arg = text_to_cstring(arg1);

    // UTF8 (server encoding) to SJIS; the third argument is the target encoding.
    char *sjisChar = pg_server_to_any(arg, strlen(arg), PG_SJIS);

    // Copied from the other system (assumes the character code is SJIS).
    cleansingString(sjisChar);
    replaceStrimg(sjisChar);

    // SJIS to UTF8 (server encoding); the third argument is the source encoding.
    // The length is recomputed because the byte count changes with the encoding.
    arg = pg_any_to_server(sjisChar, strlen(sjisChar), PG_SJIS);

    // char * to text, then return
    PG_RETURN_TEXT_P(cstring_to_text(arg));
}
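
Both conversion calls take a byte length, and the same text usually occupies a different number of bytes in SJIS than in UTF8. That is why the return trip passes strlen(sjisChar) instead of reusing strlen(arg). Below is a minimal standalone sketch of that difference in plain C, with no PostgreSQL headers. The helper names sjis_is_lead_byte and sjis_char_count and the two sample buffers are hypothetical, made up for illustration. The lead-byte ranges (0x81 to 0x9F and 0xE0 to 0xFC) are the commonly documented Shift_JIS ones, and the SJIS variant used by the copied logic is assumed to match them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: nonzero if b starts a two-byte Shift_JIS character.
 * Assumes the commonly documented lead-byte ranges. */
int sjis_is_lead_byte(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* Hypothetical helper: count characters in a NUL-terminated SJIS buffer. */
size_t sjis_char_count(const char *s)
{
    size_t count = 0;

    assert(s != NULL);
    while (*s != '\0')
    {
        /* Step over two bytes for a double-byte character,
         * unless the buffer ends right after the lead byte. */
        if (sjis_is_lead_byte((unsigned char) *s) && s[1] != '\0')
            s += 2;
        else
            s += 1;
        count++;
    }
    return count;
}

/* The same two characters, HIRAGANA LETTER A (U+3042) followed by ASCII 'A',
 * encoded both ways. The literals are split so that 'A' is not parsed as
 * part of the preceding hex escape. */
const char utf8_sample[] = "\xE3\x81\x82" "A";  /* 3 bytes + 1 byte */
const char sjis_sample[] = "\x82\xA0" "A";      /* 2 bytes + 1 byte */
```

With these samples, strlen(utf8_sample) is 4 and strlen(sjis_sample) is 3, while sjis_char_count(sjis_sample) reports 2 characters. A length taken before the conversion would therefore be wrong for the converted buffer.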