Why does running a PHP script from cron cause character encoding problems?

I have a PHP script that I run from the terminal. Here is what it does:

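1. Fetches a row of data from the database (a table stores JSON strings to be processed specifically by this script).
2. Converts the JSON string to an array and prepares the data for insertion into the database.
3. Inserts the required data into the database.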

Here is the script:

#!/usr/bin/php
<?PHP
    //script used to parse tweets we have gathered from the twitter streaming API
    mb_internal_encoding("UTF-8");
    date_default_timezone_set('UTC');

    require './config/config.php';
    require './libs/db.class.php';

    require './libs/tweetReadWrite.class.php';
    require './libs/tweetHandle.class.php';
    require './libs/tweetPrepare.class.php';
    require './libs/pushOver.class.php';
    require './libs/getLocationDetails.class.php';

    //instantiate our classes
    $twitdb = new db(Config::getConfig("twitterDbConnStr"),Config::getConfig("twitterDbUser"),Config::getConfig("twitterDbPass"));

    $pushOvr = new PushOver();                                          // push error messages to my phone
    $tweetPR = new TweetPrepare();                                      // prepares tweet data
    $geoData = new getLocationDetails($pushOvr);                        // reverse geolocation using google maps API
    $tweetIO = new TweetReadWrite($twitdb,$tweetPR,$pushOvr,$geoData);  // read and write tweet data to the database

    /* grab cached JSON row from the Oracle database
    *
    * the reason the JSON string is brought back in multiple parts is because
    * PDO doesn't handle CLOBs very well and most of the time the JSON string
    * is larger than 4000 chars - it's a hack but it works
    *
    * the following SQL specifies a test row to work with which has characters like €$£ etc.
    */
    $sql = "
            SELECT a.tjc_id
                 , dbms_lob.substr(tweet_json, 4000,1) part1
                 , dbms_lob.substr(tweet_json, 8000,4001) part2
                 , dbms_lob.substr(tweet_json, 12000,8001) part3
            FROM twtr_json_cache a
            WHERE a.tjc_id = 8368
            ";

    $sth = $twitdb->prepare($sql);
    $sth->execute();
    $data = $sth->fetchAll();

    //join JSON string back together
    $jsonRaw = $data[0]['PART1'].$data[0]['PART2'].$data[0]['PART3'];

    //shouldn't need to do this; it doesn't affect the outcome anyway
    $jsonRaw = mb_convert_encoding($jsonRaw, "UTF-8"); 

    //convert JSON object to an array
    $data = json_decode($jsonRaw,true);

    //prepares the data (grabs the data I need from the JSON object and does some
    //validation etc.), then finally submits it to the database
    $result = $tweetIO->saveTweet($data); // returns BOOL
    echo $result;
?>

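Now, if I run this from the terminal using ./proc_json_cache.php or php proc_json_cache.php, it works fine: the data arrives in the database UTF-8 encoded, and the data in the database looks like this: £$@€ ;test.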

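If I call the script via cron, it still saves the data, but special characters such as € come out as squares, and the data in the database looks like this: $@

I have tried setting the following in the cron environment: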

TERM=xterm
SHELL=/bin/bash

I set those because they match my current shell ENV session settings. I also added the following to the bash script that calls my PHP script:

export NLS_LANG="ENGLISH_UNITED KINGDOM.AL32UTF8"
export LANG="en_GB.UTF-8"

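Again, this matches my current shell ENV settings, but I still get the character encoding problem when the script is run from cron rather than directly in the terminal.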
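
One way to check whether the two environments really match is to dump the locale-related variables from both places and compare the results. The following is a minimal sketch; the helper name dump_locale_env and the /tmp file names mentioned afterwards are hypothetical and not part of the original setup.

```shell
#!/bin/bash
# Hypothetical helper: print only the variables that usually influence
# character encoding (locale settings and the Oracle client NLS_LANG).
dump_locale_env() {
    env | grep -E '^(LANG|LANGUAGE|LC_[A-Z_]+|NLS_LANG)=' | sort
}
```

Calling dump_locale_env > /tmp/env_terminal.txt from the terminal and dump_locale_env > /tmp/env_cron.txt from the script that cron runs, then running diff on the two files, should show any variable that is set in one environment but missing or different in the other.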

Has anyone else had a similar problem and can explain how to fix it?
Thanks in advance.

Edit:

Here is some more information about the server:

OS: SUSE Linux Enterprise Server 11
PHP: 5.2.14

Best answer: Try adding the following to the bash script that calls the PHP script:

unset LANG LANGUAGE LC_CTYPE
export LANG=en_GB.UTF-8 LANGUAGE=en LC_CTYPE=en_GB.UTF-8

See: Re: Crontab’s charset not in utf-8
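
Put together with the NLS_LANG setting from the question, a complete wrapper might look roughly like the sketch below. The function name set_utf8_env, the extra unset of LC_ALL (which overrides LANG and LC_CTYPE when it is set), and passing the PHP script path as the first argument are all assumptions added for illustration. The cd is there because cron normally does not start jobs in the script's directory, and the script uses relative require paths.

```shell
#!/bin/bash
# Sketch of a cron wrapper. The function name and the path handling are
# illustrative assumptions, not part of the original setup.

set_utf8_env() {
    # Drop whatever locale cron handed over. LC_ALL is an addition here:
    # when set, it overrides both LANG and LC_CTYPE.
    unset LANG LANGUAGE LC_CTYPE LC_ALL

    # UTF-8 locale, as suggested in the answer.
    export LANG=en_GB.UTF-8 LANGUAGE=en LC_CTYPE=en_GB.UTF-8

    # Character set for the Oracle client, as in the question.
    export NLS_LANG="ENGLISH_UNITED KINGDOM.AL32UTF8"
}

set_utf8_env

# Run the PHP script only when its path is given as the first argument.
# Changing directory first keeps the script's relative require paths working.
if [ -n "${1:-}" ]; then
    cd "$(dirname "$1")" && /usr/bin/php "./$(basename "$1")"
fi
```

The crontab entry would then point at this wrapper and pass the location of proc_json_cache.php as its argument, rather than calling PHP directly.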
