Tag Archives: UTF

What is wrong with: echo $PATH | sed s/:/\n/g

July 16, 2013 Bartle Doo

Hello all!

I am on a Mac (10.8.4) and my shell is tcsh (the man page says version Astron 6.17.00). To be precise about my tcsh setup:

Code:

echo $LC_CTYPE

UTF-8

I want to replace every ‘:’ with a newline, so that each path ends up on its own line. I can’t find a way to make my shell accept the “\n”.

My start was:

Code:

echo $PATH | sed s/:/\n/g

I tried every possible variation of \n: \n – “n” – “\n”, and the same with \r. The result is always that the colon disappears, but the replacement is sometimes nothing and sometimes a literal “n” …

Sorry for this dumb question; but the forum is for Dummies, isn’t it?

marek

ps: in bash I get the same results
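
One likely explanation, offered as a sketch rather than a definitive diagnosis: without quotes, the shell consumes the backslash before sed ever runs, so sed receives s/:/n/g and inserts a literal n. In addition, the BSD sed shipped with OS X does not treat \n in the replacement part as a newline. Translating the single character with tr avoids both issues. The name split_path is made up for illustration, and the function wrapper is sh/bash syntax; the quoted tr pipeline inside it can also be typed directly in tcsh.

```shell
#!/bin/sh
# split_path (hypothetical helper): print each colon-separated entry of its
# argument on its own line. tr replaces every ':' with a newline; the single
# quotes keep the shell from stripping the backslash before tr sees it.
split_path() {
    printf '%s\n' "$1" | tr ':' '\n'
}

split_path "$PATH"
```

The one-line equivalent is: echo "$PATH" | tr ':' '\n'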

Linux

Script to create unique look-up for headers for a Dictionary

April 19, 2013 Bartle Doo

By gimley

I have a text file in UTF-8 format which has the following data structure

Code:

HEADWORD=gloss1,gloss2,gloss3 etc

I want to convert it so that all the glosses of the HeadWord appear on separate lines

Code:

HEADWORD=gloss1

HEADWORD=gloss2

HEADWORD=gloss3

An example will illustrate the requirement
INPUT

Code:

હોશમાં આવવું=regain consciousness.

હોશિયાર=clever, intelligent; skilful; alert, vigilant; cautious; understanding, sensible.

હોશિયારી કરવી=boast,(try to) be clever.

હોશિયારી દાખવવી=boast,(try to) be clever.

હોશિયારી બતાવવી=boast,(try to) be clever.

હોશિયારી મારવી=boast,(try to) be clever.

હોશિયારી રાખવી=be cautious,be vigilant,be alert.

હોશિયારી=cleverness, vigilance

હોહા=noise, uproar, tumult, public talk or discussion, excitement, agitation, alarm, consternation.

હોહાકાર=uproar, tumult, excitement, alarm.

હોહો=noise, uproar, tumult, public talk or discussion, excitement, agitation, alarm, consternation.

The Output would be

Code:

હોશિયાર=clever

હોશિયાર=intelligent

હોશિયાર=skilful

હોશિયાર=alert

હોશિયાર=vigilant

હોશિયાર=cautious

હોશિયાર=understanding

હોશિયાર=sensible.

હોશિયારી કરવી=boast

હોશિયારી કરવી=(try to) be clever.

હોશિયારી દાખવવી=boast

હોશિયારી દાખવવી=(try to) be clever.

હોશિયારી બતાવવી=boast

હોશિયારી બતાવવી=(try to) be clever.

હોશિયારી મારવી=boast

હોશિયારી મારવી=(try to) be clever.

હોશિયારી રાખવી=be cautious

હોશિયારી રાખવી=vigilant or alert.

હોશિયારી=cleverness

હોશિયારી=vigilance

હોશિયારી=etc.

હોહા=noise

હોહા=uproar

હોહા=tumult

હોહા=public talk or discussion

હોહા=excitement

હોહા=agitation

હોહા=alarm

હોહા=consternation.

હોહાકાર=uproar

હોહાકાર=tumult

હોહાકાર=excitement

હોહાકાર=alarm

હોહો=noise

હોહો=uproar

હોહો=tumult

હોહો=public talk or discussion

હોહો=excitement

હોહો=agitation

હોહો=alarm

હોહો=consternation

At present I use macros which identify the delimiter, copy the text between two delimiters, paste it on the next line, preface it with the headword, continue until the end of the line, and then repeat the same for the next line. Since the file is huge, a Perl or AWK script would help.
I work under Windows, so Unix-only solutions unfortunately do not work for me.
Many thanks in advance.
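
The transformation described above can be sketched with awk, which was one of the two tools asked for. This is a minimal sketch under a few assumptions: an awk port is installed on the Windows machine, glosses are separated by either "," or ";" (both appear in the sample input), and the headword is everything before the first "=". The name split_glosses and the file names in the usage line are hypothetical.

```shell
#!/bin/sh
# split_glosses (hypothetical name): read HEADWORD=gloss1,gloss2;gloss3 lines
# and print one HEADWORD=gloss line per gloss.
split_glosses() {
    awk '
    {
        eq = index($0, "=")               # position of the first "="
        if (eq == 0) next                 # skip lines without a headword
        head = substr($0, 1, eq - 1)
        rest = substr($0, eq + 1)
        n = split(rest, gloss, "[,;]")    # glosses are separated by "," or ";"
        for (i = 1; i <= n; i++) {
            g = gloss[i]
            gsub(/^[ \t]+|[ \t]+$/, "", g)    # trim surrounding blanks
            if (g != "") print head "=" g
        }
    }' "$@"
}

# Usage: split_glosses dictionary.txt > lookup.txt
```

A trailing full stop stays attached to the last gloss, and a comma inside parentheses would also be treated as a separator, so the result is worth spot-checking on a few entries first.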

From: http://www.unix.com/shell-programming-scripting/221547-script-create-unique-look-up-headers-dictionary.html

Linux

How to fix my IMDB Script

April 17, 2013 Bartle Doo

By lg2013

Hello,

I am using an IMDB bot file for my movie site.

But I get this error message: Parse error: syntax error, unexpected end of file, expecting variable (T_VARIABLE) or ${ (T_DOLLAR_OPEN_CURLY_BRACES) or {$ (T_CURLY_OPEN) in /home/u653954634/public_html/wp-content/plugins/WB-IMDB/bot.php on line 246

Can anyone fix this in my script?

Here is the script:

Code:

function save_image( $inPath, $outPath )

{

    $in = fopen( $inPath, "rb" );

    $out = fopen( $outPath, "wb" );

    while ( $chunk = fread( $in, 8192 ) )

    {

        fwrite( $out, $chunk, 8192 );

    }

    fclose( $in );

    fclose( $out );

}
function get_seo_link( $str, $replace = array( ), $delimiter = "-" )

{

    setlocale( LC_ALL, "en_US.UTF8" );

    if ( !empty($replace) )

    {

        $str = str_replace( ( array )$replace, " ", $str );

    }

    else

    {

        $trArr = array( "ç", "Ç", "ı", "İ", "ş", "Ş", "ğ", "Ğ", "ö", "Ö", "ü", "Ü" );

        $toArr = array( "c", "c", "i", "i", "s", "s", "g", "g", "o", "o", "u", "u" );

        $str = str_replace( $trArr, $toArr, $str );

    }

    $clean = iconv( "UTF-8", "ASCII//TRANSLIT", $str );

    $clean = preg_replace( "/[^a-zA-Z0-9\/_|+ -]/", "", $clean );

    $clean = trim( $clean );

    $clean = preg_replace( "/[\/_|+ -]+/", $delimiter, $clean );

    $clean = mb_strtolower( $clean, "UTF-8" );

    return $clean;

}
function bot_starten( )

{

    global $rootp;

    global $burl;

    global $wpdb;

    include( "slugs.php" );

    echo "
Full Movie Archives IMDB Bot

IMDB ID: | t
";

    $theme_name = get_template_directory( );

    $l = "Full Movie";

    $f = $theme_name."/footer.php";

    $fd = fopen( $f, "r" );

    $c = fread( $fd, filesize( $f ) );

    fclose( $fd );

    if ( !preg_match( "#".$l."#", $c ) )

    {

        echo "
This plugin is only available at www.doynet.com, Full Movie Archives Theme, works with. Or the footer.php file, Full Movie  code to
From: http://www.unix.com/shell-programming-scripting/221391-how-fix-my-imdb-script.html

Linux

Nepoogle 1.0.0 (KDE Other Utility)

March 29, 2013 Bartle Doo

WARNING!!! Nepoogle needs at least KDE 4.10.1 and the Nepomuk2 Python bindings to work. These bindings are available here: http://quickgit.kde.org/?p=scratch%2Fsedwards%2Fpykde4nepomuk2.git

Nepoogle is a search utility for Nepomuk inspired by Google’s search syntax. It has its own SPARQL query builder engine to achieve more control and to be safer against encoding bugs.

Install and uninstall scripts are available to install and remove all required files.

Functionalities:
-Full UTF-8 support.
-Many shortcuts and commands; read the help for the full list and descriptions.
-“and”, “or” and several comparison operators supported.
-You can open urls in results with default external applications.
-Resource viewer with navigation system, image support and multimedia player (video player seems to work only with GStreamer Phonon Backend with video support).
-Supports Nepomuk::Query::QueryParser() query engine.
-A cache system with manual refresh using the F5 key.
-Three different query engines.
-Playlist for audio and video files.
-Supports Nepomuk TVNamer, http://kde-apps.org/content/show.php/Nepomuk+TVNamer?content=148910.

Known bugs that cannot be fixed:
-Volume is reset to maximum on track change: this is a bug in QtWebKit; the current JavaScript code works in Chrome and Firefox. Any help finding a workaround would be appreciated.
-Some mp3 files sound awful: this is a bug in GStreamer.

Search tips:
-Ask specific questions and if you get no results then try more general questions:
“e” — you will get a lot of results and a slow query
actor:e — more optimized version if you are looking for an actor

-If you know exactly the word you are looking for, “+” will improve performance and your results, but note that the search is then case sensitive:
performer:+”Fleetwood Mac” — query works
performer:+”fleetwood mac” — query fails
performer:”fleetwood mac” — query works but is more slow

-Regular expressions are your friends:
^word — string begins with “word”
word$ — string ends with “word”
^this.*word$ — string begins with “this” and ends with “word”

-Use quotation marks if you need to search for special characters or spaces:
“Singin’ in the Rain” — because character “‘” and spaces
“^file:///storage/movies” — because character “:”
“f(x)” — because parenthesis
“Peter Gabriel” — without quotation marks nepoogle will search for “Peter and Gabriel”

-With e0 prefix Nepomuk::Query::QueryParser() engine is used rather than internal engine:
“e0 movies” — display results using Nepomuk API.

Future development:
-Query engine improvement with parenthesis support.
-External configuration file.
-Remote queries.

changelog:
v1.0.0 — 2013/03/29
-BUG: …read more
Source: FULL ARTICLE at KDE Apps

Linux

URL extraction from JSON file

March 6, 2013 Bartle Doo

By busyboy

I’m trying to extract URLs from a JSON file that was produced by exporting Mozilla Firefox bookmarks. It is a single-line file, as the following awk output shows:

Code:

$ awk 'END { print NR }' bookmarks.json

1

$ awk 'END { print NF }' bookmarks.json

2706

$ awk -F, 'END { print NF }' bookmarks.json

4754

Using sed gives me only the 1st occurrence; the rest are missed.

Code:

$ sed 's/.*"\(http:.*\)"/\1/' bookmarks.json

http://www.oracle.com/us/products/servers-storage/servers/blades/index.html","charset":"UTF-8}]}]}

$

An extract from the JSON file looks like this:

Code:

{"title":"","id":1,"dateAdded":1331548812311000,"lastModified":1331549028262000,"type":"text/x-moz-place-container","root":"placesRoot","children":[{"title":

"Bookmarks Menu","id":2,"parent":1,"dateAdded":1331548812311000,"lastModified":1342096853234000,"type":"text/x-moz-place-container","root":"bookmarksMenuFold

er","children":[{"title":"Recent Tags","id":925,"parent":2,"annos":[{"name":"Places/SmartBookmark","flags":0,"expires":4,"mimeType":null,"type":3,"value":"Re

centTags"}],"type":"text/x-moz-place","uri":"place:sort=14&type=6&maxResults=10&queryType=1"},{"index":1,"title":"Recently Bookmarked","id":924,"parent":2,"a

nnos":[{"name":"Places/SmartBookmark","flags":0,"expires":4,"mimeType":null,"type":3,"value":"RecentlyBookmarked"}],"type":"text/x-moz-place","uri":"place:fo

lder=BOOKMARKS_MENU&folder=UNFILED_BOOKMARKS&folder=TOOLBAR&sort=12&excludeQueries=1&maxResults=10&queryType=1"},{"index":2,"title":"","id":26,"parent":2,"da

teAdded":1243009025055489,"lastModified":1331549044829000,"annos":[{"name":"placesInternal/GUID","flags":0,"expires":4,"mimeType":null,"type":3,"value":"{445

36f3f-1d99-4e6d-8b77-d5e89c334d2d}2"}],"type":"text/x-moz-place-separator"},{"index":3,"title":"Get Bookmark Add-ons","id":27,"parent":2,"dateAdded":12430090

25055489,"lastModified":1331549044829000,"annos":[{"name":"placesInternal/GUID","flags":0,"expires":4,"mimeType":null,"type":3,"value":"{44536f3f-1d99-4e6d-8

b77-d5e89c334d2d}3"}],"type":"text/x-moz-place","uri":"https://en-us.add-ons.mozilla.com/en-US/firefox/bookmarks/"},{"index":4,"title":"","id":28,"parent":2,

"dateAdded":1243009025055489,"lastModified":1331549044829000,"annos":[{"name":"placesInternal/GUID","flags":0,"expires":4,"mimeType":null,"type":3,"value":"{

44536f3f-1d99-4e6d-8b77-d5e89c334d2d}4"}],"type":"text/x-moz-place-separator"},{"index":5,"title":"Mozilla Firefox","id":29,"parent":2,"dateAdded":1243009025

055489,"lastModified":1331549044845000,"annos":[{"name":"placesInternal/GUID","flags":0,"expires":4,"mimeType":null,"type":3,"value":"{44536f3f-1d99-4e6d-8b7

7-d5e89c334d2d}5"}],"type":"text/x-moz-place-container","children":[{"title":"Help and Tutorials","id":30,"parent":29,"dateAdded":1243009025055489,"lastModif

ied":1331549044845000,"annos":[{"name":"placesInternal/GUID","flags":0,"expires":4,"mimeType":null,"type":3,"value":"{44536f3f-1d99-4e6d-8b77-d5e89c334d2d}6"

}],"type":"text/x-moz-place","uri":"http://en-us.www.mozilla.com/en-US/firefox/help/"},{"index":1,"title":"Customize Firefox","id":31,"parent":29,"dateAdded"

:1243009025055489,"lastModified":1331549044845000,"annos":[{"name":"placesInternal/GUID","flags":0,"expires":4,"mimeType":null,"type":3,"value":"{44536f3f-1d

99-4e6d-8b77-d5e89c334d2d}7"}],"type":"text/x-moz-place","uri":"http://en-us.www.mozilla.com/en-US/firefox/customize/"},

Regards,
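
A note on why the sed attempt leaves a single URL: the whole file is one line, the substitution is applied once to that line, and .* is greedy, so everything up to the final match is swallowed. A minimal sketch of an alternative follows, assuming a grep that supports -o (GNU and BSD grep do) and assuming, as in the extract above, that every bookmark address is stored as the value of a "uri" key. The name extract_urls is hypothetical, and a regular expression is not a JSON parser: a URL containing an escaped quote would be cut short.

```shell
#!/bin/sh
# extract_urls (hypothetical helper): list every http/https value of a "uri" key.
# grep -o prints each match on its own line, so a one-line file is no obstacle;
# cut then takes the 4th double-quote-delimited field, which is the URL itself.
extract_urls() {
    grep -ohE '"uri":"https?://[^"]*"' "$@" | cut -d'"' -f4
}

# Usage: extract_urls bookmarks.json
```

Entries such as "place:sort=14..." are skipped on purpose because they are internal Firefox queries rather than web addresses.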

Linux

Project – ERP-MES Interface

March 2, 2013 Bartle Doo

By Noobie1995

Hi all,

This is my personal “war story” and probably the beginning of a never-ending love story.

Roughly 8 weeks ago I got training for a new ERP system. Before the training my boss mentioned that it is based on Unix. Great, I have always been interested in it, but I don’t know why I never invested time in it. Anyway, the training was great and it awakened my passion.

Now I try to invest every spare second in learning more, and shell scripting and programming in particular make my heart beat faster. That’s the love story.

Back to my “war story”. I am responsible for testing the interface between an ERP and an MES system. The problem is that it takes ages to get test data to verify it. The supplier has pushed out deliveries for months. Honestly, the supplier has a lot of more important things to do to get the ERP system running, but I HAVE TO TEST THE INTERFACE. Hence I’ve decided to write a shell script to simulate the interface communication and begin testing. My intention is to post the result and share it in this forum (other rookies like me will probably benefit).

ERP-MES Interface:
The goal of the project is to pass string, numeric and decimal values into a function, where each type of value is processed differently. The output re-sets the variable to a new value. At the end all variables are merged into one string and written to a UTF-8 file, “int.dat”. This file should not contain a BOM (Byte Order Mark). That’s it.

I began writing the first lines on 28th of February 2013. From my perspective it is a challenge; others will LOL. It is just a very small script, but it integrates several functions. I will update MY project continuously. From time to time, when I am not able to solve a problem on my own, I will have to ask. Below you can find the questions and answers for my project:

1. Question regarding the result of variable output (quoted/unquoted) –> will follow, I am not allowed to post links, NOT YET!!!

Code:

#!/bin/bash

fkt_auffuellen() {
  case $1 in
    "C")
      if [ ${#2} -eq $3 ]; then
        value=$2
      elif [ ${#2} -lt $3 ]; then
        value="$(printf "%-$3s" $2)"
      else
        value="$(printf "%.$3s" $2)"
      fi;;
    "N") ;;
    "D") ;;
  esac
}

vorg_nr_t=C
vorg_nr="AUFTRAG005-010010"
vorg_nr_l=40
fkt_auffuellen $vorg_nr_t $vorg_nr $vorg_nr_l
vorg_nr=$value
echo "$vorg_nr"
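
The three branches of the "C" case (exact length, too short, too long) can be collapsed into a single printf call, shown here as a sketch and not as the project's actual code. The helper name pad_or_cut is hypothetical, and the sketch assumes plain ASCII values: with multi-byte characters the width may be counted in bytes, depending on the shell.

```shell
#!/bin/bash
# pad_or_cut VALUE WIDTH (hypothetical helper): left-justify VALUE in WIDTH
# columns and cut it off if it is longer. The two "*" take the field width and
# the precision from the argument list, so no format string is built by hand.
pad_or_cut() {
    printf '%-*.*s' "$2" "$2" "$1"
}

vorg_nr=$(pad_or_cut "AUFTRAG005-010010" 40)
echo "$vorg_nr|"    # quoted: the padding in front of "|" is preserved
```

Quoting matters for the quoted/unquoted question: an unquoted $vorg_nr undergoes word splitting, which discards the trailing blanks. Writing the merged record with printf '%s' "$record" > int.dat emits exactly the bytes given, so no BOM is added by the shell.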

Diary:
2nd of March 2013 – Let’s go

Linux

Changing txt files to pure UTF-8

February 22, 2013 Bartle Doo

By sovdia

I have two Macs running 10.7.5.

We download .txt files from a remote site to these local Macs using ‘rsync -e ssh -avz…’.

The files on Mac1 are in the required format of pure UTF-8. The files on Mac2 are in UTF-8 (no BOM), which is the wrong format for us; these formats are as reported by BBEdit.

Is there a Unix command to correct this, or an option we can pass to rsync?
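
Since the two labels differ in whether a byte order mark is reported, one way to see what is really on disk is to look at the first three bytes of a file from each Mac. The following is a sketch under assumptions: head -c and od are available (they are on OS X), and the helper names has_bom, add_bom and strip_bom are made up for illustration. Which direction is needed depends on what the consuming application expects, which the post does not state.

```shell
#!/bin/sh
# has_bom FILE: succeed if FILE starts with the UTF-8 byte order mark EF BB BF.
has_bom() {
    [ "$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]
}

# add_bom FILE: write FILE to stdout, prefixed with a BOM unless it has one.
add_bom() {
    has_bom "$1" || printf '\357\273\277'
    cat "$1"
}

# strip_bom FILE: write FILE to stdout without a leading BOM.
strip_bom() {
    if has_bom "$1"; then
        tail -c +4 "$1"    # start output at the 4th byte
    else
        cat "$1"
    fi
}
```

If has_bom gives the same answer for the same file on both Macs, the files are byte-identical in this respect and the difference lies in how they are being opened, not in the rsync transfer.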

Linux

AIX & TAR related stuff

February 22, 2013 Bartle Doo

By filosophizer

This thread is about using tar on AIX

1. Find out which version of tar you are using
thanks to bakunin

Code:

>what $(which tar)

/usr/bin/tar:

        61      1.14  src/bos/usr/ccs/lib/libc/__threads_init.c, libcthrd, bos53  0 7/11/00 12:04:14

        10  1.49.1.75  src/bos/usr/bin/tar/tar.c, cmdarch, bos53A, a2004_38A1 8/  31/04 07:07:20

        92      1.7  src/bos/usr/ccs/lib/libIN/eprintf.c, libIN, bos530 6/10/91   10:23:13
>lslpp -L |grep -i tar

  devices.scsi.tm.rte       5.3.0.10    C     F    SCSI Target Mode Software

  devices.ssa.tm.rte         5.3.0.0    C     F    Target Mode SSA Support

  tar                         1.14-2    C     R    A GNU file archiving program.

>tar --version
tar: Not a recognized flag: -
Usage: tar -{c|r|t|u|x} [ -BdDEFhilmopRUsvw ] [ -Number ] [ -f TarFile ] [ -b Blocks ] [ -S [ Feet ] | [ Feet@Density ] | [ Blocksb ] ] [ -L InputList ] [-X ExcludeFile] [ -N Blocks ] [ -C Directory ] File ...
Usage: tar {c|r|t|u|x} [ bBdDEfFhilLXmNopRsSUvw[0-9] ] ] [ Blocks ] [ TarFile ] [ InputList ] [ ExcludeFile ] [ [ Feet ] | [ Feet@Density ] | [ Blocksb ] ] [-C Directory ] File ...
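
The transcript above suggests a quick test: GNU tar answers --version with a banner containing "GNU tar", while the native AIX tar rejects the flag. A small sketch of that check follows; the function name tar_flavor is hypothetical, and the only thing it relies on is the banner text.

```shell
#!/bin/sh
# tar_flavor [TAR]: print "gnu" if the given tar binary (default: tar) answers
# --version with a GNU tar banner, otherwise print "other".
tar_flavor() {
    if "${1:-tar}" --version 2>/dev/null | grep -q 'GNU tar'; then
        echo gnu
    else
        echo other
    fi
}

tar_flavor /usr/bin/tar
```

Passing an explicit path makes it possible to check each installed tar separately, whatever order they appear in on PATH.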

2. GNU Tar

Why is it important to know which version of tar you are using?

Code:

# tar cvf - file | gzip > file.tar.gz

tar: 0511-825 The file 'file' is too large.

AIX tar has a 2 GB limit on file size. UNIX tar has some limits: on average 12 GB per archive (may vary depending on the UNIX flavor), 6 levels of directories (all UNIX, but not GNU tar), and 32k directories on one level (this is a limitation of JFS and JFS2). Use tar from the Linux Toolbox CD, or alternatively get it here: https://www14.software.ibm.com/webap…G=tar&cp=UTF-8
or
Welcome to Bull AIX freeware site
ftp://ftp.software.ibm.com/aix/freeS…aix5.1.ppc.rpm

Procedure
Find the native AIX tar location.

Code:

#which tar

/usr/bin/tar

Move the native …read more
Source: FULL ARTICLE at The UNIX and Linux Forums

Linux

ISO 8859-1 file encoding charset in Linux

February 9, 2013 Bartle Doo

By sonic_air

Hello Experts, please help with any insight, as I am facing an issue migrating a Java application from HP-UX to Red Hat. The Java program uses InputStreamReader to read a file without specifying any charset parameter.

However, in the new Red Hat Linux 5.6 environment, when reading a file that contains Latin characters, I have to either
1) set my locale to ISO 8859-1, or
2) tell InputStreamReader to read as ISO 8859-1, or
3) convert the file from ISO 8859-1 to UTF-8 using iconv
to read the file contents correctly.

The problem here is that I need to read files in different encodings, so options 1 and 2 are out. As for option 3, the file is some kind of binary file, so I cannot use the file command to determine its encoding before running iconv. 🙁

To my knowledge, Java’s InputStreamReader uses the system’s locale setting if no charset is specified.

Code:

New server: Red Hat Enterprise Linux Server release 5.6 (Tikanga)

(gcc version 4.1.2 20080704 (Red Hat 4.1.2-50))

locale:

LANG=en_US.UTF-8

LC_CTYPE="en_US.UTF-8"

LC_NUMERIC="en_US.UTF-8"

LC_TIME="en_US.UTF-8"

LC_COLLATE="en_US.UTF-8"

LC_MONETARY="en_US.UTF-8"

LC_MESSAGES="en_US.UTF-8"

LC_PAPER="en_US.UTF-8"

LC_NAME="en_US.UTF-8"

LC_ADDRESS="en_US.UTF-8"

LC_TELEPHONE="en_US.UTF-8"

LC_MEASUREMENT="en_US.UTF-8"

LC_IDENTIFICATION="en_US.UTF-8"

LC_ALL=

Code:

Old server: HP-UX czhs0850 B.11.11

locale:

LANG=

LC_CTYPE="C"

LC_COLLATE="C"

LC_MONETARY="C"

LC_NUMERIC="C"

LC_TIME="C"

LC_MESSAGES="C"

LC_ALL=

I am not sure why the previous HP-UX server does not have this problem. It seems as if the same file is interpreted as UTF-8 when it reaches the HP-UX server but treated as ISO 8859-1 on the Red Hat server. I have also tried changing the Red Hat locale to “C”, but that does not work either.

Could it be that I need to specify UTF-8 as the encoding when mounting the file system, so that all incoming files written to the server are treated as UTF-8?
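
For option 3, one workaround when the encoding is not known in advance is a fallback rule: treat the file as UTF-8 if it decodes cleanly, otherwise assume a single-byte charset. The sketch below is a heuristic under stated assumptions, not a detector. Every byte sequence is valid ISO 8859-1, so the fallback cannot be verified, and the helper name to_utf8 is hypothetical.

```shell
#!/bin/sh
# to_utf8 FILE [FALLBACK]: write FILE to stdout as UTF-8.
# If FILE already decodes cleanly as UTF-8 it is passed through unchanged;
# otherwise it is assumed (not detected!) to be in FALLBACK, default ISO-8859-1.
to_utf8() {
    if iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1; then
        cat "$1"
    else
        iconv -f "${2:-ISO-8859-1}" -t UTF-8 "$1"
    fi
}

# Usage: to_utf8 incoming.dat > incoming.utf8.dat
```

Normalising every incoming file this way would let the Java program keep reading with the UTF-8 default of the new server, provided the files are text and not truly binary.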

Linux

Unrecognized Spanish characters from windows to Linux

February 4, 2013 Bartle Doo

By ambious

Background:

I want to upload a file from Windows to a RHEL5 server. The file stores Spanish words in UTF-8 encoding and is used as the data source for loading into a database.

The file contains special characters such as the following:
í
ó
Ñ
á

Linux setting:
$ echo $LANG
en_US.UTF-8

I use WinSCP with UTF-8 ON to upload the file to the Linux server.
I also tried using an ssh client to upload it.

When I use vi to browse the file, the special characters are displayed as unrecognized characters.

How can I resolve this problem?
Is it a transfer setting problem or a system setting problem?

thanks !

Source: FULL ARTICLE at The UNIX and Linux Forums

Linux

Determining the encoding of a file

January 4, 2013 Bartle Doo

By MIA651

Hi, I am trying to determine the encoding of a file, because to convert it to UTF-8 it seems I have to know the encoding of the source.

Tried this

Code:

file

which gives me this:
:data or International Language text

I tried checking the locale, and this is the output:
LANG=C
LC_COLLATE="C"
LC_CTYPE="C"
LC_MONETARY="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_MESSAGES="C"
LC_ALL=

Not really much help there either. Any help will be appreciated!
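
The locale only describes the environment, not the file, so it cannot answer the question. What can be checked mechanically is narrower: whether the file is pure 7-bit ASCII, and if not, whether it is at least well-formed UTF-8. The sketch below does just that; the name guess_encoding and the three labels it prints are made up for illustration. Anything reported as unknown-8bit is some other charset that has to be identified from knowledge of where the file came from.

```shell
#!/bin/sh
# guess_encoding FILE: print "ascii", "utf-8" or "unknown-8bit".
guess_encoding() {
    # Delete all 7-bit ASCII bytes and count what is left over.
    high=$(LC_ALL=C tr -d '\000-\177' < "$1" | wc -c | tr -d ' ')
    if [ "$high" -eq 0 ]; then
        echo ascii
    elif iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1; then
        echo utf-8
    else
        echo unknown-8bit
    fi
}

# Usage: guess_encoding somefile.txt
```

A file reported as ascii or utf-8 needs no conversion at all; only the third case requires choosing a source charset for iconv.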

Source: FULL ARTICLE at The UNIX and Linux Forums

Linux

Dos2UNIX formatting problem

December 21, 2012

By abhi1988sri

Hi,

I was trying to run dos2unix on a file that has some special characters, but dos2unix converted those into a different format. I am working on a Sun server.
I guess the default for dos2unix on a Sun server is an ISO format.
Can I change the format so that it does the conversion in a UTF format?
I think these characters are not covered by the ISO format.

e.g.

Orig : LCA;PZ0;CIRC;1035562;1;Main;0;2;SCL;DEFAUT;0.55Ç min:main

Changed after dos2unix : LCA;PZ0;CIRC;1035561;1;Main;1;2;FSC;DEFAUT;0.55Ç min:main

Orig : LCA;PZ0;CIRC;2070814;1;ILMV0711;1;2;GIL;DEFAUT;Tarif Dédie F&F Monoprix

Changed after dos2unix : LCA;PZ0;CIRC;2070814;1;ILMV0711;1;2;GIL;DEFAUT;Tarif D211die F&F Monoprix

Kindly help..

Regards

Abhinav
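
The sample suggests that the tool is translating characters as well as line endings. If only the line endings need to change, a byte-transparent filter avoids any character set conversion. This is a sketch of that alternative, not a description of dos2unix itself; the name strip_cr and the file names in the usage line are hypothetical.

```shell
#!/bin/sh
# strip_cr: copy stdin to stdout with every carriage return (CR, 0x0D) removed.
# All other bytes, including accented characters in any encoding, pass through
# untouched. Note that a bare CR not followed by LF is removed as well.
strip_cr() {
    tr -d '\r'
}

# Usage: strip_cr < dosfile.txt > unixfile.txt
```

Because tr works on single bytes here and CR never occurs inside a multi-byte UTF-8 sequence, the accented characters come out exactly as they went in.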
Source: The UNIX and Linux Forums
