javascript - Make translation function not translate result again
I have made a simplified version of a translation tool similar to Google Translate. The idea is to build a simple tool for a minority language in Sweden called "jamska". The app is built around a function that takes a string from a textarea with the id #svenska and replaces words in the string using RegExp.
I've made an array called arr that is used as a dictionary in the function's loop. Each array item looks like this: var arr = [["eldröd", "eillrau"], ["oväder", "over"] ...]. The first word in each array item is in Swedish, and the second word is in jamska. If the RegExp finds a matching word in the loop, it replaces that word using the following code:
    function translate() {
        var str = $("#svenska").val();
        var newstr = "";
        for (var i = 0; i < arr.length; i++) {
            var replace = arr[i][0];
            var replacewith = arr[i][1];
            // Match the word only when it is not surrounded by other letters or digits
            var re = new RegExp('(^|[^a-z0-9åäö])' + replace + '([^a-z0-9åäö]|$)', 'ig');
            str = str.replace(re, "$1" + replacewith + '$2');
        }
        $("#jamska").val(str);
    }
The translate() function is called in an event handler when the #svenska textarea gets a keyup event, like this: $("#svenska").keyup(function() { translate(); });
The translated string is then assigned as the value of another textarea with the id #jamska. So far, so good.
I have a problem though: if the translated word in jamska is also a word in Swedish, the function translates that word too. The problem occurs because I'm assigning the variable str to the translated version of the same variable, using: str = str.replace(re, "$1" + replacewith + '$2');. The function uses the same variable over and over again to perform the translation.
Example: the Swedish word "brydd" is "fel" in jamska. "Fel" is also a word in Swedish, so the word I get after translation is "felht", since the Swedish word "fel" is "felht" in jamska.
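The effect can be reproduced without the textareas. The sketch below is a minimal reproduction, assuming the dictionary contains just the two entries from the example and that "brydd" comes before "fel" in the array; translateSequential is a hypothetical name for the same loop with the jQuery calls removed.

```javascript
// Two entries taken from the example above; their order in the array is an assumption.
var arr = [["brydd", "fel"], ["fel", "felht"]];

// Same replacement loop as translate(), minus the jQuery/DOM parts.
// translateSequential is a hypothetical name used only for this sketch.
function translateSequential(str) {
  for (var i = 0; i < arr.length; i++) {
    var re = new RegExp('(^|[^a-z0-9åäö])' + arr[i][0] + '([^a-z0-9åäö]|$)', 'ig');
    // Each pass works on the output of the previous pass,
    // so an already translated word can be translated again.
    str = str.replace(re, '$1' + arr[i][1] + '$2');
  }
  return str;
}

console.log(translateSequential("brydd")); // prints "felht" instead of "fel"
```

The first pass turns "brydd" into "fel", and the second pass then sees "fel" in the intermediate string and turns it into "felht".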
Does anyone have an idea how to work around this problem?
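Instead of looking for each dictionary word in the input and replacing it with its respective translation, I'd recommend finding every word ([a-z0-9åäö]+) in the text and replacing it either with its translation, if one is found in the dictionary, or with itself otherwise. Each word in the input is then visited exactly once, so a translation result is never translated again: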
    //var arr = [["eldröd", "eillrau"], ["oväder", "over"] ...]
    // I'd rather use a dictionary (a plain object) instead of an array
    var dict = {
        eldröd: "oväder",
        eillrau: "over"
        // ...
    };

    var str = "eldröd test eillrau eillrau oväder over";

    // The callback runs once per matched word in the original input
    var translated = str.replace(/[a-z0-9åäö]+/ig, function(m) {
        var word = m.toLowerCase();
        var trans = dict[word];
        // Fall back to the (lowercased) word itself when there is no translation
        return trans === undefined ? word : trans;
    });

    console.log(translated);
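Since the question already stores its entries as [Swedish, jamska] pairs in arr, the dictionary does not have to be retyped by hand. The following is a minimal sketch of one way to derive it; buildDict and translateOnce are hypothetical names, and the sample entries are the ones mentioned in the question. Unlike the snippet above, it returns untranslated words with their original casing, which is a design choice rather than a requirement.

```javascript
// Sample entries in the question's [swedish, jamska] pair format.
var arr = [["eldröd", "eillrau"], ["oväder", "over"], ["brydd", "fel"], ["fel", "felht"]];

// buildDict is a hypothetical helper, not part of the original answer.
function buildDict(pairs) {
  // Object.create(null) avoids inherited keys such as "constructor".
  var dict = Object.create(null);
  pairs.forEach(function (pair) {
    dict[pair[0].toLowerCase()] = pair[1];
  });
  return dict;
}

var dict = buildDict(arr);

// Single-pass translation: every word of the input is looked up exactly once.
function translateOnce(str) {
  return str.replace(/[a-z0-9åäö]+/ig, function (m) {
    var trans = dict[m.toLowerCase()];
    return trans === undefined ? m : trans;
  });
}

console.log(translateOnce("brydd fel")); // prints "fel felht"
```

Here "brydd" becomes "fel" and stays that way, because the replacement text is never scanned again.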
Update:
If the dictionary keys may also be phrases (i.e. technically they appear as strings with spaces), the regex should be extended to include these phrases explicitly. The final regex would look like
(?:phrase 1|phrase 2|etc...)(?![a-z0-9åäö])|[a-z0-9åäö]+
It tries to match one of the phrases explicitly first, and only then single words. The (?![a-z0-9åäö]) negative lookahead filters out phrases that are immediately followed by letters (e.g. varken bättre eller sämreåäö).
Phrases that are preceded by letters are filtered out implicitly: a match is either the first one (and is therefore not preceded by a letter), or it is not the first one, in which case the previous match is separated from the current one by spaces.
    //var arr = [["eldröd", "eillrau"], ["oväder", "over"] ...]
    // I'd rather use a dictionary (a plain object) instead of an array
    var dict = {
        eldröd: "oväder",
        eillrau: "over",
        bättre: "better",
        "varken bättre eller sämre": "vär å int viller",
        "test test": "double test"
        // ...
    };

    var str = "eldröd test eillrau eillrau oväder on test test ";
    str += "varken bättre eller sämre ";
    str += "don't trans: varken bättre eller sämreåäö";
    str += "don't trans again: åäövarken bättre eller sämre";

    // Collect the multi-word keys, longest first, so that longer phrases win
    var phrases = Object.keys(dict)
        .filter(function(k) { return /\s/.test(k); })
        .sort(function(a, b) { return b.length - a.length; })
        .join('|');

    var re = new RegExp('(?:' + phrases + ')(?![a-z0-9åäö])|[a-z0-9åäö]+', 'ig');

    var translated = str.replace(re, function(m) {
        var word = m.toLowerCase();
        var trans = dict[word];
        return trans === undefined ? word : trans;
    });

    console.log(translated);
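One caveat with joining the phrases directly into a pattern: if a phrase ever contains a regex metacharacter (for example a question mark), it would be interpreted as regex syntax rather than as literal text. The sketch below shows one possible way to guard against that, assuming such phrases can occur in the dictionary at all. escapeRegExp and translatePhrases are hypothetical names, and the "vad? nu" entry with its PLACEHOLDER value is made up purely for illustration.

```javascript
var dict = {
  "test test": "double test",  // from the example above
  "vad? nu": "PLACEHOLDER"     // hypothetical entry containing a regex metacharacter
};

// Hypothetical helper: backslash-escapes characters that have a special meaning in a regex.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Same phrase collection as above, with each phrase escaped before joining.
var phrases = Object.keys(dict)
  .filter(function (k) { return /\s/.test(k); })
  .sort(function (a, b) { return b.length - a.length; })
  .map(escapeRegExp)
  .join('|');

var re = new RegExp('(?:' + phrases + ')(?![a-z0-9åäö])|[a-z0-9åäö]+', 'ig');

function translatePhrases(str) {
  return str.replace(re, function (m) {
    var trans = dict[m.toLowerCase()];
    return trans === undefined ? m : trans;
  });
}

console.log(translatePhrases("test test vad? nu")); // prints "double test PLACEHOLDER"
```

Without the escaping step, "vad? nu" would be read as "va", an optional "d", then " nu", so the literal phrase with the question mark would never match as a whole.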