fix: weighter->weigher, numbers vs strings, weights with units #8056

Merged
merged 9 commits into from
Feb 1, 2023
1 change: 1 addition & 0 deletions cgi/product_multilingual.pl
@@ -35,6 +35,7 @@
 use ProductOpener::Mail qw/:all/;
 use ProductOpener::Products qw/:all/;
 use ProductOpener::Food qw/:all/;
+use ProductOpener::Units qw/:all/;
 use ProductOpener::Ingredients qw/:all/;
 use ProductOpener::Images qw/:all/;
 use ProductOpener::URL qw/:all/;
@@ -10,6 +10,8 @@ description: |-
 For input, clients can either pass the id of a corresponding taxonomy entry (e.g. "en:pizza-box"), or a free text value prefixed with the language code of the text (e.g. "en:Pizza box", "fr:boite à pizza"). If the language code prefix is missing, the value of the "lc" field of the query will be used.

 The resulting structure will contain the id of the canonical entry in the taxonomy if it could be matched, or the free text value prefixed with the language code otherwise.
+
+For weights, the API is expecting a number with the number of grams. If a string is passed instead of a number, we will attempt to convert it to grams. The string may contain units (e.g. "6.9 g"), and use . or , as the decimal separator. Conversion may not work for all inputs. If a string was converted to a number, the API response will include a warning and specify the converted value.
 examples:
 - number_of_units: 6
 shape:
@@ -34,11 +36,15 @@ properties:
 type: string
 description: Quantity (weight or volume) of food product contained in the packaging component. (e.g. 75cl for a wine bottle)
 weight_specified:
-type: number
-description: 'Weight (as specified by the manufacturer) of one unit of the empty packaging component (in grams). (e.g. for a 6 pack of 1.5l water bottles, it might be 30, the weight in grams of 1 empty water bottle without its cap which is a different packaging component).'
+type:
+- number
+- string
+description: 'Weight (as specified by the manufacturer) of one unit of the empty packaging component (in grams). (e.g. for a 6 pack of 1.5l water bottles, it might be 30, the weight in grams of 1 empty water bottle without its cap which is a different packaging component). If passed a string - possibly with a unit - it will be converted to a number.'
 weight_measured:
-type: number
-description: 'Weight (as measured by one or more users) of one unit of the empty packaging component (in grams). (e.g. for a 6 pack of 1.5l water bottles, it might be 30, the weight in grams of 1 empty water bottle without its cap which is a different packaging component).'
+type:
+- number
+- string
+description: 'Weight (as measured by one or more users) of one unit of the empty packaging component (in grams). (e.g. for a 6 pack of 1.5l water bottles, it might be 30, the weight in grams of 1 empty water bottle without its cap which is a different packaging component). If passed a string - possibly with a unit - it will be converted to a number.'
 brands:
 type: string
 description: 'A comma separated list of brands / product names for the packaging component (e.g. "Tetra Pak", "Tetra Brik")'
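The weight rule described in the API documentation hunk above (numbers are grams; strings may carry a unit and use . or , as the decimal separator) can be illustrated with a small stand-alone Perl sketch. It is not the ProductOpener implementation: `weight_to_grams` and `%to_grams` are hypothetical names, the unit table is a deliberately small subset, and any purely numeric input is treated as "already a number".

```perl
use strict;
use warnings;

# Hypothetical multipliers to grams (a small illustrative subset).
my %to_grams = (g => 1, kg => 1000, mg => 0.001);

# Hypothetical helper, not part of ProductOpener.
# Returns (grams, was_converted), or an empty list if the input is not understood.
sub weight_to_grams {
    my ($input) = @_;
    return unless defined $input;

    # A plain number is taken to be grams already: nothing to convert.
    return ($input + 0, 0) if $input =~ /^\d+(?:\.\d+)?$/;

    # Otherwise accept "." or "," as decimal separator and an optional unit.
    if ($input =~ /^\s*(\d+)(?:[.,](\d+))?\s*(kg|mg|g)?\s*$/i) {
        my $number = defined $2 ? "$1.$2" : $1;
        my $unit   = lc($3 // 'g');
        return ($number * $to_grams{$unit}, 1);
    }

    return;
}

my ($grams, $converted) = weight_to_grams("6,9 g");
print "grams=$grams converted=$converted\n";    # prints "grams=6.9 converted=1"
```

The second return value plays the role of the "a warning will be included" signal from the description: a caller would attach a warning only when it is true.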
1 change: 1 addition & 0 deletions lib/ProductOpener/Display.pm
@@ -168,6 +168,7 @@ use ProductOpener::Recipes qw(:all);
 use ProductOpener::PackagerCodes qw(:all);
 use ProductOpener::Export qw(:all);
 use ProductOpener::API qw(:all);
+use ProductOpener::Units qw/:all/;

 use Cache::Memcached::Fast;
 use Encode;
286 changes: 1 addition & 285 deletions lib/ProductOpener/Food.pm
@@ -55,17 +55,6 @@ BEGIN {
 &normalize_nutriment_value_and_modifier
 &assign_nid_modifier_value_and_unit

-&unit_to_g
-&g_to_unit
-
-&unit_to_kcal
-
-&unit_to_mmoll
-&mmoll_to_unit
-
-&normalize_serving_size
-&normalize_quantity
-
 &canonicalize_nutriment

 &fix_salt_equivalent
@@ -113,6 +102,7 @@ use ProductOpener::Numbers qw/:all/;
 use ProductOpener::Ingredients qw/:all/;
 use ProductOpener::Text qw/:all/;
 use ProductOpener::FoodGroups qw/:all/;
+use ProductOpener::Units qw/:all/;
 use ProductOpener::Products qw(&remove_fields);
 use ProductOpener::Display qw/single_param/;

@@ -323,183 +313,6 @@ sub assign_nid_modifier_value_and_unit ($product_ref, $nid, $modifier, $value, $
 return;
 }

-=head2 unit_to_kcal($value, $unit)
-
-Converts <xx><unit> into <xx> kcal.
-
-=cut
-
-sub unit_to_kcal ($value, $unit) {
-$unit = lc($unit);
-
-(not defined $value) and return $value;
-
-($unit eq 'kj') and return int($value / 4.184 + 0.5);
-
-# return value without modification if it's already in kcal
-return $value + 0; # + 0 to make sure the value is treated as number
-}
-
-=head2 unit_to_g($value, $unit)
-
-Converts <xx><unit> into <xx>grams. Eg.:
-unit_to_g(2,kg) => returns 2000
-unit_to_g(520,mg) => returns 0.52
-
-=cut
-
-# This is a key:value pairs
-# The keys are the unit names and the values are the multipliers we can use to convert to a standard unit.
-# We can divide by these values to do the reverse ie, Convert from standard to non standard
-my %unit_conversion_map = (
-# kg = 公斤 - gōngjīn = кг
-"\N{U+516C}\N{U+65A4}" => 1000,
-# l = 公升 - gōngshēng = л = liter
-"\N{U+516C}\N{U+5347}" => 1000,
-'kg' => 1000,
-'кг' => 1000,
-'l' => 1000,
-'л' => 1000,
-# mg = 毫克 - háokè = мг
-"\N{U+6BEB}\N{U+514B}" => 0.001,
-'mg' => 0.001,
-'мг' => 0.001,
-'mcg' => 0.000001,
-'µg' => 0.000001,
-'oz' => 28.349523125,
-'fl oz' => 30,
-'dl' => 100,
-'дл' => 100,
-'cl' => 10,
-'кл' => 10,
-# 斤 - jīn = 500 Grams
-"\N{U+65A4}" => 500,
-# Standard units: No conversion units
-# Value without modification if it's already grams or 克 (kè) or 公克 (gōngkè) or г
-'g' => 1,
-'' => 1,
-' ' => 1,
-'kj' => 1,
-'克' => 1,
-'公克' => 1,
-'г' => 1,
-'мл' => 1,
-'ml' => 1,
-'mmol/l' => 1,
-"\N{U+6BEB}\N{U+5347}" => 1,
-'% vol' => 1,
-'ph' => 1,
-'%' => 1,
-'% dv' => 1,
-'% vol (alcohol)' => 1,
-'iu' => 1,
-# Division factors for "non standard unit" to mmoll conversions
-'mol/l' => 0.001,
-'mval/l' => 2,
-'ppm' => 100,
-"\N{U+00B0}rh" => 40.080,
-"\N{U+00B0}fh" => 10.00,
-"\N{U+00B0}e" => 7.02,
-"\N{U+00B0}dh" => 5.6,
-'gpg' => 5.847
-);
-
-sub unit_to_g ($value, $unit) {
-$unit = lc($unit);
-
-if ($unit =~ /^(fl|fluid)(\.| )*(oz|once|ounce)/) {
-$unit = "fl oz";
-}
-
-(not defined $value) and return $value;
-
-$value =~ s/,/\./;
-$value =~ s/^(<|environ|max|maximum|min|minimum)( )?//;
-$value eq '' and return $value;
-
-if (exists($unit_conversion_map{$unit})) {
-return $value * $unit_conversion_map{$unit};
-}
-
-(($unit eq 'kcal') or ($unit eq 'ккал')) and return int($value * 4.184 + 0.5);
-
-# We return with + 0 to make sure the value is treated as number (needed when outputting json and to store in mongodb as a number)
-# lets not assume that we have a valid unit
-return;
-}
-
-=head2 g_to_unit($value, $unit)
-
-Converts <xx>grams into <xx><unit>. Eg.:
-g_to_unit(2000,kg) => returns 2
-g_to_unit(0.52,mg) => returns 520
-
-=cut
-
-sub g_to_unit ($value, $unit) {
-$unit = lc($unit);
-
-if ((not defined $value) or ($value eq '')) {
-return "";
-}
-
-$unit eq 'fl. oz' and $unit = 'fl oz';
-$unit eq 'fl.oz' and $unit = 'fl oz';
-
-$value =~ s/,/\./;
-$value =~ s/^(<|environ|max|maximum|min|minimum)( )?//;
-
-$value eq '' and return $value;
-
-# Divide with the values in the hash
-if (exists($unit_conversion_map{$unit})) {
-return $value / $unit_conversion_map{$unit};
-}
-
-(($unit eq 'kcal') or ($unit eq 'ккал')) and return int($value / 4.184 + 0.5);
-
-# return value without modification if unit is already grams or 克 (kè) or 公克 (gōngkè) or г
-return $value + 0;
-# + 0 to make sure the value is treated as number
-# (needed when outputting json and to store in mongodb as a number)
-}
-
-sub unit_to_mmoll ($value, $unit) {
-$unit = lc($unit);
-
-if ((not defined $value) or ($value eq '')) {
-return '';
-}
-
-$value =~ s/,/\./;
-$value =~ s/^(<|environ|max|maximum|min|minimum)( )?//;
-
-# Divide with the values in the hash
-if (exists($unit_conversion_map{$unit})) {
-return $value / $unit_conversion_map{$unit};
-}
-
-return $value + 0;
-}
-
-sub mmoll_to_unit ($value, $unit) {
-$unit = lc($unit);
-
-if ((not defined $value) or ($value eq '')) {
-return '';
-}
-
-$value =~ s/,/\./;
-$value =~ s/^(<|environ|max|maximum|min|minimum)( )?//;
-
-# Multiply with the values in the hash
-if (exists($unit_conversion_map{$unit})) {
-return $value * $unit_conversion_map{$unit};
-}
-
-return $value + 0;
-}
-
 # For fat, saturated fat, sugars, salt: http://www.diw.de/sixcms/media.php/73/diw_wr_2010-19.pdf
 @nutrient_levels = (['fat', 3, 20], ['saturated-fat', 1.5, 5], ['sugars', 5, 12.5], ['salt', 0.3, 1.5],);

@@ -926,103 +739,6 @@ sub canonicalize_nutriment ($target_lc, $nutrient) {
 return $nid;
 }

-my $international_units = qr/kg|g|mg|µg|oz|l|dl|cl|ml|(fl(\.?)(\s)?oz)/i;
-# Chinese units: a good start is https://en.wikipedia.org/wiki/Chinese_units_of_measurement#Mass
-my $chinese_units = qr/
-(?:[\N{U+6BEB}\N{U+516C}]?\N{U+514B})| # 毫克 or 公克 or 克 or (克 kè is the Chinese word for gram)
-# (公克 gōngkè is for "metric gram")
-(?:\N{U+516C}?\N{U+65A4})| # 公斤 or 斤 or (公斤 gōngjīn is a "metric kg")
-(?:[\N{U+6BEB}\N{U+516C}]?\N{U+5347})| # 毫升 or 公升 or 升 (升 is liter)
-\N{U+5428} # 吨 (ton?)
-/ix;
-my $russian_units = qr/г|мг|кг|л|дл|кл|мл/i;
-my $units = qr/$international_units|$chinese_units|$russian_units/i;
-
-=head2 normalize_quantity($quantity)
-
-Return the size in g or ml for the whole product. Eg.:
-normalize_quantity(1 barquette de 40g) returns 40
-normalize_quantity(20 tranches 500g) returns 500
-normalize_quantity(6x90g) returns 540
-normalize_quantity(2kg) returns 2000
-
-=cut
-
-sub normalize_quantity ($quantity) {
-
-my $q = undef;
-my $u = undef;
-
-# 12 pots x125 g
-# 6 bouteilles de 33 cl
-# 6 bricks de 1 l
-# 10 unités, 170 g
-# 4 bouteilles en verre de 20cl
-if ($quantity =~ /(\d+)(\s(\p{Letter}| )+)?(\s)?( de | of |x|\*)(\s)?((\d+)(\.|,)?(\d+)?)(\s)?($units)/i) {
-my $m = $1;
-$q = lc($7);
-$u = $12;
-$q = convert_string_to_number($q);
-$q = unit_to_g($q * $m, $u);
-}
-elsif ($quantity =~ /((\d+)(\.|,)?(\d+)?)(\s)?($units)/i) {
-$q = lc($1);
-$u = $6;
-$q = convert_string_to_number($q);
-$q = unit_to_g($q, $u);
-}
-
-return $q;
-}
-
-=head2 normalize_serving_size($serving)
-
-Returns the size in g or ml for the serving. Eg.:
-normalize_serving_size(1 barquette de 40g)->returns 40
-normalize_serving_size(2.5kg)->returns 2500
-
-=cut
-
-sub normalize_serving_size ($serving) {
-
-# Regex captures any <number>( )?<unit-identifier> group, but leaves allowances for a preceding
-# token to allow for patterns like "One bag (32g)", "1 small bottle (180ml)" etc
-if ($serving =~ /^(.*[ \(])?(?<quantity>(\d+)(\.|,)?(\d+)?)( )?(?<unit>\w+)\b/i) {
-my $q = $+{quantity};
-my $u = normalize_unit($+{unit});
-$q = convert_string_to_number($q);
-
-return unit_to_g($q, $u);
-}
-
-#$log->trace("serving size normalized", { serving => $serving, q => $q, u => $u }) if $log->is_trace();
-return 0;
-}
-
-# @todo we should have equivalences for more units if we are supporting this
-my @unit_equivalences_list = (
-['g', qr/gram(s)?/],
-['g', qr/gramme(s)?/], # French
-);
-
-=head2 normalize_unit ( $unit )
-
-Normalizes units to their standard symbolic forms so that we can support unit names and alternative
-representations in our normalization logic.
-
-=cut
-
-sub normalize_unit ($originalUnit) {
-
-foreach my $unit_name (@unit_equivalences_list) {
-if ($originalUnit =~ $unit_name->[1]) {
-return $unit_name->[0];
-}
-}
-
-return $originalUnit;
-}
-
 =head2 is_beverage_for_nutrition_score( $product_ref )

 Determines if a product should be considered as a beverage for Nutri-Score computations,
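To experiment with the quantity parsing removed from Food.pm above without loading ProductOpener, a reduced stand-alone sketch may help. The names `quantity_to_g_or_ml`, `%multiplier`, `$units` and `$number` are hypothetical; only a few international units are covered, and the two regexes are simplified versions of the ones in `normalize_quantity`, with a word boundary added after the unit.

```perl
use strict;
use warnings;

# Hypothetical subset of the unit table: multipliers to g (or ml for volumes).
my %multiplier = (kg => 1000, g => 1, mg => 0.001, l => 1000, cl => 10, ml => 1);
my $units  = qr/kg|mg|g|cl|ml|l/i;
my $number = qr/\d+(?:[.,]\d+)?/;

# Hypothetical stand-in for normalize_quantity; returns undef if nothing matches.
sub quantity_to_g_or_ml {
    my ($quantity) = @_;

    my ($count, $value, $unit);

    # Multipack form: "<count> [words] de|of|x|* <number> <unit>", e.g. "6x90g".
    if ($quantity =~ /(\d+)(?:\s[\p{Letter} ]+)?\s?(?: de | of |x|\*)\s?($number)\s?($units)\b/i) {
        ($count, $value, $unit) = ($1, $2, $3);
    }
    # Single form: "<number> <unit>", e.g. "2kg".
    elsif ($quantity =~ /($number)\s?($units)\b/i) {
        ($count, $value, $unit) = (1, $1, $2);
    }
    else {
        return;
    }

    $value =~ tr/,/./;    # accept a decimal comma
    return $count * $value * $multiplier{ lc($unit) };
}

# Same inputs as the POD examples of normalize_quantity: 540, 40, 500, 2000.
for my $q ("6x90g", "1 barquette de 40g", "20 tranches 500g", "2kg") {
    printf "%s => %s\n", $q, quantity_to_g_or_ml($q);
}
```

Trying the multipack pattern first matters: "6x90g" also matches the single form (as 90 g), so reversing the order would silently drop the count.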
1 change: 1 addition & 0 deletions lib/ProductOpener/ImportConvert.pm
@@ -102,6 +102,7 @@ use ProductOpener::Tags qw/:all/;
 use ProductOpener::Products qw/:all/;
 use ProductOpener::Ingredients qw/:all/;
 use ProductOpener::Food qw/:all/;
+use ProductOpener::Units qw/:all/;

 use CGI qw/:cgi :form escapeHTML/;
 use URI::Escape::XS;
18 changes: 18 additions & 0 deletions lib/ProductOpener/Packaging.pm
@@ -63,6 +63,7 @@ use ProductOpener::Tags qw/:all/;
 use ProductOpener::Store qw/:all/;
 use ProductOpener::API qw/:all/;
 use ProductOpener::Numbers qw/:all/;
+use ProductOpener::Units qw/:all/;

 =head1 FUNCTIONS

@@ -422,6 +423,23 @@ sub get_checked_and_taxonomized_packaging_component_data ($tags_lc, $input_packa
 $packaging_ref->{$weight} = convert_string_to_number($input_packaging_ref->{$weight});
 $has_data = 1;
 }
+elsif (defined normalize_quantity($input_packaging_ref->{$weight})) {
+$packaging_ref->{$weight}
+= convert_string_to_number(normalize_quantity($input_packaging_ref->{$weight}));
+$has_data = 1;
+add_warning(
+$response_ref,
+{
+message => {id => "invalid_type_must_be_number"},
+field => {
+id => $weight,
+value => $input_packaging_ref->{$weight},
+valued_converted => $packaging_ref->{$weight}
+},
+impact => {id => "value_converted"},
+}
+);
+}
 else {
 add_warning(
 $response_ref,
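To see roughly what a client might receive when the new `elsif` branch in Packaging.pm fires, the warning hash from the hunk above can be built and serialized in isolation. This is only a sketch: `add_warning_sketch` is a hypothetical stand-in for `add_warning`, and the assumption that warnings are collected in a `warnings` array of the response is not confirmed by this diff. The key `valued_converted` is spelled exactly as in the diff.

```perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical stand-in for the API's add_warning(): it simply appends the
# warning to a "warnings" array in the response (an assumption).
sub add_warning_sketch {
    my ($response_ref, $warning_ref) = @_;
    push @{$response_ref->{warnings}}, $warning_ref;
    return;
}

my $response_ref = {};
my $input        = "6,9 g";    # what the client sent for weight_measured
my $converted    = 6.9;        # what would be stored after conversion

add_warning_sketch(
    $response_ref,
    {
        message => {id => "invalid_type_must_be_number"},
        field   => {
            id               => "weight_measured",
            value            => $input,
            valued_converted => $converted,    # key spelled as in the diff
        },
        impact => {id => "value_converted"},
    }
);

# Canonical (sorted-key) pretty-printed JSON, for stable output.
print JSON::PP->new->canonical->pretty->encode($response_ref);
```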